MGHPCC data center annual power downtime 5/22/18

Monday May 21, 2018 6:00 PM to 12:00 AM
Tuesday May 22, 2018
Wednesday May 23, 2018 12:00 AM to 8:00 PM

(last updated May 15th, 2018)

The upgrade is complete and Odyssey 3 now runs entirely on CentOS 7. See the CentOS7 Transition FAQ for additional help and tips. For more Odyssey 3 info, see the O3 page.


Each year our primary data center, MGHPCC (Holyoke), performs a full power shutdown for electrical maintenance. This requires us to power down all FASRC systems at MGHPCC starting the evening before. It also gives us a window for maintenance, including the CentOS 7 cluster upgrade, that would otherwise require us to shut off various resources during normal operations. Note that because power to the entire facility will be out, all running and scheduled jobs will be terminated.

SCHEDULE

  • Monday 5/21/18 at 6PM: All running jobs will be terminated and we will begin powering down all devices at MGHPCC/Holyoke. All scheduled jobs will also be dropped to avoid a 'thundering herd' situation on power-up.
  • Tuesday 5/22/18: Power will be out the entire day as MGHPCC performs its work. Harvard network updates will take place from 9PM to 1AM.
  • Wednesday 5/23/18: We expect to be back to normal operations by approximately 8PM. 
  • Office Hours will be held on Thursday, 5/24/18

WHAT IS AFFECTED

  • Resources in Holyoke will be affected for the duration of the event. This includes the compute cluster, scheduler, regal, storage, and other devices housed at MGHPCC/Holyoke.
  • Resources in Boston and Cambridge, including storage, are likely to be affected by network updates which will take place Tuesday night 5/22/18 between 9PM and 1AM. Please plan accordingly. 

The list of directly affected filesystems includes, but is not limited to: aagfs01, bulfs01, cepr, hbsfs04, holyflash, holyiml, holylfs, holyserv01, klecknergfs03, klecknergfs04, kovacfs01, kuang, mvogelsfs01, nese [all], nss2deep, pan2, pan3, regal, seasasfs01, seasfs01

CENTOS 7 UPGRADE (O3)

During this event, once basic power is available to us, we will also be upgrading all of our partitions to CentOS 7 as part of the "O3" (Odyssey 3) update. Please note this will affect all users of the cluster once complete on 5/23/18. NCF and ATLAS are not a part of this upgrade.

Be Aware: As the login nodes will also be upgraded, you will receive SSH key errors. See this FAQ page for instructions on clearing entries from your SSH known_hosts file.
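For reference, the usual OpenSSH way to clear a stale entry is `ssh-keygen -R <hostname>`, which also handles hashed entries. The sketch below illustrates the same idea in Python for plain-text (unhashed) entries only. The function name `drop_host_entries` and the hostname `login.example.org` are hypothetical placeholders, not actual FASRC names; substitute the login hostname you normally connect to, and follow the FAQ page for the supported procedure.

```python
def drop_host_entries(known_hosts_text: str, hostname: str) -> str:
    """Return known_hosts content without plain-text entries for `hostname`.

    Illustrative sketch only: hashed entries (lines starting with "|1|")
    and bracketed "[host]:port" entries are left untouched.
    """
    kept = []
    for line in known_hosts_text.splitlines():
        stripped = line.strip()
        # Keep blank lines and comments as they are.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        fields = stripped.split()
        # An optional marker (@cert-authority or @revoked) precedes the host list.
        if fields[0].startswith("@") and len(fields) > 1:
            hosts_field = fields[1]
        else:
            hosts_field = fields[0]
        # The host list is comma-separated, e.g. "name,ip-address".
        if hostname in hosts_field.split(","):
            continue  # drop the stale entry for this host
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Hypothetical sample content; real files live at ~/.ssh/known_hosts.
    sample = (
        "login.example.org,10.0.0.1 ssh-rsa AAAA\n"
        "other.example.org ssh-ed25519 BBBB\n"
    )
    print(drop_host_entries(sample, "login.example.org"), end="")
```

The function works on text rather than editing the file directly, so you can inspect the result and keep a backup of known_hosts before writing anything back.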

More details on O3/CentOS 7 and test resources can be found on our O3 page.

Additional Office Hours will be held to help with the transition: 5/24 FAS, 5/29 HSPH, 5/30 FAS, 5/31 FAS, 6/5 HSPH, 6/6 FAS. See our events calendar for times and locations.

We encourage all cluster users to begin testing in advance of this upgrade.

 

 

Reminder: All jobs still running on Monday evening will be terminated. Because power will be out at MGHPCC and the cluster will be upgraded to CentOS 7, jobs cannot be paused; they must be stopped before we begin power-down.

We will notify the community via our email lists when we are back to normal operations. You can also check back here or on our Status Page.
