Each year our primary data center, MGHPCC (Holyoke), performs a full power shutdown for electrical maintenance. This requires us to power down all FASRC systems at MGHPCC starting the evening before. It also gives us a window for maintenance that would otherwise require shutting off various resources during normal operations. Note that all running and scheduled jobs will be terminated, as power to the entire facility will be out.
- Evening Before: All running jobs will be terminated and we will begin powering down all devices at MGHPCC/Holyoke. All scheduled jobs will also be dropped to avoid a 'thundering herd' situation on power-up.
- Day Of: Power will be out the entire day as MGHPCC performs their work.
- Following Day: We will perform tasks after power-up and expect to be back to normal operations by [To Be Determined].
WHAT IS AFFECTED
- Resources in Holyoke will be affected for the duration of the event. This includes the compute cluster, scheduler, scratchlfs, storage, and other devices housed at MGHPCC/Holyoke.
- Resources in Boston and Cambridge, including storage and home directories, will also be affected. Please plan accordingly, as all resources will be affected at some point during the event.
COMPUTE OS UPDATE
During this event, once basic power is available to us, we will also upgrade compute nodes to the latest CentOS. This is a minor version update, and we expect no impact after the upgrade.
OTHER MAINTENANCE
- LNET router rebuild and Lustre upgrade - Lustre (LFS) filesystems affected across the board
- InfiniBand updates
- Home directory storage move - Transparent to users once complete - All home directories affected
- Physical move of several servers - Transparent to users once complete
- Firewall upgrade - Network affected, transparent to users once complete
- Re-cabling of various storage
Reminder: All jobs still running on the evening of 6/10/19 will be terminated. Because power will be out at MGHPCC, jobs cannot be paused; they must be stopped before we begin power-down.
We will notify the community via our email lists when we are back to normal operations. You can also check back here or on our Status Page.