MGHPCC (Holyoke) Annual Power Shutdown 8/9/21-8/12/21

Cluster open with caveats. See updates on the Status Page: https://status.rc.fas.harvard.edu

 


The annual MGHPCC data center power shutdown and maintenance will occur August 9th through August 12th, 2021.

  • Power-down will begin at 6 PM on August 9th. (NOTE: Some jobs will be terminated at 9 AM due to rack shutdowns in Row 7C; see TASKS below.)
  • Power will be out that night and through the following day, August 10th. (See also BOSTON DATA CENTER below.)
  • Maintenance and network upgrades will occur on August 11th.
  • Power-up and return to service were originally expected at noon on August 12th. UPDATE: ETA UNKNOWN - SEE EMAIL OR STATUS PAGE
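
For planning purposes, the time remaining before a cutoff can be worked out with simple date arithmetic. The following is a minimal Python sketch, not an official FASRC tool. The function names `time_until_cutoff` and `format_days_hms` and the example submission time are hypothetical, the year 2021 is taken from the title, and all times are assumed to be local data center time.

```python
from datetime import datetime, timedelta

# Cutoffs taken from the schedule above (local time is an assumption).
RACK_7C_CUTOFF = datetime(2021, 8, 9, 9, 0)     # Row 7C jobs terminated at 9 AM
POWER_DOWN_START = datetime(2021, 8, 9, 18, 0)  # general power-down begins at 6 PM


def time_until_cutoff(now: datetime, cutoff: datetime) -> timedelta:
    """Return the time remaining before the cutoff, or zero if it has passed."""
    return max(cutoff - now, timedelta(0))


def format_days_hms(delta: timedelta) -> str:
    """Format a timedelta as D-HH:MM:SS."""
    total = int(delta.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    submit = datetime(2021, 8, 7, 14, 30)  # hypothetical submission time
    print(format_days_hms(time_until_cutoff(submit, RACK_7C_CUTOFF)))
    print(format_days_hms(time_until_cutoff(submit, POWER_DOWN_START)))
```

The `D-HH:MM:SS` shape matches the days-hours:minutes:seconds form accepted by Slurm's `--time` option, so the result can serve as an upper bound when choosing a job's time limit. Leave a margin rather than using the exact value.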

While this outage impacts all services and resources in the MGHPCC/Holyoke data center, it can also have knock-on effects for some Boston services, as described below.

OFFICE HOURS
Office hours will not be held on Wednesday August 11th.

BOSTON DATA CENTER
Boston storage WILL be affected on August 10th.
Boston login and VDI will be affected for the duration of the downtime.
Any additional Boston outages will be noted here closer to the date.

 

TASKS

  • Row 7C work (starts August 9th at 9 AM): Jobs running on any node in the following racks will be terminated by 9 AM so that the racks can be shut down for hardware changes and cooling shutoff: holy7c16, holy7c18, holy7c20, holy7c22, holy7c24, holy7c26
    This will impact jobs in the following partitions: arguelles_delgado, davies, edwards, fasse, geophysics, giribet, huce_cascade, huce_cascade_priority, imasc, itc_cluster, kovac, ncf, ncf_interact, ncf_nrg, ortegahernandez, phelevan, shared (partial outage), test, unrestricted, xlin, zon
    New Ice Lake nodes, bigmem nodes, and A100 GPU-equipped nodes will be added in this row. Cooling to these racks must be shut off so that Lenovo can install the new hardware.
  • Login and compute OS upgrades
    CentOS 7.8.2003 to CentOS 7.9.2009
    Note: After the upgrade, SSH keys may change. See: https://docs.rc.fas.harvard.edu/kb/ssh-key-error/
  • InfiniBand upgrades
  • SLURM master replacement
  • Core and distribution equipment replacement
  • Isilon firmware upgrades
  • Network maintenance and upgrades: Major upgrades, replacing the 8-year-old distribution and core switches to support 2 x 100 Gbps connectivity to campus and the Internet
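
To check whether a set of nodes falls in the Row 7C racks listed in the first task, a simple prefix match may be enough. The following Python sketch is illustrative only. It assumes that node hostnames begin with their rack name, which this notice does not state, and the function name `affected_nodes` and the sample hostnames are hypothetical.

```python
# Rack names copied from the Row 7C task above.
AFFECTED_RACKS = (
    "holy7c16", "holy7c18", "holy7c20",
    "holy7c22", "holy7c24", "holy7c26",
)


def affected_nodes(node_names):
    """Return the node names that start with an affected rack name.

    ASSUMPTION: a node's hostname begins with the name of its rack
    (for example, a hypothetical 'holy7c16301' in rack holy7c16).
    Verify the local naming scheme before relying on this.
    """
    # str.startswith accepts a tuple and matches any of its prefixes.
    return [name for name in node_names if name.lower().startswith(AFFECTED_RACKS)]


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    my_nodes = ["holy7c16301", "holy7c24105", "holy2a01101"]
    print(affected_nodes(my_nodes))
```

Prefix matching was chosen because it needs no cluster access. If the naming assumption does not hold, the partition list in the same task is the more reliable guide.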
