
Winter maintenance downtime 12-17-18 – All Day Event

Monday December 17, 2018
COMPLETED
 
We will be having an all-day maintenance event on December 17th, 2018 to cover maintenance that cannot be performed during the annual MGHPCC shutdown. We expect this to be a yearly event which will allow us time to perform maintenance on systems which are powered on 24/7 and do not get the benefit of a forced downtime as MGHPCC systems do.
 
This downtime primarily targets our Boston data center, but it will affect most users and jobs in some way. Work will cover primary services housed in Boston, such as firewalls, home directory storage, licensing, VPN, various lab storage, and other services.

This downtime will interrupt many services including, but not limited to: storage, home directories, authentication, accounts and Portal, as well as licensing and virtual machines.

  • Any running jobs will be paused for most of the day (no action is required on your part)
  • New jobs cannot be scheduled until jobs are resumed
  • Any running GPU jobs will need to be terminated in order to perform CUDA/GPU updates

Status during the day will be updated at https://status.rc.fas.harvard.edu/

Planned tasks:

  • Login and NX nodes will be rebooted
  • RC firewall software upgrade
  • rclic1 license server maintenance *will affect license checkout for Geneious and COMSOL, as well as other licensed software*
  • Nvidia driver updates (All GPU queues - GPU jobs will be terminated)
  • Storage firmware updates: uchidafs1, lichtmanfs2, boslfs
  • NCF full cutover to CentOS 7 (CentOS 6 nodes will be decommissioned)
  • Webserver maintenance (see list at bottom) - hosted sites will be unresponsive at times
  • VPN upgrade (will disrupt new connections)
  • New Open OnDemand VDI solution deployment

NEW VIRTUAL DESKTOP SOLUTION
A new VDI (virtual desktop) solution based on Open OnDemand will be deployed during the Dec. 17th downtime. This will replace the outdated NX/NoMachine servers. The new service will provide easier setup and access not only to a remote desktop, but also to quick, direct startup of several popular VDI packages and workflows. It is completely browser-based and will not require downloading or running any client software. The NX servers will remain online through the holidays, but will be decommissioned early in the new year.
Links to the new VDI service documentation will be provided as soon as the documentation is complete.
 
A NOTE ABOUT REGAL SCRATCH
The Regal scratch filesystem has served us well since 2013, processing billions of jobs and numerous petabytes of data. But as recent issues involving object identifier errors and aging hardware have shown, it is time to retire it and implement a new fast scratch system. To that end, we are looking to roll out a Regal replacement in very early 2019. We thank you for your patience and want to assure you there is light at the end of the tunnel.
 
Storage affected:
uchidafs1
lichtmanfs2
boslfs
 
Websites Affected:
bicepkeck.org
decaps.rc.fas.harvard.edu
evans.rc.fas.harvard.edu
faun.rc.fas.harvard.edu
h3survey.rc.fas.harvard.edu
lichtman.rc.fas.harvard.edu
lipovsky.rc.fas.harvard.edu
lucasjanson.rc.fas.harvard.edu
meltonlab.rc.fas.harvard.edu
mist.rc.fas.harvard.edu
narayan.rc.fas.harvard.edu
provinggroundnetwork.org
schiertracks.rc.fas.harvard.edu
srivastavalab.rc.fas.harvard.edu
zke.fas.harvard.edu
