Monthly Maintenance, February 3rd, 2020 7am-11am

Monday February 3, 2020 7:00 AM to 11:00 AM

Monthly maintenance will occur on Monday, February 3rd from 7am to 11am.

  • Login and VDI nodes will be rebooted
  • New FASRC-built scratch system comes online (see description below)
  • User documentation will move to https://docs.rc.fas.harvard.edu
  • Status page will be upgraded - major change, history will be lost
  • Network DHCP and NPS updates - should be transparent to end users
  • NCF: ncf_holy queue decommissioned, please use ncf queue (on new hardware)

We recognize and understand the frustration that the scratch situation has caused for many users. We are giving this issue the highest priority. Our intention was to move to an appliance-based vendor solution which could be leased and replaced regularly. The original DDN appliance, scratchlfs, had severe stability issues which could not be resolved by their engineering team. The vendor then deployed an all-new pre-tested system, scratchlfs02, to try to remedy this, but this system still has not remained stable and performant enough to serve as fast scratch in our environment.

As such, we have decided to go back to the model of building our own scratch, with higher overall throughput and capacity. We are running more real-world 'chaos' testing now and feel confident this will alleviate the frequent outages we're currently dealing with. This system, name to be determined, should be online on or before February 3rd. The $SCRATCH environment variable will remain pointed to scratchlfs02 so as not to impact running jobs. It will be changed to point to the new FASRC-built system during the March 2nd maintenance.

Fast scratch for such a large and varied cluster is a challenge in general, and we appreciate your understanding and patience as we work through this issue.

  • The original appliance /n/scratchlfs (and /n/scratchssdlfs) will be shut down during this maintenance
  • /n/scratchlfs02 remains online, and the $SCRATCH environment variable remains pointed to it until March 2nd
  • The new FASRC-built scratch will be available for use on or before February 3rd
  • scratchlfs02 will be decommissioned March 2nd and $SCRATCH will be pointed to scratchlfs04 on that date
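
Because $SCRATCH will be repointed on March 2nd, scripts that build paths from the variable rather than from a hard-coded filesystem name should follow the change without edits. The following is only a minimal sketch of that idea in Python; the helper name scratch_path and the subdirectory names in the usage line are hypothetical, and the directory layout under scratch is an assumption, not something stated in this notice.

```python
import os
from pathlib import PurePosixPath


def scratch_path(*parts, env=None):
    """Build a path under whatever $SCRATCH currently points to.

    `env` defaults to the real process environment; passing a dict
    makes the helper easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    root = env.get("SCRATCH")
    if not root:
        # Fail loudly instead of silently falling back to a fixed path.
        raise RuntimeError("SCRATCH is not set in this environment")
    return PurePosixPath(root).joinpath(*parts)
```

For example, scratch_path("my_lab", "my_run") would resolve under /n/scratchlfs02 while $SCRATCH points there, and under the new system after the variable is changed (both subdirectory names are placeholders).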
