# Blog

## Cannon 2.0

Summary: FASRC is adding 216 Intel Sapphire Rapids nodes with 1TB of RAM each, 4 Intel Sapphire Rapids nodes with 2TB of RAM each, and 144 A100 80GB GPUs to the Cannon cluster. The Sapphire Rapids cores will be made available in the new ‘sapphire’ partition. The new A100 GPUs will be added to the ‘gpu’ partition. Partitions will be…
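For users planning jobs on the new hardware, a minimal Slurm batch script might look like the sketch below. The ‘sapphire’ and ‘gpu’ partition names come from this announcement; the resource counts, time limit, and executable name are placeholders to adapt to your own workload and to FASRC's current documentation.

```bash
#!/bin/bash
# Sketch: CPU job on the new Sapphire Rapids partition.
# Partition name is from the announcement; all other values are placeholders.
#SBATCH --partition=sapphire
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --mem=64G
#SBATCH --time=01:00:00

./my_cpu_program   # hypothetical executable

# For the new A100 80GB cards in the 'gpu' partition, a job would instead
# request something like:
#   #SBATCH --partition=gpu
#   #SBATCH --gres=gpu:1
```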

## 2023 MGHPCC downtime, and major OS and software changes

DOWNTIME COMPLETE The annual multi-day power downtime at MGHPCC (https://www.rc.fas.harvard.edu/blog/2023-downtime/) is complete (with exceptions noted below). Normal service resumes today (Friday June 9th) at 9am. As noted in previous communications, many changes have been made to the cluster and software. If you run jobs on the cluster and did not previously try out the test cluster, you will need to make…

## Cluster Fragmentation

by Paul Edmon, June 3, 2022. A common pitfall of High Performance Computing (HPC) scheduling is cluster fragmentation. This is not unique to HPC, mind you; any system where you have a limited amount of space that you try to fill with quasi-randomly sized blocks of stuff will end up fragmented at some level (see your last game of…
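A quick way to see fragmentation on a Slurm cluster is to list how many CPUs are allocated versus idle on each node; nodes that are partly full but cannot fit another whole job are the fragments. A minimal sketch, assuming a partition named 'shared' (the partition name and the example numbers are illustrative, not from the post):

```bash
# List each node with its CPUs as allocated/idle/other/total.
# %n is the node hostname and %C the CPU counts by state.
sinfo -N -p shared -o "%n %C"

# A node reporting 46/2/0/48 has only 2 idle cores, so a 4-core job cannot
# start there even though the cluster as a whole still has free cores.
```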

## 2021 Cluster Upgrades – Sept. 30th 2021

We have some exciting news about new resources that we want to share with you. In the summer of 2019, we invested in direct water cooling and were able to bring you the Cannon cluster with 32,000 of the fastest Intel Cascade Lake cores. This 8-rack block of compute has formed the core of our computational capacity for the past…

## Security advisory regarding Python/Conda/pip/PyPI

AUDIENCE: All Python/Conda users IMPACT: Potential malicious packages installed or malware downloaded. Numerous packages containing malware or malicious links have been uploaded to the PyPI (Python Package Index) repository. Many of these have names that are slight misspellings of the names of other packages. The intention is to cause installation of one of these packages if the package name is mistyped…
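One common mitigation for this kind of typosquatting is to install only from a pinned, hash-checked requirements file, so a mistyped or substituted package fails verification instead of being installed. A minimal sketch (the use of pip-tools and the file names are assumptions; --generate-hashes and --require-hashes are standard pip-compile/pip options):

```bash
# Pin exact versions and record their hashes from a hand-checked requirements.in.
pip install pip-tools
pip-compile --generate-hashes -o requirements.txt requirements.in

# Install with hash checking: anything that does not match the pinned
# hashes is rejected rather than silently installed.
pip install --require-hashes -r requirements.txt
```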

## 2020 Year-End Recap and a Look Back at FASRC

2020 - A year full of changes. 2020 began somewhat normally, except that the world was coming to grips with a global pandemic. By February, plans were being formed for a probable change to 100% remote work for FASRC staff. This became a reality in March, and we bid our offices in '38' farewell for some indeterminate time. This…

## A Retrospective Odyssey

by Paul Edmon, December 2, 2020. Back in 2008, the Odyssey supercomputer was installed on the seventh floor of the Markley Boston co-location data center at 1 Summer Street, Boston, Massachusetts. Until this point, Harvard had not really been a player in the realm of supercomputing, unlike other major universities, which had been doing so for decades. However, with Odyssey, Harvard…

## Congratulations to Madelyn Cain for JobID 1!

by Paul Edmon, September 1, 2020. One of the features of our scheduling software, Slurm, is that it has a maximum JobID of 67,043,328 (for the fascinating reason why, see the Slurm docs). This means that when we hit that limit, the JobID rolls over to 1 and starts counting again from there. We thought this would be a fun…
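For the curious, 67,043,328 is 0x03FF0000 in hexadecimal, which matches Slurm's default MaxJobId; the post points readers to the Slurm docs for the full reason, which describe the IDs above that value as reserved for internal use (e.g., job arrays and federated jobs). A small sketch for checking the arithmetic and the configured limits (assumes scontrol access on the cluster):

```bash
# 0x03FF0000 = 1023 * 65536 = 67,043,328
printf '%d\n' 0x03FF0000

# FirstJobId and MaxJobId are slurm.conf parameters; this shows the values
# the cluster is actually running with.
scontrol show config | grep -Ei 'FirstJobId|MaxJobId'
```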

## Summer 2020 datacenter consolidation

Ongoing work, July - ?, in our Boston data center. Due to scheduling of resources around pandemic guidelines and rules, as well as scheduling with the data center and vendors, an end date is not yet known. As part of cost-saving plans, FASRC and Harvard Medical School will be combining space in our Boston data center. This allows FASRC, HMS,…