Summary
FASRC is adding 216 Intel Sapphire Rapids nodes with 1TB of RAM each, 4 Intel Sapphire Rapids nodes with 2TB of RAM each, and 144 Nvidia A100 80GB GPUs to the Cannon cluster. The Sapphire Rapids cores will be made available in the new ‘sapphire’ partition, and the new A100 GPUs will be added to the ‘gpu’ partition. Partitions will be reorganized to account for the larger memory of the new nodes. FASSE will gain additional ‘fasse_bigmem’ and ‘fasse_gpu’ capacity. The base gratis fairshare on the Cannon cluster will be increased to 200.
These updates will go live on January 22nd, 2024.
Overview
Cannon 2.0 is the expansion of the liquid-cooled Cannon cluster, giving the Harvard research community access to Intel's latest Sapphire Rapids processors and Nvidia's A100 GPUs. This update aims to reduce wait times and provide additional resources to the community. On the Cannon cluster, we observed that the limited memory per node was contributing to extended wait times for memory-intensive workflows; the additional memory in the new compute nodes will help bridge this gap.
Cannon 2.0 consists of:
- CPUs: 216 Lenovo SD650 V3 direct-water-cooled servers, each with 112 cores and 1TB of RAM, for a total of 24,192 cores of Intel 8480+ “Sapphire Rapids” processors. The interconnect is NDR 400 Gbps InfiniBand (IB) connected to a 400 Gbps IB core.
- CPUs: 4 Lenovo SD650 V3 direct-water-cooled servers, each with 112 cores and 2TB of RAM, for a total of 448 cores of Intel 8480+ “Sapphire Rapids” processors. The interconnect is NDR 400 Gbps InfiniBand (IB) connected to a 400 Gbps IB core.
- GPUs: 36 Lenovo SR670 V2 direct-water-cooled servers, each with four Nvidia A100 80GB GPUs, for a total of 144 new GPUs. Each GPU node has 64 CPU cores and 1TB of RAM.
As part of our standard process for new installations, we follow a phased (formerly called tiered) testing plan:
- Phase 1: Lenovo burn in, HPL benchmarking, Top/Green 500 Runs (Done)
- Phase 2: Internal Testing (Done)
- Phase 3: Harvard Community Grand Challenge Runs (in progress)
- Production: January 22nd, 2024
Partitions
With the advent of Cannon 2.0, we have reconsidered how the partitions are organized. All the Sapphire Rapids nodes have 1TB of memory, which exceeds the capacity of the nodes in our current ‘bigmem’ partition, making that partition's name somewhat misleading. Additionally, the new A100s are the 80GB variety on the faster NDR fabric, so we cannot simply merge them into the existing GPU partition. Finally, we need to consider the future needs of FASSE.
Thus the updated partitions are as follows:
| Partition | Nodes | Cores per Node | CPU / GPU Types | Total # of Cores / GPUs | Usable Mem per Node (GB) | Time Limit |
|---|---|---|---|---|---|---|
| sapphire | 196 | 112 | Intel Sapphire Rapids | 21,952 | 990 | 3 days |
| shared | 277 | 48 | Intel Cascade Lake | 13,296 | 184 | 3 days |
| hsph | 36 | 112 | Intel Sapphire Rapids | 4,032 | 990 | 3 days |
| test | 12 | 112 | Intel Sapphire Rapids | 1,344 | 990 | 12 hours |
| intermediate | 12 | 112 | Intel Sapphire Rapids | 1,344 | 990 | 14 days |
| bigmem | 4 | 112 | Intel Sapphire Rapids | 448 | 1988 | 3 days |
| bigmem_intermediate | 3 | 64 | Intel Ice Lake | 192 | 2000 | 14 days |
| gpu | 36 | 64 | Intel Ice Lake, A100 80GB | 144 GPUs | 990 | 3 days |
| gpu_test | 14 | 64 | Intel Ice Lake, A100 40GB, MIG 3g.20gb | 112 MIG GPUs | 448 | 12 hours |
| remoteviz | 1 | 32 | Intel Cascade Lake | 32 | 380 | 3 days |
| unrestricted | 8 | 48 | Intel Cascade Lake | 384 | 184 | none |
| fasse | 42 | 48 | Intel Cascade Lake | 2,016 | 184 | 7 days |
| fasse_bigmem | 16 | 64 | Intel Ice Lake | 1,024 | 500 | 7 days |
| fasse_gpu | 4 | 64 | Intel Ice Lake, A100 40GB | 16 GPUs | 488 | 7 days |
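You can verify any partition's limits from a login node with standard Slurm commands. The sketch below is illustrative; the format string simply selects the partition name, node count, cores per node, memory, and time limit.

```bash
# Node count, cores per node, memory (MB), and time limit for a partition
sinfo -p sapphire -o "%P %D %c %m %l"

# Full partition definition, including any per-job defaults and limits
scontrol show partition sapphire
```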
The reshuffle of the Cannon 2.0 system aims to better serve the community by absorbing existing ‘bigmem’ jobs into the new, higher-capacity ‘sapphire’ partition and ‘bigmem_intermediate’ jobs into the ‘intermediate’ partition. Running ‘gpu_test’ in MIG mode doubles the effective number of GPUs, allowing for more simultaneous users.
All the Cannon changes will go live on January 22nd, 2024. None of these changes require cluster downtime or interruption of user workflows or jobs; existing jobs will finish on the older nodes. FASSE changes will occur throughout the week as nodes are moved over to the secure environment.
To take advantage of the new partitions, you will need to update your job scripts and adjust job parameters according to your needs. See the FAQ below for additional recommendations. Please feel free to join our office hours https://www.rc.fas.harvard.edu/training/office-hours/ or contact us https://www.rc.fas.harvard.edu/about/contact/ if you have any questions. To cite use of this resource, please see: https://www.rc.fas.harvard.edu/cluster/publications/
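As an example, a minimal batch script targeting the new ‘sapphire’ partition might look like the sketch below. The job name, module, and script are placeholders; size the cores, memory, and time to your own workload.

```bash
#!/bin/bash
#SBATCH -J my_analysis          # placeholder job name
#SBATCH -p sapphire             # new Sapphire Rapids partition
#SBATCH -c 16                   # cores per task (up to 112 per node)
#SBATCH --mem=64G               # memory request (up to ~990 GB usable per node)
#SBATCH -t 1-00:00              # walltime D-HH:MM (3-day partition limit)
#SBATCH -o my_analysis_%j.out   # stdout file, %j expands to the job ID

# Load your own modules and run your application (placeholders below)
module load python
srun python my_script.py
```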
Fairshare
This new hardware will increase the computational power of our public Cannon partitions by roughly 70% over Cannon 1.0. As a result, we have recalculated the base gratis fairshare for all groups: on Cannon it will change from 120 to 200, and the new score will be applied to all groups when the new hardware goes live.
For FASSE, the new hardware does not significantly change that cluster's computational power, so the base gratis fairshare for FASSE will remain at 100.
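If you would like to check your group's fairshare score once the change is applied, the standard Slurm sshare command reports it; replace the example lab account below with your own.

```bash
# Fairshare for a lab account and all of its users (example account name)
sshare --accounts=jharvard_lab --all
```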
Cannon FAQ
Q. I use bigmem/bigmem_intermediate, do I need to update my scripts?
Yes. Jobs that used ‘bigmem’ should move to the ‘sapphire’ partition, and jobs that used ‘bigmem_intermediate’ should move to ‘intermediate’; both partitions now run on nodes with 1TB of RAM.
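As a sketch, the change is usually just the partition directive (the same pattern applies when moving ‘bigmem_intermediate’ jobs to ‘intermediate’); the memory request shown is only a placeholder and can stay the same or grow now that more memory is available per node:

```bash
# Before (Cannon 1.0)
#SBATCH -p bigmem
#SBATCH --mem=400G              # example memory request

# After (Cannon 2.0): sapphire nodes have ~990 GB of usable memory
#SBATCH -p sapphire
#SBATCH --mem=400G
```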
Q. I use ultramem, do I need to update my scripts?
Yes, you should move to ‘bigmem’ or ‘bigmem_intermediate’. The updated partitions now have 2TB of memory, the same as ‘ultramem’ offered.
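A sketch of the same kind of change; the memory request shown is only a placeholder:

```bash
# Before
#SBATCH -p ultramem
#SBATCH --mem=1500G             # example memory request

# After: the updated bigmem nodes have ~1988 GB of usable memory
#SBATCH -p bigmem
#SBATCH --mem=1500G
```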
Q. I use gpu_mig, do I need to update my scripts?
Yes, you should move to ‘gpu_test’, which is set up in MIG mode and allows for experimentation with that feature.
Q. I use gpu_test, do I need to update my scripts?
Maybe. ‘gpu_test’ is moving from V100s to A100s in MIG mode. Depending on your script, you may need to adjust for the new GPU type.
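A sketch of a ‘gpu_test’ request under the new setup; the exact GRES name for a MIG slice can depend on how the nodes are configured, so it is worth confirming what your job actually receives:

```bash
#SBATCH -p gpu_test
#SBATCH --gres=gpu:1            # one GPU resource (a MIG 3g.20gb slice under the new setup)
#SBATCH -t 0-01:00

# Print the device(s) visible to the job, including any MIG profile
nvidia-smi -L
```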
Q. I use test/intermediate/gpu, do I need to update my scripts?
No. Changes to these partitions do not require updating your scripts.
Q. I use shared, do I need to update my scripts?
Maybe. ‘shared’ will remain as is, but if you want to leverage the new Sapphire Rapids nodes you should consider updating your script to point to the ‘sapphire’ partition, or adding ‘sapphire’ as an additional partition (i.e. -p shared,sapphire). It is worth testing to see which partition will give you better performance.
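A sketch of listing both partitions and letting Slurm start the job in whichever becomes available first; the core and memory requests are placeholders sized to fit either partition:

```bash
#SBATCH -p shared,sapphire      # Slurm runs the job in the partition that can start it earliest
#SBATCH -c 16
#SBATCH --mem=32G
```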
Q. I am part of Kempner, does the gratis base fairshare impact me?
No. The base gratis fairshare only impacts your Cannon Slurm account, not the Kempner Slurm accounts.
FASSE FAQ
Q. I use fasse_bigmem, do I need to update my scripts?
No. Changes to this partition do not require updating your scripts.
Q. I use fasse_gpu, do I need to update my scripts?
Maybe. ‘fasse_gpu’ is moving from V100s to A100s. Depending on your script, you may need to adjust for the new GPU type.
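If your code makes assumptions about the GPU model (for example, memory size or compute capability), a quick check inside the job can help; the nvidia-smi query below is illustrative:

```bash
# Report the GPU name, total memory, and driver version seen by the job
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```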
Resources
- FASRC self-help documentation and instructions: https://docs.rc.fas.harvard.edu/
- Office hours: https://www.rc.fas.harvard.edu/training/office-hours/
- Training: https://www.rc.fas.harvard.edu/upcoming-training/
- Support: https://www.rc.fas.harvard.edu/about/contact/
- Cannon cluster dashboard: https://dash.rc.fas.harvard.edu/d/000000017/cannon-cluster-node-usage?orgId=1&refresh=5m
- FASSE cluster dashboard: https://dash.rc.fas.harvard.edu/d/tUVpWaHMk/fasse-cluster-node-usage?orgId=1&refresh=5m
- Other dashboards: https://xdmod4.rc.fas.harvard.edu/