# FASRC Cluster Refresh 2019

Annie Jump Cannon silhouette and the word CANNON on a field of stars

The lease on the Odyssey compute cluster comes to an end this year, and a refresh of the cluster will occur in September 2019.

Our new cluster will be provided by Lenovo and utilize their SD650 NeXtScale servers with direct-to-node water-cooling for increased performance, density, ease of expansion, and controlled cooling.

These units contain two nodes per 1U rack space, so this will change the purchasing model for those who wish to have dedicated hardware. Because of the new cooling system, new nodes will only be added at specific times of the year. We will coordinate this with PIs during the purchasing phase for any new hardware.

Lenovo SD650 top view showing the disconnect inlet and outlets at rear and the basic interior component layout.

Lenovo SD650 rear showing quick-connect water-cooling and routing of water block

CURRENT STATUS: Tier2 Testing

The refreshed cluster, named Cannon in honor of Annie Jump Cannon, comprises 670 compute nodes plus 16 new GPU nodes. The new cluster will have 30,000 cores of Intel 8268 "Cascade Lake" processors. Each node will have 48 cores and 192 GB of RAM. The interconnect is HDR 100 Gbps InfiniBand (IB), connected in a single fat tree with a 200 Gbps IB core. The entire system is water cooled, which will allow us to run these processors at a much higher clock rate of ~3.4 GHz. In addition to the general-purpose compute resources, we are also installing 16 SR670 servers, each with four Nvidia V100 GPUs and 384 GB of RAM, all connected by HDR IB.
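For a rough sense of scale, the per-node figures above can be multiplied out. The short Python sketch below does only that arithmetic. It assumes the 670 figure refers to general-purpose compute nodes, each with 48 cores and 192 GB of RAM, and that the 16 GPU nodes are counted separately. Under that reading the core count comes to slightly more than the 30,000 quoted above, which is presumably a rounded figure. The constant and function names are illustrative only.

```python
# Figures copied from the announcement. Treating 670 as the number of
# general-purpose compute nodes is an assumption about how to read it.
CPU_NODES = 670
CORES_PER_NODE = 48
RAM_PER_NODE_GB = 192

GPU_NODES = 16
GPUS_PER_GPU_NODE = 4
RAM_PER_GPU_NODE_GB = 384


def cluster_totals():
    """Return aggregate counts derived from the per-node figures."""
    return {
        "cpu_cores": CPU_NODES * CORES_PER_NODE,
        "cpu_node_ram_gb": CPU_NODES * RAM_PER_NODE_GB,
        "gpus": GPU_NODES * GPUS_PER_GPU_NODE,
        "gpu_node_ram_gb": GPU_NODES * RAM_PER_GPU_NODE_GB,
    }


if __name__ == "__main__":
    # Print each total with thousands separators.
    for name, value in cluster_totals().items():
        print(f"{name}: {value:,}")
```

These totals are estimates derived from the announcement, not official specifications; the final configuration may differ.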

Lenovo SD650 top view showing the water cooling tubes and blocks inside which are served by a disconnect inlet and outlet at rear.

Top view showing water-cooling paths.

Once roll-out begins, the Odyssey cluster will be drained of jobs and decommissioned. We will provide more communication on that process closer to roll-out.