# New compute and partition reconfiguration

Fall greetings from RC!

We are very pleased to unveil the new compute purchased by FAS as part of the Odyssey 3 project (https://www.rc.fas.harvard.edu/odyssey-3-the-next-generation/). This new hardware has been benchmarked at 0.5 petaflops, making it the fastest block of compute at Harvard to date. It is now available on the cluster in the following new partitions:

* "shared" - This is the majority of the new compute. Comprised of 14,592 Intel Broadwell cores, each node has 128 GB of RAM and all are connected via Mellanox FDR Infiniband. By our own internal benchmarking this compute is four times faster than the older AMD chips that make up the general partition. This partition has a 7 day run time limit, similar to general. The general partition is still available for jobs requiring larger amounts of contiguous memory.

* "test" - IMPORTANT - we have retired the interact partition for reasons of resource efficiency. You will need to adjust any scripts which reference 'interact' and replace that with 'test'. This partition is primarily for code testing ahead of large scale runs on the shared partition and for interactive jobs. To enable faster access and to ensure that everyone can find space here for interactive and testing purposes, we have set the time limit for this partition to 8 hours. This will also cut down on abuse/misuse of this partition. The test partition is made up of 8 nodes of the same Intel compute as shared. We have set a limit of a maximum of 5 jobs per user and 64 cores across all jobs for a user on this partition.

* "gpu_requeue" - This partition encompasses all the GPU hardware that RC runs. While serial_requeue will still contain all the same hardware, gpu_requeue is dedicated to those runs that actually use GPUs and not solely the Intel processors on those nodes. If you have been using serial_requeue to do GPU work, we recommend you now use this partition. Jobs submitted to this partition that do not use GPUs will be denied permission to run.This partition is still subject to the same requeue rules as serial_requeue, so please be mindful of that. The gpu partition is still available, but is a single node and more difficult to schedule on.

We hope that these new partitions will be beneficial to your ongoing research. For more details about these and all other partitions see:

https://www.rc.fas.harvard.edu/resources/running-jobs/#SLURM_partitions
You can also find out which partitions you have access to by using the sinfo command
(or see: https://www.rc.fas.harvard.edu/slurm-commands ).
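
For instance, the following sinfo invocations list the partitions visible to you along with their time limits and node counts; the format fields used are standard sinfo options:

```bash
# Show just the new partitions
sinfo -p shared,test,gpu_requeue

# List all partitions you can see, with availability, time limit, and node count
sinfo -o "%P %a %l %D"
```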

Thanks for your continued support of Odyssey. We look forward to sharing our future work on the Odyssey 3 project with you. As always, please contact us with any questions or comments at
https://portal.rc.fas.harvard.edu/rcrt/submit_ticket or via email at rchelp@rc.fas.harvard.edu

FAS Research Computing
https://www.rc.fas.harvard.edu