Common Cluster Pitfalls

These are some of the common problems people run into when using the cluster. We hope they will not be a problem for you as well.

Asking for multiple cores but forgetting to specify one node: -n 4 -N 1 is very different from -n 4. Without -N 1, SLURM is free to place your 4 tasks on up to 4 different nodes.
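A minimal sbatch sketch of the difference (the program name and time limit are placeholders):

```shell
#!/bin/bash
# Request 4 tasks and force them onto a single node.
#SBATCH -n 4        # 4 tasks
#SBATCH -N 1        # ...all on one node
#SBATCH -t 0-00:30  # placeholder time limit

# With -n 4 alone, SLURM may spread the 4 tasks across up to
# 4 different nodes, which breaks shared-memory programs.
srun ./my_program   # placeholder program
```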

Problem: Throwing multiple cores at Python or R code.
Symptom/Reason: Without special programming, Python and R code is single-threaded, so requesting more cores from SLURM does nothing except waste resources. If you wish to use multiple cores, you must explicitly write your code to use them, with modules such as Python's 'multiprocessing' or R packages such as 'parallel' or 'Rmpi'.

Problem: Jobs PENDing for more than 48 hours.
Symptom/Reason: Either you are making a very large resource request (cores and memory): adjust it lower and try again. Or you have a very low Fairshare score: contact us.

Problem: Job runs briefly and then FAILs.
Symptom/Reason: The -t parameter was not included. With no -t, you get the shortest time limit possible across all partitions, which is 10 minutes.

Problem: Not requesting enough cores.
Symptom/Reason: Each stage of a shell pipeline is its own process, so prog1 | prog2 | prog3 > outfile should run with 3 cores.

Problem: Causing massive disk I/O on home folders or lab disk shares.
Symptom/Reason: Your work, and that of everyone else on the same filesystem, slows to a crawl; simple commands like ls take forever.

Problem: Hundreds or thousands of jobs all accessing one common file.
Symptom/Reason: Again, everything on that filesystem slows to a crawl. Make several copies of the file and have each group of jobs access its own copy.

Problem: Packing more than about 5,000 files into one directory.
Symptom/Reason: I/O for your jobs slows to a crawl.

Problem: Not bundling your work into roughly 10-minute jobs.
Symptom/Reason: Bundled jobs are kinder to us, kinder to you, and kinder to the cluster.

Problem: Not understanding your software.
Symptom/Reason: Read through its options; who knows what could happen otherwise? You wouldn't use a lab instrument without reading the instructions, would you?

Problem: Trying to sudo when installing software.
Symptom/Reason: Please don't; we administer the machines for you.
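To make the pipeline point concrete: each command in a shell pipeline runs as its own process, concurrently, so a three-stage pipeline can keep three cores busy at once. The commands below are just a runnable stand-in for prog1 | prog2 | prog3:

```shell
# Three processes run at once here, so in a batch script you would
# request three cores on one node, e.g. #SBATCH -n 3 -N 1.
seq 1 100 | grep '7' | wc -l   # counts numbers 1-100 containing a 7; prints 19
```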

Last updated: September 6, 2019 at 14:32

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Permissions beyond the scope of this license may be available.