My job is PENDING. How can I fix this?

How soon a job is scheduled depends on a combination of factors: the time requested, the resources requested (e.g. RAM, number of cores), the partition, and your FairShare score.

Quick solution? The Reason column in the squeue output can give you a clue:

  • If there is no reason, the scheduler hasn't attended to your submission yet.
  • Resources means your job is waiting for an appropriate compute node to open.
  • Priority means your job has a lower priority than other jobs waiting to be scheduled.

There are other Reason codes; see the SLURM squeue documentation for full details.
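As a rough sketch, the check described above can be scripted. The helper name explain_reason is hypothetical, and the explanations simply restate the list above. The sketch assumes that squeue prints None when a job has no reason, and it only queries the scheduler when squeue is available on the machine.

```shell
#!/usr/bin/env bash
# Map a squeue Reason code to the explanation given in the list above.
# explain_reason is a hypothetical helper, not part of SLURM.
explain_reason() {
  case "$1" in
    ""|None)   echo "not yet considered by the scheduler" ;;
    Resources) echo "waiting for a suitable compute node to open" ;;
    Priority)  echo "lower priority than other jobs being scheduled" ;;
    *)         echo "see the squeue documentation" ;;
  esac
}

# Only query the scheduler when squeue is available (i.e. on the cluster).
if command -v squeue >/dev/null 2>&1; then
  # -h: no header; -t PENDING: pending jobs only; %i = job ID, %r = reason.
  squeue -u "$USER" -h -t PENDING -o "%i %r" | while read -r jobid reason; do
    echo "$jobid: $reason ($(explain_reason "$reason"))"
  done
fi
```

Run on a login node, this prints one line per pending job with its Reason code and a short explanation.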

Your priority is partially based on your FairShare score and determines how quickly your job is scheduled relative to others on the cluster. To see your FairShare score, enter the command sshare -u RCUSERNAME. Your effective score is the value in the last column. As a rule of thumb, a score below 0.5 indicates lower priority and a score above 0.5 indicates higher priority.
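The rule of thumb can be sketched as a small script. The helper name classify_fairshare is hypothetical. Treating a score of exactly 0.5 as higher priority is an assumption, as is taking the last line of the sshare output as your own entry. If you belong to several accounts, check the sshare output by hand instead.

```shell
#!/usr/bin/env bash
# Classify a FairShare score using the 0.5 rule of thumb.
# classify_fairshare is a hypothetical helper; a score of exactly 0.5
# is treated as higher priority here (an assumption).
classify_fairshare() {
  awk -v s="$1" 'BEGIN { if (s + 0 < 0.5) print "lower priority"; else print "higher priority" }'
}

# Only query the scheduler when sshare is available (i.e. on the cluster).
if command -v sshare >/dev/null 2>&1; then
  # Take the last column of the last non-empty line of the output,
  # assumed here to be your own user entry.
  score=$(sshare -u "$USER" | awk 'NF { last = $NF } END { print last }')
  echo "FairShare $score: $(classify_fairshare "$score")"
fi
```

For example, classify_fairshare 0.25 reports lower priority.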

In addition, you can see the status of a given partition and your position relative to other pending jobs in it by entering the command showq-slurm -p PARTITION -o. This will order the pending queue by priority, where jobs listed at the top are next to be scheduled.

For both Resources and Priority squeue Reason output codes, consider shortening the runtime or reducing the requested resources to increase the likelihood that your job will start sooner.
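One way to shorten the runtime of a job that is already pending, without resubmitting it, is scontrol update. The sketch below is a minimal wrapper under stated assumptions: the name shorten_job is hypothetical, the job ID and time limit are placeholders you supply, and the wrapper accepts only plain numeric job IDs. SLURM generally lets you lower, but not raise, the time limit of your own job; your site's policy may differ.

```shell
#!/usr/bin/env bash
# Lower the time limit of one of your own pending jobs.
# shorten_job is a hypothetical wrapper around scontrol update.
# Usage: shorten_job JOBID TIMELIMIT   (e.g. shorten_job 12345 02:00:00)
shorten_job() {
  local jobid="$1" limit="$2"
  # Require a plain numeric job ID so a typo is not passed to scontrol.
  case "$jobid" in
    ''|*[!0-9]*) echo "usage: shorten_job JOBID TIMELIMIT" >&2; return 2 ;;
  esac
  # Require a non-empty time limit.
  if [ -z "$limit" ]; then
    echo "usage: shorten_job JOBID TIMELIMIT" >&2
    return 2
  fi
  scontrol update JobId="$jobid" TimeLimit="$limit"
}
```

After the change, rerun squeue to see whether the Reason or the job's position in the queue has changed.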

Please see this document for more information and this presentation for a number of troubleshooting steps.

Last updated: March 27, 2018 at 11:56 am


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Permissions beyond the scope of this license may be available at Attribution.