What's up? My SLURM output file terminates early with the following error:
"slurmstepd: error: _slurm_cgroup_destroy: problem deleting step cgroup
path /cgroup/freezer/slurm/uid_57915/job_25009017/step_batch: Device or
Well, usually this is a problem in which your job is trying to write to a network storage device that is busy -- probably overloaded by someone doing high amounts of I/O (input/output) where they shouldn't, usually on low throughput storage like home directories or lab disk shares.
Please contact RCHelp about this problem, giving us the jobID, the filesystem you are working on, and additional details that may be relevant. We'll use this info to track down the problem (and, perhaps, the problem user(s)).
(If you know who it is, tap them on the shoulder and show them our Odyssey Storage page.)
Last updated: November 19, 2014 at 17:44 pm
Posted in: c. Jobs and SLURM
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.