The Odyssey cluster uses SLURM to manage jobs
SLURM is a queue management system and stands for Simple Linux Utility for Resource Management. SLURM was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world. SLURM replaces the commercial LSF as the primary job manager on Odyssey.
SLURM is similar in many ways to LSF or most other queue systems. You write a batch script then submit it to the queue manager. The queue manager then schedules your job to run on the queue (or partition in SLURM parlance) that you designate. Below we will provide an outline of how to submit jobs to SLURM, how SLURM decides when to schedule your job and how to monitor progress.
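As a rough illustration of the write-then-submit workflow described above, the sketch below writes a small batch script to a file and submits it with sbatch. The partition name "general", the resource values and the program name ./my_program are placeholders chosen for this example, not values taken from the Odyssey configuration. Submission is skipped on machines where SLURM is not installed.

```shell
#!/bin/bash
# Write a minimal batch script. The partition name "general" and the program
# ./my_program are placeholders (assumptions), not Odyssey-specific values.
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH -n 1                 # number of cores
#SBATCH -t 0-00:10           # run time limit (D-HH:MM)
#SBATCH -p general           # partition (queue) to submit to; placeholder name
#SBATCH --mem=100            # memory request in MB
#SBATCH -o example_%j.out    # standard output file; %j expands to the job ID
#SBATCH -e example_%j.err    # standard error file

./my_program
EOF

# Submit and check the queue only where SLURM is actually installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
    squeue -u "$USER"        # list your pending and running jobs
else
    echo "sbatch not found; script written to example_job.sh"
fi
```

The #SBATCH lines are read by sbatch as if they were command line options, so the same requests could also be passed directly on the sbatch command line.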
SLURM has a number of features that make it more suited to our environment than LSF:
- Kill and Requeue: SLURM’s ability to kill and requeue is superior to that of LSF. It waits for the preempted jobs to be cleared before scheduling the high-priority job. It also kills and requeues based on memory rather than just on core count.
- Memory: Memory requests are sacrosanct in SLURM. The amount of memory you request at run time is guaranteed to be there. No one can infringe on that memory space, and you cannot exceed the amount of memory that you request.
- Accounting Tools: SLURM has a back-end database that stores historical information about the cluster. Users who are curious about how many resources they have used can query this information.
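The accounting database mentioned above is typically queried with the sacct command. The following is a minimal sketch, assuming sacct is on your PATH and that you want your own jobs since a given date. The start date is a placeholder. Comparing the ReqMem and MaxRSS columns gives a rough idea of how much of the requested memory a job actually used.

```shell
#!/bin/bash
# Columns to report; all are standard sacct format fields.
fields="JobID,JobName,Partition,ReqMem,MaxRSS,Elapsed,State"
# Start of the reporting window (placeholder date, an assumption).
start="2014-01-01"

if command -v sacct >/dev/null 2>&1; then
    # -u selects the user, -S sets the earliest start time to report.
    sacct -u "$USER" -S "$start" --format="$fields"
else
    echo "sacct not found; would run: sacct -u \$USER -S $start --format=$fields"
fi
```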
General SLURM documentation is widely available.
The primary documentation on SLURM usage and commands is on the SLURM site. If you Google for SLURM questions, you'll often see the Lawrence Livermore pages as the top hits, but these tend to be outdated.
A great way to get details on the SLURM commands is the man pages available from the Odyssey cluster. For example, if you type the following command:
man sbatch
you'll get the manual page for the sbatch command.
Odyssey jobs are generally run from the command line
Once you've gone through the account setup procedure and obtained a suitable terminal application, you can log in to the Odyssey system via ssh with your RC username.
This <USERNAME> is the RC login you received from the account request tool. It is generally not the same as your HUIT machine login and is not your Harvard ID.
Odyssey computers run the CentOS 6 version of the Linux operating system and commands are run under the "bash" shell. There are a number of Linux and bash references, cheat sheets and tutorials available on the web.
RC's own training slides are also available.
The module system is used for enabling applications
Because of the diversity of investigations currently supported by FAS, thousands of applications and libraries are supported on the Odyssey cluster. Technically, it is impossible to include all of these tools in every user's environment. The Linux module system is used to enable subsets of these tools for a particular user's computational needs.
Please note that we are switching to the new lmod module system. Please see the most current information on our Software on Odyssey pages.
The module load command enables a particular application in the environment, mainly by adding the application to your PATH variable. For example, to enable the currently supported R package:
module load R/3.0.2-fasrc01
Loading more complex modules can affect a number of environment variables including PYTHONPATH, LD_LIBRARY_PATH, PERL5LIB, etc. Modules may also load dependencies. An application that uses Java may load the module for the appropriate Java interpreter.
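One way to see what a module load actually changed is to compare an environment variable before and after the load. The sketch below does this for PATH using a small helper function, new_entries, which is a hypothetical name introduced for this example. The same comparison could be applied to LD_LIBRARY_PATH or PYTHONPATH. The load step is skipped in shells where the module command is not defined.

```shell
#!/bin/bash
# Print the entries present in colon-separated list $2 but not in list $1.
# (new_entries is a helper written for this example, not a cluster tool.)
new_entries() {
    local before=":$1:" entry
    local IFS=':'
    for entry in $2; do
        case "$before" in
            *":$entry:"*) ;;                 # already present before the load
            *) printf '%s\n' "$entry" ;;     # added by the load
        esac
    done
}

path_before="$PATH"
# "module" is a shell function provided by the cluster environment; skip the
# load where it is not defined.
if type module >/dev/null 2>&1; then
    module load R/3.0.2-fasrc01
fi
# Show the directories that the load added to PATH (nothing if none were added).
new_entries "$path_before" "$PATH"
```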
To see what has been loaded in your environment, use the module list command, which prints all loaded modules.
The module purge command removes all currently loaded modules. This is particularly useful if you have to run incompatible software (e.g. python 2.x and python 3.x). The module unload command removes a specific module and any dependencies that were loaded with the module.
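A short session using these commands together might look like the sketch below. It reuses the R module named earlier, wraps the steps in a function called module_demo (a name made up for this example), and does nothing beyond printing a message in shells where the module command is not defined.

```shell
#!/bin/bash
module_demo() {
    # "module" is a shell function set up by the cluster's login environment.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell"
        return 0
    fi
    module load R/3.0.2-fasrc01     # enable R
    module list                     # R and any dependencies should appear
    module unload R/3.0.2-fasrc01   # remove just this module
    module purge                    # or clear everything at once
    module list                     # typically reports no loaded modules
}

module_demo
```

Running module purge at the top of a batch script is one way to start from a known environment before loading only the modules a job needs.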
Finding the modules that are appropriate for your needs can be done in a couple of different ways. First, there is a page on this site that will allow you to browse and search (via your browser's page search functionality) the list of modules that have been deployed to Odyssey. Second, the module avail command can be used along with a pattern match of the output.
By itself, module avail will list every module deployed on Odyssey, so a tool like grep can be used to narrow it down.
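A sketch of such a search is shown here, using "python" as an arbitrary search term. Many module implementations print the avail listing to standard error rather than standard output, so the example redirects it with 2>&1 before piping to grep. Whether this is needed on a given system is an assumption worth checking. The function names filter_modules and find_modules are made up for this example.

```shell
#!/bin/bash
# Keep only the lines that match the pattern, ignoring case.
filter_modules() {
    grep -i -- "$1"
}

# Search the list of available modules for a pattern.
find_modules() {
    if type module >/dev/null 2>&1; then
        # Merge stderr into stdout so grep sees the module listing.
        module avail 2>&1 | filter_modules "$1"
    else
        echo "module command not available in this shell" >&2
        return 1
    fi
}

find_modules python || true
```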
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.