Cluster Quick Start Guide

This guide will provide you with the basic information needed to get up and running on the FASRC cluster for simple command line access. If you'd like more detailed information, each section has a link to fuller documentation.


1. Get a FASRC account using the account request tool.

Before you can access the cluster you need to request a Research Computing account. 

See How Do I Get a Research Computing Account for instructions if you do not yet have an account.

See the account confirmation email for instructions on setting your password and getting started.

2. Set up OpenAuth for two-factor authentication

Once you have your new FASRC account, you will need to set up our OpenAuth tool for two-factor authentication.

See the OpenAuth Guide for instructions if you have not yet set up OpenAuth.

For troubleshooting issues you might have, please see our troubleshooting page.

3. Use the FASRC VPN when connecting to storage, VDI, or other resources available only on our networks.

FASRC VPN Setup Guide

4. Review our introductory training

See: Introduction to Cluster Computing

Accessing the Cluster

Use a terminal to ssh to login.rc.fas.harvard.edu

NOTE: If you did not request cluster access when signing up, you will not be able to log into the cluster or login node. See this doc for how to add cluster access.

For command line access to the cluster, connect to login.rc.fas.harvard.edu using ssh. If you are running Linux or Mac OSX, open a terminal and type ssh USERNAME@login.rc.fas.harvard.edu, where USERNAME is the name you were assigned when you received your account. Enter the password you set up in the account request tool. When prompted for the Verification code, enter the number supplied by OpenAuth.


The OpenAuth application (upper right corner) displays the value to be used for the Verification code prompt.

Add -CY if you have an X11 server installed and want graphics support (ssh -CY USERNAME@login.rc.fas.harvard.edu). For help with X11 forwarding, start with our Access and Login page.
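As a rough sketch of the two connection forms described above, the helper below only prints the ssh command for a given username, with or without X11 forwarding. The function name fasrc_ssh_cmd and the username jharvard are hypothetical, chosen for illustration only.

```shell
# Print (do not run) the ssh command for the login nodes.
# $1 = your FASRC username; $2 = "x11" to add -CY for X11 forwarding.
fasrc_ssh_cmd() {
    user="$1"
    if [ "${2:-}" = "x11" ]; then
        printf 'ssh -CY %s@login.rc.fas.harvard.edu\n' "$user"
    else
        printf 'ssh %s@login.rc.fas.harvard.edu\n' "$user"
    fi
}

# "jharvard" is a placeholder username.
fasrc_ssh_cmd jharvard
fasrc_ssh_cmd jharvard x11
```

Copy the printed line into your terminal to connect. You will still be asked for your password and the OpenAuth verification code.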

For Windows users, we recommend PuTTY for SSH. HUIT (Harvard IT) also provides newer versions of SecureCRT (SSH) and SecureFX (SFTP). If you are in FAS and would like to try them, go to the HUIT download page (uses HarvardKey). Older versions of these programs will not work with modern SSH.

See our Access and Login page for more details on ways to connect to FASRC resources, including terminal applications.

Transfer any files you may need

If you're using a Unix-style terminal like the Mac OSX Terminal tool or a Linux xterm, you'll want to use scp for transferring data:

scp hg19.chr1.fasta USERNAME@login.rc.fas.harvard.edu:

This will transfer the data into the root of your home directory.
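The same scp form also works for directories and for copying files back from the cluster. The sketch below is a dry run by default: the hypothetical run helper only prints each command unless DRY_RUN=0 is set. The names results_dir and hostname_1234567.out are placeholders, not files that are known to exist.

```shell
# Replace USERNAME with your own account name.
REMOTE="USERNAME@login.rc.fas.harvard.edu"

# Hypothetical helper: print the command when DRY_RUN is 1 (the default), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Copy a single file into your home directory on the cluster.
run scp hg19.chr1.fasta "$REMOTE:"

# Copy a whole directory (placeholder name) recursively with -r.
run scp -r results_dir "$REMOTE:"

# Copy a file (placeholder name) from your cluster home directory to the current local directory.
run scp "$REMOTE:hostname_1234567.out" .
```

Run these from your local machine, not from a login node.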

There are also graphical scp tools available. The Filezilla SFTP client is available cross-platform for Mac OSX, Linux, and Windows. See our SFTP file transfer using Filezilla document for more information. Windows users who prefer SCP can download it from WinSCP.net.

NOTE: If you are off campus or behind a firewall, you should first connect to the Research Computing VPN.

Familiarize yourself with proper decorum on the cluster

The FASRC cluster is a massive system of shared resources. While much effort is made to ensure that you can do your work in relative isolation, some rules must be followed to avoid interfering with other users' work.

The most important rule on the cluster is to avoid performing computations on the login nodes. Once you've logged in, you must either submit a batch processing script or start an interactive session (see below). Any significant processing (high memory requirements, long running time, etc.) that is attempted on the login nodes will be killed.

See the full list of Cluster Customs and Responsibilities.

Determine what software you'd like to load and run

An enhanced module system called Helmod is used on the cluster to control the run-time environment for individual applications. To find out what modules are available you can either look at the module list on the RC / Informatics portal, or use the module avail command. By itself, module avail will print out the entire list of packages. To find a specific tool, use the module spider or module-query command.

module-query MODULENAME

Once you've determined what software you would like to use, load the module:

module load MODULENAME

where MODULENAME is the specific software you want to use. You can use module unload MODULENAME to unload a module. To see what modules you have loaded type module list. This is very helpful information to provide when you submit help tickets.
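Since the list of loaded modules is useful in help tickets, one option is to save it to a file. This is a minimal sketch: the function name save_module_list and the output path are hypothetical, and it assumes that some module systems print the listing to standard error, which is why both streams are captured.

```shell
# Save the currently loaded modules to a file that can be attached to a help ticket.
# $1 = path of the file to write.
save_module_list() {
    outfile="$1"
    if command -v module >/dev/null 2>&1; then
        # Capture both streams, since the listing may be printed to standard error.
        module list >"$outfile" 2>&1
    else
        # Off the cluster there is no module command; record that instead.
        echo "module command not available on this machine" >"$outfile"
    fi
}

# Example output location (an assumption; any writable path works).
save_module_list "${TMPDIR:-/tmp}/loaded_modules.txt"
cat "${TMPDIR:-/tmp}/loaded_modules.txt"
```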

For errors in loading modules after the O3 upgrade, see the Modules on CentOS7 upgrade page.

For details on finding and using modules effectively, see Software on the cluster page.

For details on running software on the cluster, including graphical applications, see the module section of the Running Jobs page.

Determine where your files will be stored

Users of the cluster are granted 100 GB of storage in their home directory. This volume has decent performance and is regularly backed up. For many, this is enough to get going. However, there are a number of other storage locations that are important to consider when running software on the FASRC cluster.

  1. /n/scratchlfs02 Scratchlfs02 is a large, high-performance temporary Lustre filesystem. We recommend that people use this filesystem as their primary working area, as this area is highly optimized for cluster use. Use this for processing large files, but realize that files will be removed after 90 days and the volume is not backed up. Create your own folder inside the folder of your lab group. If that doesn't exist, contact RCHelp.
  2. /scratch When running batch jobs (see below), /scratch is a large, very fast temporary store for files created while a tool is running. It is a good place for temporary files created while a tool is executing because the disks are local to the node that is performing the computation, making access very fast. However, data is only accessible from the node itself, so you cannot directly retrieve it after calculations are finished.
  3. Lab storage Each lab that is doing regular work on the cluster can request an initial 4 TB of group-accessible storage at no charge. Like home directories, this is a good place for general storage, but it is not high performance and should not be used during I/O intensive processing.

Do NOT use your home directory or lab storage for significant computation. This degrades performance for everyone on the cluster.
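Because files on node-local /scratch cannot be retrieved after a job ends, a common pattern is to work in a temporary directory there and copy the results to a persistent location before the script exits. The sketch below illustrates that pattern under stated assumptions: LOCAL_SCRATCH and RESULTS_DIR are hypothetical variables, both fall back to a temporary directory so the example also runs off the cluster, and the echo line stands in for a real tool.

```shell
# Assumption: on a compute node you would set LOCAL_SCRATCH=/scratch.
LOCAL_SCRATCH="${LOCAL_SCRATCH:-${TMPDIR:-/tmp}}"

# Assumption: RESULTS_DIR is a persistent location you choose,
# for example your own folder under your lab's scratchlfs02 folder.
RESULTS_DIR="${RESULTS_DIR:-${TMPDIR:-/tmp}/results_example}"

# Create a private working directory on the fast local disk.
workdir=$(mktemp -d "$LOCAL_SCRATCH/job_XXXXXX")
mkdir -p "$RESULTS_DIR"

# Stand-in for the real tool: write an output file into the working directory.
echo "example result" > "$workdir/output.txt"

# Copy the results somewhere that outlives the job, then clean up.
cp "$workdir/output.txt" "$RESULTS_DIR/"
rm -rf "$workdir"
```

In a real batch job these lines would sit in the job script after the #SBATCH directives.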

For details on different types of storage and how to obtain more, see the Cluster Storage page.

Run a batch job...

The cluster is managed by a batch job control system called SLURM. Tools that you want to run are embedded in a command script and the script is submitted to the job control system using an appropriate SLURM command.

For a simple example that just prints the hostname of a compute host, with standard out and standard err each written to their own file, create a file called hostname.slurm with the following content:

#!/bin/bash
#SBATCH -n 1 # Number of cores requested
#SBATCH -N 1 # Ensure that all cores are on one machine
#SBATCH -t 15 # Runtime in minutes
#SBATCH -p serial_requeue # Partition to submit to
#SBATCH --mem=100 # Memory per node in MB (see also --mem-per-cpu)
#SBATCH --open-mode=append # Append to existing output files instead of overwriting them
#SBATCH -o hostname_%j.out # Standard out goes to this file
#SBATCH -e hostname_%j.err # Standard err goes to this file
hostname

Then submit this job script to SLURM:

sbatch hostname.slurm

When command scripts are submitted, SLURM looks at the resources you've requested and waits until an acceptable compute node is available on which to run it. Once the resources are available, it runs the script as a background process (i.e. you don't need to keep your terminal open while it is running), returning the output and error streams to the locations designated by the script.

You can monitor the progress of your job using the squeue -j JOBID command, where JOBID is the ID returned by SLURM when you submit the script. The output of this command will indicate if your job is PENDING, RUNNING, COMPLETED, FAILED, etc. If the job is completed, you can get the output from the file specified by the -o option. If there are errors, they should appear in the file specified by the -e option.
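To avoid copying the job ID by hand, you can capture it from the message sbatch prints. This sketch assumes the usual message form "Submitted batch job JOBID". The function name extract_jobid is hypothetical, and the cluster commands are left as comments so the example runs anywhere.

```shell
# Take the last word of the sbatch message, which is the job ID
# when the message has the form "Submitted batch job <JOBID>".
extract_jobid() {
    msg="$1"
    echo "${msg##* }"
}

# On the cluster (not run here):
#   jobid=$(extract_jobid "$(sbatch hostname.slurm)")
#   squeue -j "$jobid"

# Demonstration with a sample message.
extract_jobid "Submitted batch job 1234567"
```

The sbatch --parsable option is an alternative that prints the job ID without the surrounding text.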


If you need to terminate a job, the scancel command can be used (JOBID is the number returned when the job is submitted).

scancel JOBID

SLURM-managed resources are divided into partitions (known as queues in other batch processing systems). Normally, you will be using the shared or serial_requeue partitions, but there are others for interactive jobs (see below), large memory jobs, etc.

For more information on the partitions on the cluster, please see the SLURM partitions page.

For more information on running batch jobs, including MPI code, please see the Running Jobs page.

For a list of useful SLURM commands, please see the Convenient SLURM Commands page.

... or an interactive job.

Batch jobs are great for long-lasting computationally intensive data processing. However, many activities like one-off scripts, graphics and visualization, and exploratory analysis do not work well in a batch system, but are too resource intensive to be done on a login node. There is a special partition on the cluster called "test" that is designed for responsive, interactive shell and graphical tool usage.

You can start an interactive session using a specific flavor of the srun command.

srun -p test --pty --mem 500 -t 0-08:00 /bin/bash

srun is like sbatch, but it runs synchronously (i.e. it does not return until the job is finished). The example starts a job on the "test" partition, with pseudo-terminal mode on (--pty), an allocation of 500 MB RAM (--mem 500), and for 8 hours (-t in D-HH:MM format). It also assumes one core on one node. The final argument is the command that you want to run. In this case you'll just get a shell prompt on a compute host. Now you can run any normal Linux commands without taking up resources on a login node. Make sure you choose a reasonable amount of memory (--mem) for your session.
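The D-HH:MM time format can be easy to get wrong, so here is a small sketch that converts a whole number of hours into it. The function name slurm_time is hypothetical, and it handles whole hours only.

```shell
# Convert a whole number of hours into SLURM's D-HH:MM format.
slurm_time() {
    hours="$1"
    printf '%d-%02d:00\n' $((hours / 24)) $((hours % 24))
}

slurm_time 8     # prints 0-08:00
slurm_time 30    # prints 1-06:00

# On the cluster (not run here):
#   srun -p test --pty --mem 500 -t "$(slurm_time 8)" /bin/bash
```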

For graphical tools we have the Virtual Desktop through Open OnDemand. Simply get on the RC VPN and go to https://vdi.rc.fas.harvard.edu to get started using OnDemand.

Getting further help

If you have any trouble with running jobs on the cluster, first check the comprehensive Running Jobs page and our FAQ. Then, if your questions aren't answered there, feel free to contact us at RCHelp. Tell us the job ID of the job in question. Also tell us which script you ran and where the error and output files are located. The output of module list is helpful, too.

A note on requesting memory (--mem or --mem-per-cpu)

In SLURM you must declare how much memory you are using for your job using the --mem or --mem-per-cpu command switches. By default SLURM assumes you need 100 MB. If you don't request enough, the job can be terminated, often without a useful error message (error files can show segfaults, file write errors, etc. that are downstream symptoms). If you request too much, it can increase your wait time (it's harder to allocate a lot of memory than a little), crowd out jobs for other users, and lower your fairshare.
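The two switches count memory differently: --mem is a total per node, while --mem-per-cpu is multiplied by the number of cores you request. A quick calculation, sketched below with the hypothetical function total_mem_mb, shows what a --mem-per-cpu request adds up to when all cores are on one node.

```shell
# Total memory in MB implied by a --mem-per-cpu request.
# $1 = number of cores (-n), $2 = value given to --mem-per-cpu in MB.
total_mem_mb() {
    cores="$1"
    mem_per_cpu_mb="$2"
    echo $((cores * mem_per_cpu_mb))
}

# Example: 4 cores at 2000 MB each.
total_mem_mb 4 2000    # prints 8000
```

In this example, -n 4 with --mem-per-cpu=2000 on a single node asks for the same total as --mem=8000.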

You can view the runtime and memory usage for a past job with

seff JOBID

where JOBID is the numeric job ID of a past job:

[user@boslogin01 home]# seff 1234567
Job ID: 1234567
Cluster: odyssey
User/Group: user/user_lab
State: COMPLETED (exit code 0)
Nodes: 8
Cores per node: 64
CPU Utilized: 37-06:17:33
CPU Efficiency: 23.94% of 155-16:02:08 core-walltime
Job Wall-clock time: 07:17:49
Memory Utilized: 1.53 TB (estimated maximum)
Memory Efficiency: 100.03% of 1.53 TB (195.31 GB/node)

This job had a maximum memory footprint of about 195 GB per node, and took a little over 7 hrs to run.
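If you only want one value from the report, you can filter the seff output by its field label. The sketch below uses a hypothetical helper, seff_field, and feeds it lines copied from the sample report above so that it runs without access to the cluster.

```shell
# Print the value of one field from seff output read on standard input.
# $1 = the field label exactly as it appears before the colon.
seff_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

# Lines taken from the sample report above.
sample='Job Wall-clock time: 07:17:49
Memory Utilized: 1.53 TB (estimated maximum)
Memory Efficiency: 100.03% of 1.53 TB (195.31 GB/node)'

printf '%s\n' "$sample" | seff_field "Memory Utilized"

# On the cluster (not run here):
#   seff 1234567 | seff_field "Memory Utilized"
```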


CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.