Lab Directories and Scratch

Research Computing maintains multiple petabytes of storage in association with the FASRC cluster. This storage is not uniform; each class of storage is designed and maintained with a different purpose in mind.

Please see the Data Storage page on our main website for information on other storage tiers and for clarification on any unfamiliar terms.

This page describes the storage resources available to each user account and lab, and serves as a guide for day-to-day usage.

See also our Introduction to FASRC Cluster Storage video.


Home Directories

Every user whose account has cluster access receives a 100 GB home directory. Your initial working directory upon login is your home directory. This location is for your use in storing everyday data for analysis, scripts, documentation, etc. This is also where files such as your .bashrc reside. Home directory paths look like /n/homeNN/XXXX, where homeNN is one of home01 through home15 and XXXX is your login. For example, user jharvard’s home directory might be /n/home12/jharvard. You can also reach your home directory using the Unix shortcut ~, as in: cd ~

Size limit: 100 GB (hard limit)
Availability: All cluster nodes. Can be mounted on desktops and laptops.
Backup: Daily snapshots, kept for 2 weeks.
Retention policy: Indefinite
Performance: Moderate. Not appropriate for I/O-intensive jobs or large numbers of jobs. Not expandable.
Cost: None; this is a necessary part of each user account.

Your home volume has good performance for most simple tasks. However, I/O-intensive jobs or large numbers of jobs should not be run in home directories; widespread computation against home directories would degrade performance for all users. For these types of tasks, the scratch filesystem is better suited.

Home directories are private to your account, but they are not suitable for storing HRCI/Level 3 or above data; storing such data there is a violation of Harvard security policies.

Your home directory is exported from the disk arrays using CIFS/SMB file protocols and so can be mounted as a ‘shared drive’ on your desktop or laptop. Please see this help document for step-by-step instructions.

Home directories are backed up into a directory called .snapshot in your home. This directory does not appear in directory listings, but you can cd into it or ls it explicitly. It contains copies of your home directory in date-specific subdirectories; hourly, daily, and weekly snapshots can be found. To restore older files, simply copy them from the correct .snapshot subdirectory. NOTE: If you delete your entire home directory, you will also delete the snapshots. This is not recoverable.
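For example, restoring a file from a daily snapshot might look like the following (the snapshot subdirectory name and file path here are hypothetical; run the ls first to see the actual names on your system):

    ls ~/.snapshot
    cp ~/.snapshot/daily.2024-05-01_0010/scripts/analysis.sh ~/scripts/analysis.sh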

The 100 GB quota is enforced with a combination of a soft quota warning at 95 GB and a hard quota stop at 100 GB. Hitting the quota during processing of large data sets can result in file write/read failures or segmentation faults. You can check your usage using the df command: df -h ~ (where ~ is the Unix shortcut for your home directory).

TIP: If you are trying to determine usage, you might try du -h -d 1 ~ to see usage by sub-directory, or du -ax . | sort -n -r | head -n 20 to get a sorted list of the 20 largest files and directories.

When attempting to log in while your home directory is over quota, you will often see an error in the .Xauthority file:

    /usr/bin/xauth: error in locking authority file .Xauthority

Logging into an NX or other virtual service will fail, as the service cannot write to your home directory.

When at or over quota, you will need to remove unneeded files. Home directory quotas are global and cannot be increased for individual users. You may be able to free up space by copying or moving files from your home directory to lab or scratch space, as in the example below.
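For example, a move to lab storage might look like the following (jharvard_lab and big_dataset are hypothetical placeholders; substitute your own lab directory and files):

    mv ~/big_dataset /n/holylabs/LABS/jharvard_lab/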

 


Lab Directories

Each lab which uses the cluster receives a 4 TB lab directory (as of 2022, these reside in /n/holylabs/LABS). This location is for each lab group’s use in storing everyday data for analysis, scripts, documentation, etc. Each such lab will also have a directory on our high-performance scratch filesystem (see below).

 

Size limit: 4 TB (hard limit), 1M inodes
Availability: All cluster nodes. Cannot be mounted on desktops and laptops.
Backup: Highly redundant, but not backed up.
Retention policy: Life of the lab group
Performance: Moderate. Not appropriate for I/O-intensive jobs or large numbers of jobs. Not expandable.
Cost: None; this is a necessary part of each cluster lab group.

 

Lab directories have good performance for most simple tasks. However, I/O-intensive jobs or large numbers of jobs should not be run in lab directories; widespread computation against lab directories would degrade performance for all users. For these types of tasks, the scratch filesystem is better suited.

Lab directories are not suitable for storing HRCI/Level 3 or above data; storing such data there is a violation of Harvard security policies.

The 4 TB quota is enforced with a combination of a soft quota warning and a hard quota stop at 4 TB. Hitting the quota while processing large data sets can result in file write/read failures or segmentation faults. If your lab requires additional storage, see our Data Storage page for a list of available tiers.
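To see how much of the lab quota is in use, one simple (if slow on large directories) approach is du (the lab name jharvard_lab is a hypothetical placeholder):

    du -sh /n/holylabs/LABS/jharvard_lab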


Networked, High-performance Shared (Scratch) Storage

The cluster has storage built specifically for high-performance temporary use. You can create your own folder inside your lab group’s folder; if that folder does not exist, or you do not have write access, contact us.
IMPORTANT: Scratch is temporary space and has a strict retention policy: Scratch Policy

Size limit: 4 PB total; 50 TB max per group; 100M inodes
Availability: All cluster nodes. Cannot be mounted on desktops/laptops.
Backup: NOT backed up
Retention policy: 90-day retention; deletions are run during the cluster maintenance window.
Performance: High; appropriate for I/O-intensive jobs

/n/holyscratch01 is short-term, volatile, shared scratch space for large data analysis projects. The filesystem is managed by the Lustre parallel file system and provides excellent performance for HPC environments. It can be used for data-intensive computation, but must be considered a temporary store: files are not backed up and will be removed after 90 days. There is a 50 TB total usage limit per group.

Large data analysis jobs that would fill your 100 GB of home space can be run from this volume. Once analysis has been completed, however, data you wish to retain must be moved elsewhere (lab storage, etc.); the retention policy will remove data from scratch storage after 90 days. A sketch of this workflow follows.
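As a sketch, a typical scratch workflow might look like the following (the lab name jharvard_lab is a hypothetical placeholder for your group’s folder):

    mkdir -p /n/holyscratch01/jharvard_lab/$USER   # create your own folder in your lab's scratch space
    cd /n/holyscratch01/jharvard_lab/$USER         # run your analysis from here
    mv results/ /n/holylabs/LABS/jharvard_lab/     # when finished, move data you wish to keep to lab storage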


Local (per node), Shared Scratch Storage

Each node contains a local disk partition, /scratch, which is useful for large temporary files created while an application is running.
IMPORTANT: Local scratch is highly volatile and should not be expected to persist beyond job duration.

Size limit: Variable (200-300 GB total typical)
Availability: Node only. Cannot be mounted on desktops/laptops.
Backup: Not backed up
Retention policy: Not retained; highly volatile
Performance: High; suited for limited I/O-intensive jobs

The /scratch volumes are directly connected (and therefore fast) temporary storage local to each compute node. Many high-performance computing applications use temporary files that go to /tmp by default; on the cluster we have pointed /tmp to /scratch. Network-attached storage, like home directories, is slow compared to disks directly connected to the compute node. If you can direct your application to use /scratch for temporary files, you can gain significant performance improvements and ensure that large files can be supported.

Though there are /scratch directories available on each compute node, they are not the same volume; the storage is specific to the host and is not shared. Files written to /scratch from holy2a18206, for example, are visible only on that host. /scratch should only be used for temporary files written and removed during the running of a process. Although a ‘scratch cleaner’ does run hourly, we ask that at the end of your job you delete the files that you’ve created, as in the sketch below.
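As a minimal sketch, assuming a Slurm batch script (my_app is a hypothetical application), staging temporary files on local /scratch and cleaning up afterwards might look like:

    mkdir -p /scratch/$USER/$SLURM_JOB_ID        # per-job temp directory on node-local disk
    export TMPDIR=/scratch/$USER/$SLURM_JOB_ID   # many applications honor TMPDIR for temp files
    my_app --input data.in --output results.out
    rm -rf /scratch/$USER/$SLURM_JOB_ID          # delete your temp files at the end of the job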


Custom Storage

For information on custom storage and/or backed-up storage, please see our Data Storage page.
