Research Computing maintains multiple petabytes of storage in association with the Odyssey cluster. This storage is not uniform, however, and each class of storage is designed and maintained with different purposes in mind. This page describes cluster storage in detail as a guide for day-to-day usage. Please see the terminology section at the end of this document for clarification on any unfamiliar terms.
Home Directory Storage
|Size limit||100 GB (Hard Limit)|
|Availability||All cluster nodes. Can be mounted on desktops and laptops.|
|Backup||Daily snapshot. Kept for 2 weeks.|
Not appropriate for I/O intensive or large numbers of jobs. Not expandable.
Every user whose account has cluster access receives a 100 GB home directory. Your initial working directory upon login is your home directory. This location is for your use in storing everyday data for analysis, scripts, documentation, etc. This is also where files such as your .bashrc reside. Home directory paths look like
XXXX is your login. For example, user jharvard's home directory might be /n/home12/jharvard. You can also reach your home directory using the Unix shortcut ~, as in:
Your home volume has good performance for most simple tasks. However, I/O intensive or large numbers of jobs should not be processed in home directories. Widespread computation against home directories would result in poor performance for all users. For these types of tasks, the scratchlfs filesystem is better suited.
Your home directory is exported from the disk arrays using CIFS/SMB file protocols and so can be mounted as a 'shared drive' on your desktop or laptop. Please see this help document for step-by-step instructions.
Home directories are backed up into a directory called .snapshot in your home. This directory will not appear in directory listings. You can ls this directory specifically to make it visible. It contains copies of your home directory in date-specific subdirectories. Hourly, daily, and weekly snapshots can be found. To restore older files, simply copy them from the correct .snapshot subdirectory. NOTE: If you delete your entire home directory, you will also delete the snapshots. This is not recoverable.
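The restore step described above can be sketched as a small shell function. This is a minimal sketch, not an official tool: the function name restore_from_snapshot is hypothetical, and the names of the dated subdirectories under .snapshot are not given here, so list the directory first (ls ~/.snapshot) and pass the one you want.

```shell
# restore_from_snapshot SNAPSHOT_DIR RELATIVE_PATH DEST_DIR
# Copies one file or directory out of a snapshot into DEST_DIR.
# Refuses to overwrite anything that already exists at the destination.
# (Hypothetical helper for illustration only.)
restore_from_snapshot() {
    snap_dir=$1   # one dated subdirectory of ~/.snapshot (names are site-specific)
    rel_path=$2   # path relative to your home directory
    dest_dir=$3   # where the restored copy should go

    if [ ! -e "$snap_dir/$rel_path" ]; then
        echo "not found in snapshot: $rel_path" >&2
        return 1
    fi

    target="$dest_dir/$(basename "$rel_path")"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    # -a copies recursively and preserves timestamps and permissions
    cp -a "$snap_dir/$rel_path" "$dest_dir/"
}

# Example (substitute a real subdirectory name from `ls ~/.snapshot`):
#   restore_from_snapshot ~/.snapshot/<subdirectory> scripts/run.sh ~/restored
```

Restoring into a separate directory, rather than over the live file, lets you compare the two versions before replacing anything.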
The 100 GB quota is enforced with a combination of a soft quota warning at 90 GB and a hard quota stop at 100 GB. Hitting quota during processing of large data sets can result in file write/read failures or segmentation faults. You can check your usage using the du command:
du -sch ~ (where ~ is the unix shortcut for 'home')
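To compare the du figure against the two thresholds, something like the following could be used. It is only a sketch: quota_status is a hypothetical name, it assumes 1 GB means 1024 * 1024 KB, and the storage system's own accounting may differ slightly from what du reports.

```shell
# quota_status USED_KB
# Prints "ok", "warning" (soft quota reached) or "over" (hard quota reached).
# Assumption: 1 GB = 1024 * 1024 KB; the file server may count differently.
quota_status() {
    used_kb=$1
    soft_kb=$((90 * 1024 * 1024))    # 90 GB soft quota
    hard_kb=$((100 * 1024 * 1024))   # 100 GB hard quota

    if [ "$used_kb" -ge "$hard_kb" ]; then
        echo "over"
    elif [ "$used_kb" -ge "$soft_kb" ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Example: feed it your current usage in KB
#   quota_status "$(du -sk "$HOME" | cut -f1)"
```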
If you log in while your home directory is over quota, you will often see an error involving the .Xauthority file:
/usr/bin/xauth: error in locking authority file .Xauthority
Logging into an NX or other virtual service will fail as the service cannot write to your home directory.
When at or over quota, you will need to remove unneeded files. Home directory quotas are global and cannot be increased for individuals. You may be able to use lab or scratch space to assist with archiving or moving files from your home directory to free up space.
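When deciding what to remove or move, it helps to see which top-level entries take the most space. The function below is an illustrative sketch (largest_entries is a hypothetical name) built only from find, du, sort and head; the -mindepth and -maxdepth options assume GNU or BSD find.

```shell
# largest_entries [DIR] [COUNT]
# Prints the COUNT largest top-level entries in DIR (default: $HOME, 10),
# biggest first, as "size-in-KB<TAB>path". Hidden entries are included.
largest_entries() {
    dir=${1:-$HOME}
    count=${2:-10}
    # du -s: one total per entry, -k: report in kilobytes
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null |
        sort -rn |
        head -n "$count"
}

# Example:
#   largest_entries ~ 5
```

Running du across a large home directory generates a fair amount of file system traffic, so it is best run occasionally rather than in a loop.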
|Size limit||None (costs involved after 4TB)|
|Availability||Varies among cluster nodes. Most can be mounted on desktops and laptops.|
|Backup||Varies, but typically up to 2 weeks. For large shares, backups may take days or even weeks to make a complete copy. NOTE: Backups are to allow us to restore the entire volume in the event of catastrophic failure. They are not for individual file or folder recovery.|
Not appropriate for I/O intensive or large numbers of jobs
Not appropriate for non-research/administrative data
|Cost||4 TB at $0 cost; expansion on a TB basis available for purchase (see Billing FAQ). See also: Custom Storage.|
Each laboratory using the Odyssey cluster is granted an initial storage allocation of 4 TB. These conventional disk arrays are mounted via NFS and can be used for a variety of purposes. Laboratories may purchase additional storage and backup space as needed. Contact rchelp to get details.
Your lab volume has good performance for most tasks. However, I/O intensive or large numbers of jobs should not be processed in these directories. Widespread computation against these directories would result in poor performance for all users. For these types of tasks one of the "scratch" filesystems is better suited.
Lab storage is intended for research data used by or derived on the cluster. It is not intended for administrative or departmental use. For administrative storage, please contact HUIT.
Most lab directories are exported from the disk arrays using CIFS/SMB file protocols, and so can be mounted as a 'shared' volume on your desktop or laptop. Please see this help document for step-by-step instructions. For groups handling HRCI, this option may not be available.
Odyssey has storage built specifically for high-performance temporary use. You can create your own folder inside the folder of your lab group in /n/scratchlfs. If that doesn't exist or you do not have write access, contact us.
IMPORTANT: Scratchlfs is temporary scratch space and has a strict retention policy: https://rc.fas.harvard.edu/policy-scratch/
|Size limit||1.2 PB total, 50 TB max. per group|
|Availability||All cluster nodes. Cannot be mounted on desktops/laptops.|
|Backup||NOT backed up|
|Retention policy||90 day retention policy. Deletions are run during the cluster maintenance window.|
|Performance||High: Appropriate for I/O intensive jobs|
/n/scratchlfs is short-term, volatile, shared scratch space for large data analysis projects.
The /n/scratchlfs filesystem is managed by the Lustre parallel file system and provides excellent performance for HPC environments. This file system can be used for data-intensive computation, but must be considered a temporary store. Files are not backed up and will be removed after 90 days. There is a 50 TB total usage limit per group.
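To spot files that are getting close to the 90-day limit, a find-based check along these lines may help. This is a sketch under an assumption: it uses each file's modification time, whereas the retention policy linked above defines the actual criterion. The name files_older_than is hypothetical.

```shell
# files_older_than DIR DAYS
# Lists regular files under DIR whose modification time is more than
# DAYS days in the past.
# Assumption: age is judged by mtime; check the retention policy for
# the criterion the cleanup actually uses.
files_older_than() {
    find "$1" -type f -mtime +"$2" -print
}

# Example: files at least about two weeks away from the 90-day limit
# (replace the placeholders with your own lab and user folder):
#   files_older_than /n/scratchlfs/<lab>/<user> 75
```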
Large data analysis jobs that would fill your 100 GB of home space can be run from this volume. Once analysis has been completed, however, data you wish to retain must be moved elsewhere (lab storage, etc.). The retention policy will remove data from scratch storage after 90 days.
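Moving results off scratch is safest as copy, verify, then delete. The sketch below shows one way to do that with standard tools; archive_results is a hypothetical helper and the paths in the example are placeholders, not real locations.

```shell
# archive_results SRC_DIR DEST_PARENT
# Copies SRC_DIR into DEST_PARENT, compares the copy with the original,
# and removes the original only if the two are identical.
# (Hypothetical helper for illustration only.)
archive_results() {
    src=${1%/}            # strip a trailing slash, if any
    dest_parent=$2
    name=$(basename "$src")

    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dest_parent/$name" ]; then
        echo "destination already exists: $dest_parent/$name" >&2
        return 1
    fi

    mkdir -p "$dest_parent" || return 1
    cp -a "$src" "$dest_parent/" || return 1

    # diff -r exits non-zero if any file differs or is missing
    if diff -r "$src" "$dest_parent/$name" >/dev/null; then
        rm -rf "$src"
    else
        echo "verification failed; source left in place" >&2
        return 1
    fi
}

# Example (placeholder paths):
#   archive_results /n/scratchlfs/<lab>/<user>/results /n/<lab storage>/<user>
```

For very large result sets, a tool that can resume interrupted transfers may be a better fit than cp; the verify-before-delete step applies either way.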
Each node contains a disk partition, /scratch, which is useful for large temp files created while an application is running.
IMPORTANT: Local scratch is highly volatile and should not be expected to persist beyond job duration.
|Size limit||Variable (200-300 GB total typical)|
Cannot be mounted on desktops/laptops.
|Backup||Not backed up|
|Retention policy||Not retained - Highly Volatile|
|Performance||High: Suited for limited I/O intensive jobs|
/scratch volumes are directly connected (and therefore fast) temporary storage local to the compute nodes. Many high performance computing applications use temporary files that go to /tmp by default. On Odyssey we have pointed /tmp to /scratch. Network-attached storage, like home directories, is slow compared to disks directly connected to the compute node. If you can direct your application to use /scratch for temp files, you can gain significant performance improvements and ensure that large files can be supported.
Though there are /scratch directories available to each compute node, they are not the same volume. The storage is specific to the host and is not shared. Files written to /scratch from holy2a18206, for example, are only visible on that host.
/scratch should only be used for temporary files written and removed during the running of a process. Although a 'scratch cleaner' does run hourly, we ask that at the end of your job you delete the files that you've created.
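One way to follow this advice is to give each job a private directory under /scratch, point TMPDIR at it, and remove it when the command finishes. The function below is a minimal sketch of that pattern; with_scratch_dir is a hypothetical name, and it assumes your application honors the TMPDIR environment variable, which many but not all programs do.

```shell
# with_scratch_dir BASE COMMAND [ARGS...]
# Creates a private temporary directory under BASE, runs COMMAND with
# TMPDIR set to it, deletes the directory afterwards, and returns
# COMMAND's exit status.
# (Hypothetical helper for illustration only.)
with_scratch_dir() {
    base=$1
    shift
    # mktemp -d creates a uniquely named directory accessible only to you
    workdir=$(mktemp -d "$base/${USER:-job}.XXXXXX") || return 1

    # Run the command with TMPDIR pointing at the private directory
    TMPDIR=$workdir "$@" && status=0 || status=$?

    rm -rf "$workdir"
    return "$status"
}

# Example inside a job script (my_analysis is a placeholder):
#   with_scratch_dir /scratch my_analysis --input data.in
```

If the job is killed before the command returns, the cleanup line never runs, so leftover files then depend on the scratch cleaner.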
|Availability||A per group basis, depending on needs and funding|
|Backup||A per group basis, depending on needs and funding|
|Retention policy||A per group basis, depending on needs and funding|
|Performance||A per group basis, depending on needs and funding|
|Cost||A per group basis, depending on needs and funding|
In addition to the storage tiers listed above, Research Computing hosts a number of owned, custom storage systems. Once storage sizes/specifications are proposed by RC, they are paid for by specific groups and then housed, maintained, and integrated into our infrastructure like any other system. These systems range from dedicated group storage with backups, to scratch-style systems, to dedicated parallel systems. These are often designed for very specific application/instrument requirements, or when the cost model of our shared storage no longer makes sense for the amount of storage desired. Please contact RCHelp to get details.
Please contact us for $/TB/yr pricing.
Lead time will vary depending on the amount of storage needed. While smaller additions can be accommodated fairly quickly, larger amounts will require more lead time, especially if additional storage capacity needs to be provisioned.
Billing requirements: A 33-digit billing code, unit details, and a description of the storage (for the invoice line item) from the PI or their faculty/financial admin.
Snapshots/Checkpoints are point-in-time copies of files/directories as they were at the time of the snapshot. They protect against accidental deletion and bad edits, and serve as a quick, general safety net.
CAVEATS: These are stored on the same storage system as the primary data, so they cannot be used for recovery after a system failure.
Backups, unlike snapshots/checkpoints, are full copies of the primary data that reside on another storage system in another physical location (data center). This provides protection from system failures and physical issues. These backups are done incrementally: one full copy of the primary data is kept, and subsequent backups copy only changed data. This provides some history without storing unchanged files multiple times. Backups are typically done daily, keeping 2 weeks' worth of daily history, unless otherwise requested or stated.
CAVEATS: These backups are intended for recovering from catastrophic failures, not for recovering from accidental file deletions.
Replication refers to storage systems which are real-time replicated to a paired system, typically in another physical location. There is no history here, and a file deleted is deleted in both storage locations. Replication provides protection from system/data center/network failures, without interrupting access to the storage.
CIFS/SMB refers to the now-standard way of accessing RC storage resources from systems outside our infrastructure, such as workstations and laptops. It is also known as Windows Drive Mapping, Samba, or 'shared drives'. This is an authenticated method of connecting, available on Windows, OS X, and Linux systems. It is typically available from wired campus connections, and via VPN when on campus WiFi or off campus. Please see this help document for step-by-step instructions.