Storage provides large-scale, network-attached access to research data from a variety of endpoints: compute clusters, instruments, workstations, laptops, etc. Storage types are typically designed for specific I/O (literally data in/out) needs that stem from raw data collection, data processing and analysis, data transfer and sharing, data preservation, and archiving. These storage types usually have different data retention and replication policies as well.
Key Features and Benefits
- Quota / Logical Volume – Data storage may be allocated as separate logical volumes, which allows the network export of each volume to be kept separate. Alternatively, in a shared filesystem, a quota may be applied to provide a specific allocation size of storage and/or number of files (or objects). A soft quota is the limit above which warnings may be issued; a hard quota is the limit that data size or file count cannot exceed.
- Access: Files may be accessed through a number of different protocols. The following are common to Research Computing:
- NFS (Network File System) – Linux POSIX-based file system served from a single storage server to internal network-attached NFS clients, usually compute nodes or VMs. The service appears to end users as an ordinary POSIX filesystem and requires no separate authentication. Client and server share the same authentication service (Windows AD or LDAP), so UID/GID (user and group IDs) are preserved.
- Parallel filesystem – Linux POSIX-based file system served from multiple storage servers to internal network-attached clients, typically via the filesystem's own client software rather than NFS. Open source Lustre and IBM’s GPFS are two commonly used parallel filesystems in cluster computing due to their performance and scalability. Individual files can be split (aka striped) across multiple storage targets, and multiple clients can read or write to the same file via MPI-IO, which was introduced in MPI version 2.
- SMB (Windows), Samba (Linux/Mac), CIFS – Linux POSIX-based file system served from a single server by re-exporting a local or remote filesystem as a mapped network drive. Use of this service requires the end user to establish a connection and authenticate. Connections from clients that do not use central authentication will not have proper UID/GID mappings, and changes to file permissions are not preserved, as permissions are enforced by the server.
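As a rough illustration of the striping mentioned for parallel filesystems above, the sketch below maps a byte offset in a file to a storage target using a simple round-robin layout. This is a simplified model, not the actual Lustre or GPFS implementation; the function name and the `stripe_size` and `stripe_count` parameters are chosen here for illustration.

```python
def stripe_location(offset, stripe_size, stripe_count):
    """Map a file byte offset to (target_index, offset_within_target).

    Assumes a plain round-robin layout: stripe 0 goes to target 0,
    stripe 1 to target 1, and so on, wrapping around after stripe_count.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    stripe_number = offset // stripe_size
    target_index = stripe_number % stripe_count
    # Full rounds already placed on this target, plus the position inside the current stripe.
    full_rounds = stripe_number // stripe_count
    offset_in_target = full_rounds * stripe_size + offset % stripe_size
    return target_index, offset_in_target


MIB = 1024 * 1024
# With 1 MiB stripes over 4 targets, the second MiB of the file lands on target 1.
print(stripe_location(1 * MIB, MIB, 4))
```

Because consecutive stripes land on different targets, several clients reading or writing different regions of one large file can be served by different storage servers at the same time.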
- Backups: The following methodologies are used to protect against loss of data:
- Snapshot – A point-in-time reference to a collection of files or objects is maintained for a specified duration. Thus, the total amount of space used is the current collection of data plus the changes in data kept during the snapshot duration. This service allows the end user to recover files or objects to a prior point in time. A changed file can be recovered to its previous state if at least one snapshot captured that state and that snapshot has not yet expired. For example, if snapshots are taken daily and kept for two weeks, and you altered or deleted a file 3 days ago, you could recover this file from a snapshots directory. However, if you created a file today and removed it, there would not be a snapshot of this file, as its lifetime was shorter than the snapshot frequency. Also, if you tried to recover a file from more than two weeks ago, the snapshot would have already expired.
- Disaster Recovery (DR) Copy – A transfer of the entire filesystem to a different storage server in a remote data center for the purpose of recovering data after a complete filesystem loss. Such a loss could be due to physical damage to the system (fire, electrical, or water damage) or to too many drives failing. As the transfer is completed asynchronously, it is not necessarily a point-in-time reference, since changes to the source filesystem could happen during the transfer.
- No backup / Single copy – Files are located on a single storage device; when files are removed by users, they are immediately deleted from the filesystem. This storage device still has resiliency to individual drive failure from RAID/erasure coding.
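The snapshot example above (daily snapshots kept for two weeks) can be worked through in a short sketch. It assumes snapshots are taken at a fixed interval starting from a known time; the function name, parameters, and dates are hypothetical and do not describe the actual FASRC snapshot schedule.

```python
from datetime import datetime, timedelta


def recoverable(created, removed, now, first_snapshot, interval, retention):
    """Return True if an unexpired snapshot captured the file.

    A snapshot taken at time t captures the file if created <= t < removed,
    and it is still available if now - t <= retention.
    """
    t = first_snapshot
    while t <= now:
        if created <= t < removed and now - t <= retention:
            return True
        t += interval
    return False


now = datetime(2024, 1, 20, 12, 0)
schedule = dict(
    now=now,
    first_snapshot=datetime(2024, 1, 1, 0, 0),  # assumed: daily at midnight
    interval=timedelta(days=1),
    retention=timedelta(days=14),
)

# Deleted 3 days ago: captured by earlier nightly snapshots that are still kept.
print(recoverable(datetime(2024, 1, 10, 9, 0), now - timedelta(days=3), **schedule))
# Created and removed today between snapshots: never captured.
print(recoverable(datetime(2024, 1, 20, 8, 0), datetime(2024, 1, 20, 10, 0), **schedule))
# Removed more than two weeks ago: every snapshot that held it has expired.
print(recoverable(datetime(2023, 12, 20, 0, 0), datetime(2024, 1, 3, 12, 0), **schedule))
```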
- Retention – Some filesystems, primarily shared general filesystems such as scratch, have a retention policy that sets the amount of time files can remain on the filesystem before being moved or removed.
- Home Directory – In order to facilitate basic login and research workflow for new users, each new user with cluster access is allotted 100GB of home directory space on our shared Isilon storage, included as part of the overall HPC cost model. This directory is accessible only by the user. It is intended only for active research on the cluster and is not suitable for archiving, workstation backups, or administrative files. This space cannot be expanded. For additional storage needs, see Lab Directory and Lab Storage below.
- Lab Directory – (aka 'Starter Space') In order to facilitate basic research workflow for new labs, each new lab group with cluster access is allotted 4TB of initial space on our shared Lustre storage on /n/holylabs, included as part of the overall HPC cost model. This lab directory is intended only for active research on the cluster and is not suitable for archiving, workstation backups, or administrative files. This space is not mountable (SMB/Samba), but is accessible via Globus. This space cannot be expanded. For additional storage needs, see Lab Storage below.
- Scratch / Temp – While performing research computations, it is common for many files to be created during runtime that are not needed after the series of computations is complete. Thus, most compute nodes have a local scratch filesystem (/scratch) and a mounted high-performance general purpose scratch filesystem (/n/holyscratch01). The latter has a specified retention policy, whereby files are kept for only a short duration.
- Lab Storage - Storage on one of our Tiered Storage systems is billed as part of our storage service center. See "Performance/Tiers" below. Costing is based on the total allocated share size.
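Because scratch filesystems have a retention policy, it can be useful to see which of your files are approaching removal. The sketch below lists files whose modification time is older than a given number of days. The retention period and the use of modification time are assumptions made for illustration; the text above does not state the actual policy, and the function name is hypothetical.

```python
import os
import time


def files_past_retention(root, retention_days, now=None):
    """Return sorted paths under root last modified more than retention_days ago."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    expired = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat avoids following symlinks out of the tree.
                if os.lstat(path).st_mtime < cutoff:
                    expired.append(path)
            except FileNotFoundError:
                # The file was removed while the tree was being scanned.
                continue
    return sorted(expired)
```

For example, `files_past_retention("/path/to/your/scratch/dir", 90)` would list files untouched for roughly 90 days, where both the path and the 90-day window are placeholders.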
Service Expectations and Limits:
Research data storage is not intended to be an enterprise service and is generally operated on a more conservative cost basis. Many factors affect storage performance, including the network, how full the filesystem is, the age of the hardware, the mixture of read/write patterns from endpoints, and whether the storage is a single server or clustered. In addition, due to the nature of the research being performed, end users do not always understand how scaling out their computations on a cluster affects the underlying filesystem.
At FASRC, availability, uptime, and backup schedules are provided as best effort by staff who do not have rotating 24/7/365 shifts.
This service is available to all PIs with an active FASRC account from any Harvard School.
Service manager and Owner:
Service Manager: Brian White, Associate Director of Systems Engineering and Operations
For Storage Service Center information and to request allocations, see: Storage Service Center
Offerings (Tiers of Service)
Request An Allocation
To request a new allocation or update existing allocations, please refer to the Storage Service Center Document.
With modern drives and networks, storage has a wide variety of performance characteristics, shown here broken up into service tiers. By offering multiple tiers of storage, different workflows can be accommodated, and moving data through the storage tiers facilitates the storage life-cycle. Storage tiers typically start with tier zero as the most performant tier; subsequent tiers (one, two, three, etc.) step down in performance with each additional tier.
Tier 0: Bulk - Lustre
This tier generally has the highest performance and capacity and is designed to sustain thousands of computing jobs simultaneously. Tier 0 has neither snapshots nor a DR copy; therefore, labs are responsible for backing up any critical data.
- Features: High-performance, single copy, network attached to cluster via Lustre/NFS, quota (files + size), Starfish data management web access, Globus transfer access.
- Mount point(s): Other (varies)
- Quota: 1-1024 TB
- Cost: $50/TB/yr (excluding scratch) - Cost is based on the total allocation.
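Quotas on this tier cover both file count and size. The soft and hard quota behavior described under Key Features can be sketched as a simple check on a pending write. The function name, the return labels, and the example limits are hypothetical; real filesystems enforce quotas internally and may also apply grace periods.

```python
def check_write(used, requested, soft_limit, hard_limit):
    """Classify a request that would add `requested` units to `used`.

    Units can be bytes or a file count. Returns "denied" if the hard limit
    would be exceeded, "warning" if only the soft limit would be exceeded,
    and "ok" otherwise.
    """
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    new_total = used + requested
    if new_total > hard_limit:
        return "denied"
    if new_total > soft_limit:
        return "warning"
    return "ok"


# Hypothetical limits in TB: soft quota 100, hard quota 110, 90 already used.
for extra in (5, 15, 25):
    print(extra, check_write(90, extra, soft_limit=100, hard_limit=110))
```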
Tier 1: Enterprise - Isilon
This tier is often the “general purpose” storage tier. This is where most labs who require snapshots or a DR copy will store their data as it is created and will keep the data that they are actively using. This tier is generally performant enough to use for hundreds of clustered compute jobs simultaneously and is typically the class of storage used for general file sharing, SMB and NFS.
- Features: Tiered performance, snapshot, DR copy, network attached to cluster (NFS), quota, SMB, Starfish data management web access, Globus transfer access.
- Mount point: /n/pi_lab
- Quota: 1-1024 TB
- Cost: $250/TB/yr - Cost is based on the total allocation.
Tier 3: Attic Storage - Tape
This tier is generally used for data that is static (no longer being changed) but which needs to be kept for compliance or publishing reasons. It is available in 20TB increments. Tier 3 data is stored on tape and, as such, access times are slower. Data is accessed via Globus and S3 transfers. Please note that this tier does not qualify as 'archival' storage. It is not intended for long-term or perpetual archiving.
- Features: Tape-based, low performance, S3 object store access, single copy, network attached to data transfer nodes, Globus and S3 transfer access.
- Available in 20TB increments
- Cost: estimated $5/TB/yr - Cost is based on each 20TB tape allocation.
Not available at this time - ETA TBD
Tier 2: Lab Share
This tier is generally used for data that is less active and not for high throughput jobs. For example, data associated with a recently completed experiment or data gathering from an instrument. It is not appropriate to compute against data on this tier with more than ~10-20 computations at a time.
- Features: Regular performance, object store, DR copy, network attached to cluster (NFS), SMB
- Mount point: N/A
- Quota: N/A
- Cost: N/A
| | Tier 0 | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| Description: | High-performance Lustre | Enterprise Isilon | NFS Storage | Tape |
| Cost per TB/year: | $50 | $250 | N/A | $5 |
| Mounted to cluster: | Yes | Yes | Yes | No |
| Encrypted at rest: | No | No | No | No |
| Data Management: | Starfish | Data IQ | Starfish | N/A |
| Location: | Holyoke and Boston | Holyoke and Boston | Holyoke and Boston | Holyoke |
| Documentation: | Tier 0 Doc | Tier 1 Doc | Tier 2 Doc | Tier 3 Doc |
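Since cost is based on the total allocation rather than on usage, a yearly estimate follows directly from the rates in the table above. The sketch below is illustrative only: the function name is hypothetical, the Tier 3 rate is the estimate quoted above, and rounding Tier 3 requests up to the next 20TB increment is an assumption about how tape allocations would be billed. Tier 2 is omitted because no price is listed.

```python
import math

# USD per TB per year, taken from the table above (Tier 3 is an estimate).
RATES_PER_TB_YEAR = {0: 50, 1: 250, 3: 5}
TIER3_INCREMENT_TB = 20  # Tier 3 is offered in 20TB increments


def annual_cost(tier, allocated_tb):
    """Estimate the yearly cost in USD for an allocation on the given tier."""
    if tier not in RATES_PER_TB_YEAR:
        raise ValueError(f"no published rate for tier {tier}")
    if allocated_tb <= 0:
        raise ValueError("allocated_tb must be positive")
    if tier == 3:
        # Assumption: a partial increment is billed as a full 20TB allocation.
        allocated_tb = math.ceil(allocated_tb / TIER3_INCREMENT_TB) * TIER3_INCREMENT_TB
    return RATES_PER_TB_YEAR[tier] * allocated_tb


print(annual_cost(0, 10))  # 10 TB on Tier 0
print(annual_cost(1, 4))   # 4 TB on Tier 1
print(annual_cost(3, 25))  # 25 TB on Tier 3, billed as 40 TB under the assumption above
```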