New CHPC storage option - archive storage

Date Posted: December 1st, 2016

CHPC now has an additional storage offering – archive storage – to complement the existing home, group and scratch storage options. Specifically, this archive storage is a good supplement or alternative to the current group spaces for data that is not being actively used or updated.

This new storage solution is an archive storage option based around object storage, specifically Ceph. We have an initial raw capacity of 1.15PB, with a cost of $80/TB of raw space. In order to calculate the cost per TB of usable space you must consider the replication configuration. Initially, we will be offering a 6+3 erasure coding configuration (6 data chunks plus 3 coding chunks, so every 9TB of raw space yields 6TB of usable space), which results in a price of $120/TB of usable capacity for the 5-year lifetime of the hardware. As we currently do with our group space, we will operate this space in a condominium model by reselling this space in TB chunks. This space is a stand-alone entity, and will not be mounted on other CHPC resources. Therefore, to use data stored in this space, users will have to move it to a location such as one of the scratch file systems or an available group space.
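The usable-capacity price follows directly from the raw price and the erasure coding layout. The short Python sketch below reproduces that arithmetic; the function name and its parameters are illustrative and not part of any CHPC tool.

```python
def usable_cost_per_tb(raw_cost_per_tb: float, data_chunks: int, coding_chunks: int) -> float:
    """Return the cost per usable TB for a k+m erasure coding layout.

    With k data chunks and m coding chunks, only k out of every (k + m)
    units of raw capacity hold user data, so the raw price is scaled by
    (k + m) / k.
    """
    if data_chunks <= 0 or coding_chunks < 0:
        raise ValueError("data_chunks must be positive and coding_chunks non-negative")
    total_chunks = data_chunks + coding_chunks
    return raw_cost_per_tb * total_chunks / data_chunks


if __name__ == "__main__":
    # Figures from the announcement: $80/TB raw, 6+3 erasure coding.
    print(usable_cost_per_tb(80, 6, 3))  # 120.0
```

The same function can be used to compare other layouts, should a different erasure coding configuration be offered later.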

One of the key features of the archive system is that users manage the archive directly, unlike the tape archive option. Users can move data in and out of the archive storage as needed: they can archive milestone moments in their research, store an additional copy of crucial instrument data, or retrieve previously archived data. This archive storage solution will be accessible via applications that use Amazon’s S3 API. GUI tools such as Transmit (for Mac) as well as command-line tools such as s3cmd and rclone can be used to move the data. In addition, Globus can be used to access the data.
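As a rough illustration of moving data with one of the command-line tools mentioned above, the Python sketch below builds `rclone copy` commands for archiving and retrieving a directory. It is only a sketch under several assumptions: the remote name `archive` is a placeholder for a remote you would define yourself with `rclone config`, and the bucket name and paths are made up for the example. None of these names come from CHPC.

```python
import shutil
import subprocess
from typing import List

# Hypothetical rclone remote name; it must first be created with `rclone config`.
REMOTE = "archive"


def rclone_copy_command(source: str, destination: str) -> List[str]:
    """Build an `rclone copy` command that copies source into destination."""
    return ["rclone", "copy", "--progress", source, destination]


def archive_command(local_dir: str, bucket: str, prefix: str) -> List[str]:
    """Command to copy a local directory into the archive (bucket/prefix)."""
    return rclone_copy_command(local_dir, f"{REMOTE}:{bucket}/{prefix}")


def retrieve_command(bucket: str, prefix: str, local_dir: str) -> List[str]:
    """Command to copy archived data back to a local directory, e.g. scratch."""
    return rclone_copy_command(f"{REMOTE}:{bucket}/{prefix}", local_dir)


def run(command: List[str]) -> None:
    """Run the command, failing early if rclone is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("rclone was not found on PATH")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder bucket and paths; only print the commands instead of running them.
    print(" ".join(archive_command("/scratch/myproject/run42", "mylab-archive", "run42")))
    print(" ".join(retrieve_command("mylab-archive", "run42", "/scratch/myproject/restore")))
```

Printing the commands first makes it easy to check the source and destination before calling `run` on them. Because the archive is not mounted on other CHPC resources, retrieval targets a location such as scratch or group space.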

Please note that this archive storage space is for use in the general environment, and is not for use with regulated data. CHPC is actively working on vetting this solution for human genomic data that is covered by NIH’s dbGaP policies.

