
Data Transfer Node Access via SLURM

CHPC has now enabled the use of the data transfer nodes via the SLURM job scheduler, both on notchpeak in the general environment and on redwood in the protected environment. While each of the dtn nodes has 24 cores and 128 GB of RAM, only 12 cores and 96 GB of RAM are made available for running SLURM jobs.

Cluster     SLURM Partition   SLURM Account   Nodes
notchpeak   notchpeak-dtn     dtn             dtn05, dtn06, dtn07, dtn08
redwood     redwood-dtn       dtn             pe-dtn03, pe-dtn04

All CHPC user accounts in both the general and the protected environment have been set up to use the dtns.

The notchpeak data transfer nodes have 100 gigabit per second connections to the University’s Science DMZ, a segment of the university network with streamlined data flow across the campus firewall to and from off-campus locations. In the redwood cluster, these nodes are connected at 40 gigabits per second.

Not all data transfer applications can take advantage of the high network bandwidth provided by the data transfer nodes – you should test your data transfer application to determine whether running it on a data transfer node yields better performance, and which file system yields the best performance. Our scratch file systems are excellent choices as the source or destination of high-speed data transfers.
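As a rough, hypothetical way to run such a test, the same timed transfer can be submitted once to a dtn partition and once to a regular compute partition, and the elapsed times compared (the URL below is a placeholder):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=0:30:00

# Time the transfer into a scratch file system; rerun with a different
# partition/account and a different target file system to compare.
mkdir -p /scratch/general/lustre/$USER
cd /scratch/general/lustre/$USER
time wget -q https://example.com/large-test-file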

How to specify requested resources:

The notchpeak-dtn and redwood-dtn SLURM partitions are similar to other shared SLURM partitions at CHPC, with multiple transfer jobs sharing a node. By default, each SLURM job running on a data transfer node is allocated a single core and 2 GB of memory. The notchpeak-dtn and redwood-dtn qualities of service (QOS) have a maximum time limit of 72 hours per job, so transfers must complete within that window.

Setting up a basic SLURM script that performs only a download is straightforward. Using the appropriate account and partition from the table above, the script changes to the directory containing the data being transferred from CHPC, or, when moving data to CHPC, creates and changes to the directory that will receive the data. It then runs the transfer with whichever mechanism you prefer. Here is a quick example, using wget to download a file:

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=1:00:00
#SBATCH -o slurm-%j.out-%N
#SBATCH -e slurm-%j.err-%N

setenv SCR /scratch/general/lustre/$USER/$SLURM_JOB_ID
mkdir -p $SCR
cd $SCR

wget https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/CRND0103-2020-AK_Aleknagik_1_NNE.txt
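Assuming the script is saved as, say, download.slurm (a name of your choosing), it is submitted like any other batch job:

sbatch download.slurm
squeue -u $USER    # check the status of the transfer job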

 

For redwood, you would only need to change the partition and the scratch file system in the above example. For parallel transfers, users can request the required number of cores and memory using #SBATCH directives.
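For instance, a hypothetical job requesting four cores could run several downloads concurrently (the core count, memory, and file names below are placeholder values):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=4:00:00

setenv SCR /scratch/general/lustre/$USER/$SLURM_JOB_ID
mkdir -p $SCR
cd $SCR

# Start the downloads in the background and wait for all of them to finish.
foreach f (FILE1.txt FILE2.txt FILE3.txt FILE4.txt)
    wget -q https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/$f &
end
wait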

You can also incorporate a transfer on the SLURM-capable data transfer nodes into a snakemake workflow by creating a snakemake rule responsible for downloading the data:

rule download_data:
    output: "CRND0103-2020-AK_Aleknagik_1_NNE.txt"
    message: "Downloading data file {output}."
    shell: "wget https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/{output}"

A typical cluster configuration file for a snakemake workflow would execute all the workflow rules on compute nodes:

# cluster.yaml - cluster configuration for snakemake workflow
__default__:
    cluster: notchpeak
    partition: notchpeak-shared
    account: jones
    nodes: 1
    ntasks: 1
    time: 01:00:00

 

With these additional lines we can direct the execution of the download_data rule to a data transfer node:

download_data:
    partition: notchpeak-dtn
    account: dtn
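With the rule and configuration in place, a hypothetical invocation (assuming a snakemake version that still accepts the --cluster and --cluster-config options) could submit each rule as its own SLURM job:

snakemake --cluster-config cluster.yaml \
    --cluster "sbatch --partition={cluster.partition} --account={cluster.account} --nodes={cluster.nodes} --ntasks={cluster.ntasks} --time={cluster.time}" \
    --jobs 4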

 

DTN local storage

Both sets of new schedulable DTNs mount a CephFS filesystem: 44 TB in capacity in the general environment and 22 TB in the protected environment. Each filesystem is built entirely on fast NVMe drives and was designed as a staging area for transfers. If your workflow or transfer could benefit from staging data to these filesystems, be aware that due to their smaller capacities they are aggressively scrubbed; the current scrub policy is 14 days. They are mounted at /scratch/general/dtn and /scratch/general/pe-dtn, respectively, and are symlinked to /scratch/local.
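As a hypothetical sketch, a DTN job could stage data onto this filesystem before sending it off site (the source directory and destination host below are placeholders):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=8:00:00

# Stage onto the DTN-local NVMe filesystem, then push off site.
# Files here are scrubbed after 14 days, so clean up when the transfer is done.
setenv STAGE /scratch/general/dtn/$USER/$SLURM_JOB_ID
mkdir -p $STAGE
cp -r $HOME/mydata $STAGE/                           # placeholder source directory
scp -r $STAGE/mydata user@remote.example.edu:/data   # placeholder destination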

 

 

Last Updated: 7/5/23