
Data Transfer Node Access via SLURM

CHPC has now enabled the use of the data transfer nodes via the SLURM job scheduler, both on notchpeak in the general environment and on redwood in the protected environment. While each of the dtn nodes has 24 cores and 128 GB of RAM, only 12 cores and 96 GB of RAM are made available for running SLURM jobs.

Cluster     SLURM Partition   SLURM Account   Nodes
notchpeak   notchpeak-dtn     dtn             dtn05, dtn06, dtn07, dtn08
redwood     redwood-dtn       dtn             pe-dtn03, pe-dtn04

All CHPC user accounts in both the general and the protected environment have been set up to use the dtns.

The notchpeak data transfer nodes have 100 gigabit per second connections to the University’s Science DMZ, a segment of the university network with streamlined data flow across the campus firewall to and from off-campus locations. In the redwood cluster, these nodes are connected at 40 gigabits per second.

Not all data transfer applications can take advantage of the high network bandwidth provided by the data transfer nodes – you should test your data transfer application to determine whether running it on a data transfer node yields better performance, and which file system yields the best performance. Our scratch file systems are excellent choices as the source or destination of high-speed data transfers.
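As a rough, hypothetical way to run such a test, the same timed transfer can be submitted once to a dtn partition and once to a regular compute partition, and the elapsed times compared (the URL below is a placeholder):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=0:30:00

# Time the transfer into a scratch file system; rerun with a different
# partition/account and a different target file system to compare.
mkdir -p /scratch/general/lustre/$USER
cd /scratch/general/lustre/$USER
time wget -q https://example.com/large-test-file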

How to specify requested resources:

The notchpeak-dtn and redwood-dtn SLURM partitions are similar to other shared SLURM partitions at CHPC, with multiple transfer jobs sharing a node. By default, each SLURM job running on a data transfer node is allocated a single core and 2 GB of memory. The notchpeak-dtn and redwood-dtn qualities of service (QOS) have a maximum time limit of 72 hours per job, so transfers must complete within that window.

Setting up a basic SLURM script that performs only a download is straightforward. Using the appropriate account and partition from the table above, the script changes to the directory containing the data being transferred from CHPC, or, when moving data to CHPC, creates and changes to the directory that will receive the data. It then runs the transfer with whichever mechanism you prefer. Here is a quick example, using wget to download a file:

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=1:00:00
#SBATCH -o slurm-%j.out-%N
#SBATCH -e slurm-%j.err-%N

setenv SCR /scratch/general/lustre/$USER/$SLURM_JOB_ID
mkdir -p $SCR
cd $SCR

wget https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/CRND0103-2020-AK_Aleknagik_1_NNE.txt
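Assuming the script is saved as, say, download.slurm (a name of your choosing), it is submitted like any other batch job:

sbatch download.slurm
squeue -u $USER    # check the status of the transfer job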

 

For redwood, you would only need to change the partition and the scratch file system in the above example. For parallel transfers, users can request the required number of cores and memory using #SBATCH directives.
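For instance, a hypothetical job requesting four cores could run several downloads concurrently (the core count, memory, and file names below are placeholder values):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --ntasks=4
#SBATCH --mem=8G
#SBATCH --time=4:00:00

setenv SCR /scratch/general/lustre/$USER/$SLURM_JOB_ID
mkdir -p $SCR
cd $SCR

# Start the downloads in the background and wait for all of them to finish.
foreach f (FILE1.txt FILE2.txt FILE3.txt FILE4.txt)
    wget -q https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/$f &
end
wait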

You can also incorporate a transfer on the SLURM-capable data transfer nodes into a snakemake workflow by creating a snakemake rule responsible for downloading the data:

rule download_data:
    output: "CRND0103-2020-AK_Aleknagik_1_NNE.txt"
    message: "Downloading data file {output}."
    shell: "wget https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2020/{output}"

A typical cluster configuration file for a snakemake workflow would execute all the workflow rules on compute nodes:

# cluster.yaml - cluster configuration for snakemake workflow
__default__:
    cluster: notchpeak
    partition: notchpeak-shared
    account: jones
    nodes: 1
    ntasks: 1
    time: 01:00:00

 

With these additional lines we can direct the execution of the download_data rule to a data transfer node:

download_data:
    partition: notchpeak-dtn
    account: dtn
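With the rule and configuration in place, a hypothetical invocation (assuming a snakemake version that still accepts the --cluster and --cluster-config options) could submit each rule as its own SLURM job:

snakemake --cluster-config cluster.yaml \
    --cluster "sbatch --partition={cluster.partition} --account={cluster.account} --nodes={cluster.nodes} --ntasks={cluster.ntasks} --time={cluster.time}" \
    --jobs 4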

 

DTN local storage

Both sets of new schedulable DTNs mount a CephFS filesystem: 44 TB in capacity in the general environment and 22 TB in the protected environment. Each filesystem is built entirely on fast NVMe drives and was designed as a staging area for transfers. If your workflow or transfer could benefit from staging data to these filesystems, be aware that due to their smaller capacities they are aggressively scrubbed; the current scrub policy is 14 days. They are mounted at /scratch/general/dtn and /scratch/general/pe-dtn, respectively, and are symlinked to /scratch/local.
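As a hypothetical sketch, a DTN job could stage data onto this filesystem before sending it off site (the source directory and destination host below are placeholders):

#!/bin/tcsh
#SBATCH --partition=notchpeak-dtn
#SBATCH --account=dtn
#SBATCH --time=8:00:00

# Stage onto the DTN-local NVMe filesystem, then push off site.
# Files here are scrubbed after 14 days, so clean up when the transfer is done.
setenv STAGE /scratch/general/dtn/$USER/$SLURM_JOB_ID
mkdir -p $STAGE
cp -r $HOME/mydata $STAGE/                           # placeholder source directory
scp -r $STAGE/mydata user@remote.example.edu:/data   # placeholder destination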

 

 

Last Updated: 7/5/23