SLURM Scheduler

SLURM is a scalable open-source scheduler used on a number of world-class clusters. In an effort to align CHPC with XSEDE and other national computing resources, CHPC has switched its clusters from the PBS scheduler to SLURM. There are several short training videos about Slurm and concepts like batch scripts and interactive jobs.

  • Lonepeak was migrated to SLURM on March 10, 2015.
  • Kingspeak, Ember, Ash and Apexarch were migrated on April 2, 2015.

About Slurm

Slurm (Simple Linux Utility for Resource Management) is used for managing job scheduling on clusters. It was originally created at the Livermore Computing Center and has grown into full-fledged open-source software backed by a large community, commercially supported by the original developers, and installed on many of the Top500 supercomputers.

Using Slurm

There is a hard limit of 72 hours for jobs on general cluster nodes and 14 days on owner cluster nodes.
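
For example, the respective maximums would be requested as follows (a minimal sketch; HH:MM:SS and D-HH:MM:SS are standard Slurm time formats):

#SBATCH --time=72:00:00    # 72 hours, the general node limit
#SBATCH --time=14-00:00:00 # 14 days (D-HH:MM:SS), the owner node limit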

You may submit jobs to the batch system in two ways: 

  • Submitting a script 
  • Submitting an interactive job

Submitting a script to Slurm:

The creation of a batch script

To create a batch script, use your favorite text editor to create a file that contains both instructions to SLURM and instructions on how to run your job. All instructions to SLURM are prefaced by #SBATCH. It is necessary to specify both the partition and the account in your jobs on all clusters EXCEPT tangent.

#SBATCH --account=<youraccount>

#SBATCH --partition=<yourpartition>

Accounts: Your account is usually your Unix group name, typically your PI's last name. If your group has owner nodes, the account is usually <unix_group>-<cluster_abbreviation> (where the cluster abbreviation is kp, lp, em, or ash). There is also the owner-guest account; all users have access to this account to run on the owner nodes when they are idle. Jobs run as owner-guest are preemptable. Note that on the ash cluster, the owner-guest account is called smithp-guest.

Partitions: Partitions are cluster, cluster-freecycle, pi-cl, and cluster-guest, where cluster is the full name of the cluster and cl is the abbreviated form (kingspeak and kp, ember and em, ash and ash, lonepeak and lp, apexarch and aa).

Examples

In the examples below, we will suppose your PI is Frodo Baggins, who has owner nodes on kingspeak (and not on ember):

  • General user  example on lonepeak (no allocation required)
    #SBATCH --account=baggins
    #SBATCH --partition=lonepeak
  • General user on ember with allocation (Frodo still has allocation available on ember): 
    #SBATCH --account=baggins
    #SBATCH --partition=ember
  • General user on ember without allocation (Frodo has run out of allocation):
    #SBATCH --account=baggins
    #SBATCH --partition=ember-freecycle
  • To run on Frodo's owner nodes on kingspeak
    #SBATCH --account=baggins-kp
    #SBATCH --partition=baggins-kp
  • To run as owner-guest on ember:
    #SBATCH --account=owner-guest
    #SBATCH --partition=ember-guest
  • To run as owner-guest on ash:
    #SBATCH --account=smithp-guest
    #SBATCH --partition=ash-guest
  • To access ember GPU nodes (need to request addition to account)
    #SBATCH --account=ember-gpu
    #SBATCH --partition=ember-gpu
  • To access kingspeak GPU nodes (need to request addition to account)
    #SBATCH --account=kingspeak-gpu
    #SBATCH --partition=kingspeak-gpu

For more examples of SLURM job scripts see the CHPC MyJobs templates.

IMPORTANT: The biggest change in moving from Torque/Moab to Slurm comes when you are out of allocation. At that point you will no longer have access to the cluster partition and will have to manually change your scripts to use the cluster-freecycle partition.

Features: Features are extensions that allow for finer-grained specification of resources. We use features for the core count per node, which allows a job to obtain nodes of a uniform core count on clusters that have nodes with several different core counts. Features are requested with the --constraint or -C flag (described in the table below), and the core count is denoted as c#, i.e. c8, c12, c16, c20, c24. Features can be combined with logical operators, such as | for or and & for and. For example, to request 16- or 20-core nodes, do -C "c16|c20".

Features, requested with the constraint flag #SBATCH -C, can also be used to target specific owner nodes when running as owner-guest. Using the 'si' alias for sinfo given below, there is a column NODES(A/I/O/T) that gives the number of nodes that are allocated/idle/offline/total, along with a column FEATURES, which gives the values that can be used as a constraint. To target a specific group of owner nodes, use the name given in this feature column.

Another feature that we list for each node is its owner, either chpc or the group/center name. This can be used to target specific group nodes that have low use as owner-guest, in order to reduce the chances of being preempted. For example, to target nodes owned by group "ucgd", we can do -A owner-guest -p kingspeak-guest -C "ucgd".
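
In a batch script the same request would look like the following (a minimal sketch; ucgd is just the example group above):

#SBATCH --account=owner-guest
#SBATCH --partition=kingspeak-guest
#SBATCH --constraint="ucgd"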

Reservations: Upon request we can create reservations for users to guarantee node availability. Reservations are requested with the --reservation flag (abbreviated as -R) followed by the reservation name, which consists of a user name followed by a number, e.g. u0123456_1. Thus to use an existing reservation in a job script, include #SBATCH --reservation=u0123456_1 .

For policies regarding reservations see the Batch Policies document.

Sample MPI job Slurm script

#!/bin/csh
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=2      # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH -e slurm-%j.err-%N # name of the stderr, using job and first node values
#SBATCH --ntasks=16 # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --account=baggins # account - abbreviated by -A
#SBATCH --partition=lonepeak # partition, abbreviated by -p
#
# set data and working directories
setenv WORKDIR $HOME/mydata
setenv SCRDIR /scratch/kingspeak/serial/UNID/myscratch
mkdir -p $SCRDIR
cp -r $WORKDIR/* $SCRDIR
cd $SCRDIR
#
# load appropriate modules, in this case Intel compilers, MPICH2
module load intel mpich2
# for MPICH2 over Ethernet, set communication method to TCP
# see below for network interface selection options for different MPI distributions
setenv MPICH_NEMESIS_NETMOD tcp
# run the program
# see below for ways to do this for different MPI distributions
mpirun -np $SLURM_NTASKS my_mpi_program > my_program.out

The #SBATCH lines denote the SLURM flags; the rest of the script gives the instructions on how to run your job. Note that we are using the SLURM built-in $SLURM_NTASKS variable to denote the number of MPI tasks to run. In the case of a plain MPI job, this number should equal the number of nodes ($SLURM_NNODES) times the number of cores per node.

Also note that some packages have not been built with the MPI distributions that support Slurm, in which case you will need to specify the hosts to run on via the machinefile flag of the mpirun command of the appropriate MPI distribution. Please see the package help page for details and the appropriate script. Additional information on creating a machinefile is also given in the table of SLURM environment variables below.
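
As a minimal sketch (assuming an MPICH-style mpirun that accepts a -machinefile flag; other distributions use e.g. --hostfile), the machinefile can be generated inside the job and passed to mpirun:

srun hostname | sort > nodefile.$SLURM_JOBID            # one line per task/core
mpirun -np $SLURM_NTASKS -machinefile nodefile.$SLURM_JOBID my_mpi_program > my_program.out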

For mixed MPI/OpenMP runs, you can either hard-code OMP_NUM_THREADS in the script or use logic like that below to derive it from the Slurm job information. When requesting resources, ask for the number of MPI tasks and the number of nodes to run on, not for the total number of cores the MPI+OpenMP tasks will use.

#SBATCH -N 2
#SBATCH -n 4
#SBATCH -C "c12" # we want to run on uniform core count nodes

# find number of threads for OpenMP
# find number of MPI tasks per node
set TPN=`echo $SLURM_TASKS_PER_NODE | cut -f 1 -d \(`
# find number of CPU cores per node
set PPN=`echo $SLURM_JOB_CPUS_PER_NODE | cut -f 1 -d \(`
@ THREADS = ( $PPN / $TPN )
setenv OMP_NUM_THREADS $THREADS
# set thread affinity to CPU socket
setenv KMP_AFFINITY verbose,granularity=core,compact,1,0

mpirun -np $SLURM_NTASKS my_mpi_program > my_program.out

Alternatively, the SLURM option -c (or --cpus-per-task) can be used, like:

#SBATCH -N 2
#SBATCH -n 4
#SBATCH -c 6

setenv OMP_NUM_THREADS $SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS my_mpi_program > my_program.out

Note that if you use this on a cluster with nodes of varying core counts (kingspeak and ash), SLURM is free to pick any node, so the job's nodes may be undersubscribed (e.g. on ash, the above options would fully subscribe the 12-core nodes but undersubscribe the 20- or 24-core nodes).
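
If full node utilization matters, one option (a sketch, combining -c with the core count feature described earlier) is to constrain the job to a single core count so the task and thread counts match the hardware:

#SBATCH -N 2
#SBATCH -n 4
#SBATCH -c 6
#SBATCH -C "c12" # 4 tasks x 6 threads = 24 cores, fully subscribing two 12-core nodes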

Job Submission using SLURM

In order to submit a job, one first has to log in to an interactive node. The job submission is then done with the sbatch command.

For example, to submit a script named script.slurm just type:

  • sbatch script.slurm

IMPORTANT: sbatch by default passes all environment variables to the compute node, which differs from the behavior in PBS (which started with a clean shell). If you need to start with a clean environment, you will need to use the following directive in your batch script:

  • #SBATCH --export=NONE

This will still execute the .bashrc/.tcshrc scripts, but any changes you make in your interactive environment will not be present in the compute session. As an additional precaution, if you are using modules, you should run module purge to guarantee a fresh environment.
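
Put together, the start of a job script that guarantees a fresh environment might look like this (a minimal sketch; the module names are just examples):

#!/bin/tcsh
#SBATCH --export=NONE    # do not inherit the submission environment
# ... other #SBATCH directives ...
module purge             # drop any modules picked up from startup scripts
module load intel mpich2 # load only the modules this job needs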

 Checking the status of your job

To check the status of your job, use the squeue command.

  • squeue

The most common arguments are -u u0123456, for listing only user u0123456's jobs, and -j job#, for listing the job specified by the job number. Adding -l (for "long" output) gives more details.
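
For example (u0123456 and the job number are placeholders):

squeue -u u0123456 -l # long listing of user u0123456's jobs
squeue -j 123456      # status of the job with job number 123456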

Alternatively, from the account perspective, one can use the sacct command. This command accesses the accounting database and can give useful information about current and past job resource usage.
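
A minimal sketch of such a query (the field list is only an illustration; sacct --helpformat lists all available fields):

sacct -j 123456 --format=JobID,JobName,Partition,Elapsed,State # one running or finished job
sacct -u u0123456 --starttime=2017-09-01                       # all of a user's jobs since a date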

 Interactive batch jobs

In order to launch an interactive session on a compute node do:

srun --time=1:00:00 --ntasks 2 --nodes=1 --account=chpc --partition=ember --pty /bin/tcsh -l

The important flags are --pty, denoting an interactive terminal, and /bin/tcsh -l, the shell to run; if you prefer bash, replace /bin/tcsh with /bin/bash (the trailing -l starts the shell as a login shell). As in a batch script, -n specifies tasks and -N specifies nodes. The srun option --label prepends the remote task id to each line of stdout/stderr. Note that the order of the command is important; the "--pty /bin/tcsh -l" has to be at the end.

The srun flags can be abbreviated as:

srun -t 1:00:00 -n 2 -N 1 -A chpc -p ember --pty /bin/tcsh -l

The srun command by default passes all environment variables of the parent shell, therefore the X window connection is preserved as well, allowing graphical (GUI-based) applications to run inside the interactive job.
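
For example, after connecting to the cluster with X forwarding enabled (e.g. ssh -Y), a short interactive job can be used to verify that graphics work (xterm is just a convenient test program):

srun -t 0:30:00 -n 1 -N 1 -A chpc -p ember --pty /bin/tcsh -l
xterm & # should open a window on your local display if the X connection was passed through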

Running MPI jobs

One option is to produce the hostfile and feed it directly to the mpirun command of the appropriate MPI distribution. The disadvantage of this approach is that it does not integrate with SLURM and as such it does not provide advanced features such as task affinity, accounting, etc.

Another option is to use process manager built into SLURM and launch the MPI executable through srun command. How to do this for various MPI distributions is described at http://slurm.schedmd.com/mpi_guide.html. Some MPI distributions' mpirun commands integrate with Slurm and thus it is more convenient to use them instead of srun.

For the MPI distributions at CHPC, the following works (assuming an MPI program internally threaded with OpenMP).

Intel MPI
module load [intel,gcc] impi
# for a cluster with Ethernet only, set network fabrics to TCP
setenv I_MPI_FABRICS shm:tcp
# for a cluster with InfiniBand, set network fabrics to OFA
setenv I_MPI_FABRICS shm:ofa
# on lonepeak owner nodes, use the TMI interface (InfiniPath)
setenv I_MPI_FABRICS shm:tmi

# IMPI option 1 - launch with PMI library - currently not using task affinity, use mpirun instead
setenv I_MPI_PMI_LIBRARY /uufs/CLUSTER.peaks/sys/pkg/slurm/std/lib/libpmi.so
#srun -n $SLURM_NTASKS $EXE >& run1.out
# IMPI option 2 - bootstrap
mpirun -bootstrap slurm -np $SLURM_NTASKS $EXE  >& run1.out
MPICH2

Launch the MPICH2 jobs with mpiexec as explained in http://slurm.schedmd.com/mpi_guide.html#mpich2. That is:

module load [intel,gcc,pgi] mpich2
setenv MPICH_NEMESIS_NETMOD mxm # default is Ethernet, choose mxm for InfiniBand
mpirun -np $SLURM_NTASKS $EXE
OpenMPI

Use the mpirun command from the OpenMPI distribution. There's no need to specify the hostfile as OpenMPI communicates with Slurm in that regard. To run:

module load [intel,gcc,pgi] openmpi
mpirun --mca btl tcp,self -np $SLURM_NTASKS $EXE # in case of Ethernet network cluster, such as general lonepeak nodes.
mpirun -np $SLURM_NTASKS $EXE # in case of InfiniBand network clusters

Note that OpenMPI supports multiple network interfaces and as such it allows for single MPI executable across all CHPC clusters, including the InfiniPath network on lonepeak.

MVAPICH2

An MVAPICH2 executable can be launched with the mpirun command (preferably) or with srun, in which case one needs to use the --mpi=none flag. To run multi-threaded code, make sure to set OMP_NUM_THREADS and MV2_ENABLE_AFFINITY=0 (so that the MPI tasks don't get locked to a single core) before calling srun.

module load [intel,gcc,pgi] mvapich2
setenv OMP_NUM_THREADS 6 # optional number of OpenMP threads
setenv MV2_ENABLE_AFFINITY 0 # disable process affinity - only for multi-threaded programs
mpirun -np $SLURM_NTASKS $EXE # mpirun is recommended
srun -n $SLURM_NTASKS --mpi=none $EXE # srun is optional

Running multiple serial calculations within one job

Please, see a page dedicated to running multiple serial jobs for details.

Multiple jobs using job arrays

Job arrays enable quick submission of many jobs that differ from each other only by some index. In this case Slurm provides the environment variable SLURM_ARRAY_TASK_ID, which serves as the differentiator between the jobs. For example, if our program takes input data input.dat, we can run it with 30 different input files named input[1-30].dat using the following script, named myrun.slr:

#!/bin/tcsh
#SBATCH -J myprog # A single job name for the array
#SBATCH -n 1 # Number of tasks
#SBATCH -N 1 # All tasks on one machine
#SBATCH -p CLUSTER # Partition on some cluster
#SBATCH -A chpc # General CHPC account
#SBATCH -t 0-2:00 # 2 hours (D-HH:MM)
#SBATCH -o myprog%A%a.out # Standard output
#SBATCH -e myprog%A%a.err # Standard error

./myprogram input$SLURM_ARRAY_TASK_ID.dat 

We then use the --array  parameter to run this script:

sbatch --array=1-30 myrun.slr

Apart from SLURM_ARRAY_TASK_ID, which is an environment variable unique to each job in the array, notice also %A and %a, which represent the job ID and the job array index, respectively. These can be used in the sbatch parameters to generate unique file names.

You can also limit the number of jobs that run simultaneously to "n" by adding %n to the end of the array range:

sbatch --array=1-30%5 myrun.slr

Please be aware that since we don't allow multiple jobs to share nodes, submitting serial jobs this way will waste a lot of resources, since each job will use only one task (core) on a cluster node. So in our environment job arrays should only be used for mass submission of programs that are parallelized either with shared memory (e.g. OpenMP), using all the CPU cores on one node, or with distributed parallelism (e.g. MPI) and able to run on one or more nodes. For multiple serial job submissions, see the running multiple serial jobs page.

Automatic restarting of preemptable jobs

The owner-guest or freecycle queues tend to have quicker turnaround than the general queues. However, guest jobs may get preempted. If one's job is checkpointed (e.g. by saving particle positions and velocities in dynamics simulations, or property values and gradients in minimizations), one can automatically restart a preempted job following this strategy:

  1. Right at the beginning of the job script, submit a new job with a dependency on the current job. This ensures that the new job becomes eligible to run only after the current job is preempted (or finishes). Save the new job submission output into a file; this file contains the job ID of the new job, which we save into the environment variable NEWJOB:
    sbatch -d afterany:$SLURM_JOBID run_ash.slr >& newjob.txt
    set NEWJOB=`cat newjob.txt |cut -f 4 -d " "`
  2. In the simulation output, include a file that lists the last checkpointed iteration, time step, or other measure of the simulation progress. In the example below, we have a file called inv.append which, among other things, contains one line per simulation iteration.
  3. In the job script, extract the iteration number from this file and put it into the simulation input file (here called inpt.m). This input file will be used when the simulation is restarted. Since the output file does not exist at the very start of the simulation, the first job will not append to the input file and will thus begin from the start.
    set ITER=`cat $SCRDIR/$RUNNAME/work_inv/inv.append | grep Iter |tail -n 1 | cut -f 2 -d " " | cut -f 1 -d /`
    if ($ITER != "") then
    echo "restart=$ITER;" >> inpt.m
    endif
  4. Run the simulation. If the job gets preempted, the current job ends here. If it runs to completion, then at the end of the job script delete the new job, identified by the environment variable NEWJOB, that was submitted when this job started:
    scancel $NEWJOB

In summary, the whole SLURM script (called run_ash.slr) would look like this:

#SBATCH all necessary job settings (partition, walltime, nodes, tasks)
#SBATCH -A owner-guest

# submit a new job dependent on the finish of the current job
sbatch -d afterany:$SLURM_JOBID run_ash.slr >& newjob.txt
# get this new job job number
set NEWJOB=`cat newjob.txt |cut -f 4 -d " "`
# figure out from where to restart
set ITER=`cat $SCRDIR/$RUNNAME/work_inv/inv.append | grep Iter |tail -n 1 | cut -f 2 -d " " | cut -f 1 -d /`
if ($ITER != "") then
echo "restart=$ITER;" >> inpt.m
endif

# copy input files to scratch
# run simulation
# copy results out of the scratch

# delete the job if the simulation finished
scancel $NEWJOB

Handy Slurm Information

Slurm User Commands

 Slurm Command  What it does
 sinfo  reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
 squeue  reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
 sbatch  is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
 scancel  is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
 sacct  is used to report job or job step accounting information about active or completed jobs.
 srun  is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation.

Useful Slurm aliases

Bash to add to .aliases file:
#SLURM Aliases that provide information in a useful manner for our clusters
alias si="sinfo -o \"%20P %5D %14F %8z %10m %10d %11l %16f %N\""
alias si2="sinfo -o \"%20P %5D %6t %8z %10m %10d %11l %16f %N\""
alias sq="squeue -o \"%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R\""

Tcsh to add to .aliases file:
#SLURM Aliases that provide information in a useful manner for our clusters
alias si 'sinfo -o "%20P %5D %14F %8z %10m %11l %16f %N"'
alias si2 'sinfo -o "%20P %5D %6t %8z %10m %10d %11l %N"'
alias sq 'squeue -o "%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R"'

sview GUI tool

sview is a graphical user interface to view and modify Slurm state. Run it by typing sview. It is useful for viewing partition and node characteristics and information on jobs. Right-clicking on a job, node or partition allows you to perform actions on it; use this carefully so as not to accidentally modify or remove your job.

 sview

Moab/PBS to Slurm translation

Moab/PBS to Slurm commands

Action  Moab/Torque  Slurm
Job Submission msub/qsub sbatch
Job deletion canceljob/qdel scancel
List all jobs in queue showq/qstat squeue
List all nodes   sinfo
Show information about nodes mdiag -n/pbsnodes scontrol show nodes 
Job start time showstart squeue --start
Job information checkjob scontrol show job <jobid>
Reservation information showres scontrol show res (this option shows details) or sinfo -T

 Moab/PBS to Slurm environmental variables

Description  Moab/Torque  Slurm
 Job ID  $PBS_JOBID $SLURM_JOBID
 node list  $PBS_NODEFILE

 To generate a listing of 1 node per line:
 srun hostname | sort -u > nodefile.$SLURM_JOBID

 To generate a listing of 1 core per line:
 srun hostname | sort > nodefile.$SLURM_JOBID

submit directory $PBS_O_WORKDIR $SLURM_SUBMIT_DIR
number of nodes   $SLURM_NNODES
number of processors (tasks)   $SLURM_NTASKS ($SLURM_NPROCS for backward compatibility)

Moab/PBS to Slurm job script modifiers

 

Description  Moab/Torque  Slurm
Walltime #PBS -l walltime=1:00:00 #SBATCH -t 1:00:00
Process count

#PBS -l nodes=2:ppn=12

#SBATCH -n 24 ( or --ntasks=24)
#SBATCH -N 2 (or --nodes=2)

For threaded MPI jobs, use number of MPI tasks for --ntasks,
not number of cores. See the example script above for how
to figure out number of threads per MPI task

Memory #PBS -l nodes=2:ppn=12:m24576

#SBATCH --mem=24576

it is also possible to specify memory per task with --mem-per-cpu.

Mail options #PBS -m abe

#SBATCH --mail-type=FAIL,BEGIN,END 
there are other options such as REQUEUE, TIME_LIMIT_90. ...

Mail user #PBS -M user@mail.com  #SBATCH --mail-user=user@mail.com
Job name and
STDOUT/STDERR
#PBS -N myjob

#SBATCH -o myjob.out
#SBATCH -e myjob.err

Account #PBS -A owner-guest
optional in Torque/Moab

#SBATCH -A owner-guest (or --account=owner-guest)
required in Slurm

Dependency #PBS -W depend=afterok:12345
run after job 12345 finishes correctly

#SBATCH -d afterok:12345 or --dependency=afterok:12345
similarly to Moab, other modifiers include after, afterany, afternotok.
Please note that if job runs out of walltime, this does not constitute OK exit. To start a job after specified job finished use afterany.
For details on job exit codes see http://slurm.schedmd.com/job_exit_code.html

Reservation #PBS -l advres=u0123456_1

#SBATCH -R u0123456_1 or --reservation=u0123456_1

Partition No direct equivalent

#SBATCH -p lonepeak (or --partition=lonepeak)

Propagate all environment
variables from terminal
#PBS -V  All environment variables are propagated by default, except for modules
which are purged at a job start to prevent possible inconsistencies.
One can either load the needed modules in the job script,
or have them in their .custom.[sh,csh] file.
Propagate specific
environment variable
#PBS -v myvar #SBATCH --export=myvar
use with caution as this will export ONLY variable myvar

Target specific owner
nodes as guest

#PBS -l nodes=1:ppn=24:ucgd -A owner-guest #SBATCH -A owner-guest -p kingspeak-guest -C "ucgd"

Target specific nodes  

  #SBATCH -w em001,em002 or --nodelist=em001,em002

 Information about job priority

Note that this applies to the general resources and not to owner resources on the clusters. 

The first and most significant portion of a job's priority is based on the account being used and whether it has allocation or not. Jobs run with allocation have a base priority of 1000000. Jobs without have a base priority of 1.

To this, there are additional values added for:

(1) Age (time a job spends in the queue) -- The "Age" contribution grows roughly linearly with queue time until it hits a cap, which limits how much extra priority a job can accrue while waiting in the queue.

(2) Fairshare (how much of the system you have used recently) -- Fairshare is a factor based on the historical usage of a user. All else being equal, the user that has used the system less recently gets a priority bonus over the user that has used it more recently. This factor behaves more exponentially than the other two.

(3) JobSize (how many nodes/cores your job is requesting) -- Job size is again a linear value based on the number of resources requested. It is fixed at submit time according to the requested resources.

At any point you can run 'sprio' to see the current priority, as well as its source in terms of the three components mentioned above, for all idle jobs in the queue on a cluster.
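
For example (u0123456 is a placeholder UNID):

sprio -l          # long listing with the individual priority components for all pending jobs
sprio -u u0123456 # priorities of pending jobs belonging to u0123456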

How to determine which Slurm accounts you are in

In order to see which accounts and partitions you can use do:

sacctmgr -p show assoc user=<UNID> 
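
The -p flag produces parsable (pipe-delimited) output. To trim it down to the relevant columns, a format list can be added (a sketch; u0123456 is a placeholder UNID):

sacctmgr -p show assoc user=u0123456 format=account,partition,qos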

 

Other good sources of information

Last Updated: 9/15/17