Running independent serial calculations

Some data analyses require running many similar, independent serial calculations. Because of the logistics of setting up these calculations, and because of CHPC cluster scheduling policies, it is advisable not to run each single calculation as a separate job. The best strategy depends on the characteristics of the calculations; this page lists and details the available strategies.

If the calculations take about the same time, there are many ways to pack them into a single job. If the job runs on a single node, we can combine execution in the background with the wait statement. On multiple nodes, a similar result can be achieved with GNU Parallel; however, distributing the calculations is easier with SLURM's srun --multi-prog option.

If the calculations have variable runtimes, they need to be scheduled inside the job in order to use all available CPU cores on the allocated nodes efficiently. We have developed a mini-scheduler, called submit, for this purpose.

If the number of calculations is large (> ~100), we recommend either splitting the srun --multi-prog run into multiple jobs, or chaining the calculations one after another using submit. The reason is that larger jobs take longer to allocate, and, when using the owner-guest queue, the chances of preemption increase as well.

Finally, if there are a lot of serial calculations and they are not very short, the Open Science Grid (OSG) may be the best choice due to the vast amount of resources the OSG provides.

Below is a summary of the best strategies for the different types of serial calculations:

Single calculation runtime < 15 min
  • About the same runtime: srun --multi-prog for < ~100 calculations; a shell script loop (single node only); submit for chaining > ~100 calculations
  • Variable runtime: submit

Single calculation runtime > 15 min
  • About the same runtime: srun --multi-prog (multiple jobs if > ~100 calculations); Open Science Grid
  • Variable runtime: submit; Open Science Grid

If the calculation is thread-parallelized, it is also possible to use SLURM job arrays to submit multiple jobs at once, each occupying one full node.
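A minimal job-array sketch follows; the program path, input file naming, and resource values are placeholders for your own setup:

```
#!/bin/bash
#SBATCH -N 1                # each array element gets one full node
#SBATCH -n 16               # e.g. one 16-thread calculation per job
#SBATCH -t 1:00:00
#SBATCH --array=0-9         # submits 10 independent jobs at once

# SLURM sets SLURM_ARRAY_TASK_ID to a different value in each array
# element, which differentiates the calculations
/path_to/myprogram input-$SLURM_ARRAY_TASK_ID.dat
```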

Independent serial calculations with similar run time

Multiple calculations in the background

This is the simplest way to run multiple programs within a single job; however, it works only on a single node.

For example, in bash:

#!/bin/bash
for (( i=0; i < $SLURM_NTASKS ; i++ )); do
    /path_to/myprogram $i &
done
wait

We differentiate between the calculations inside myprogram through the loop index $i. The & puts each process in the background, allowing all $SLURM_NTASKS of them to be launched. The wait statement causes the script to wait until all of the background processes finish.

The advantage of this approach is its simplicity; the drawbacks are that it only works on a single node, and only for calculations that take roughly the same amount of time.
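If there are more calculations than tasks, the loop can be extended to keep at most $SLURM_NTASKS of them running at once. The sketch below assumes bash 4.3 or newer (for "wait -n"); run_calc is a stand-in for your own serial program:

```shell
#!/bin/bash
: "${SLURM_NTASKS:=4}"     # set automatically inside a SLURM job
NCALC=20                   # total number of calculations
RESULTS=$(mktemp)          # stand-in output file for this demo

run_calc() {               # replace with e.g. /path_to/myprogram "$1"
    sleep 0.1
    echo "calculation $1 done" >> "$RESULTS"
}

running=0
for (( i=0; i<NCALC; i++ )); do
    if (( running >= SLURM_NTASKS )); then
        wait -n            # block until any one background job finishes
        running=$((running-1))
    fi
    run_calc "$i" &
    running=$((running+1))
done
wait                       # wait for the remaining background jobs
```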

srun multiple program configuration

The --multi-prog option allows each task in the job to be assigned a different executable and arguments. This makes it possible to differentiate the serial runs from each other and run them inside a single parallel SLURM job. This is our preferred way to launch independent serial calculations that take about the same time. A basic SLURM job script can look like this:

#!/bin/sh
#SBATCH -n 16
#SBATCH -N 1
#SBATCH -t 1-03:00:00 # 1 day and 3 hours
#SBATCH -p CLUSTER # partition name
#SBATCH -A chpc # account name
#SBATCH -J my_job_name # job name

srun --multi-prog my.conf

Here we submit a job on one node with 16 tasks, then run srun with the --multi-prog option, followed by the name of a configuration file for the multiple programs. This file has the following three fields per line, separated by spaces:

  • task number
  • executable file
  • arguments to the executable file

The executable arguments may include the expression "%t", which gets replaced by the task number, and "%o", which gets replaced by the task's offset within its task range.
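For example, a my.conf for the 16-task job above could look like the following; /path_to/myprogram and its input file naming are placeholders:

```
# task(s)  executable          arguments
0          /bin/hostname
1-3        /bin/echo           task %t, offset %o
4-15       /path_to/myprogram  input-%t.dat
```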

Please note that if the executable is not in the default PATH (as defined when a new shell is opened), the full path to the executable has to be specified. The same is true if the executable is a script that in turn calls a program. Due to our modules setup, running such a script resets the module environment, so any program modules need to be loaded again inside the script.

For example, to run the quantum chemistry program Mopac, we have mopac.conf as follows:

0-11 ./example.sh %t

Here the example.sh script contains:

#!/bin/tcsh
module load mopac
mopac example-$1.mop

A complete example for running multiple serial R simulations using --multi-prog is described on our R documentation page.

We have also developed a simpler multiple serial program launch script, which can be obtained here. This script runs as many serial tasks as specified in the #SBATCH -n line. Each task uses one entry from the WORKDIR and PROGRAM arrays, copies data from its WORKDIR to a unique scratch directory, and runs its PROGRAM, which can be the same or different for each serial task.

Independent serial calculations with variable run time

submit mini-scheduler

The submit program allows many serial calculations to run inside a parallel cluster job using a master-worker model. It is a simple MPI-based scheduler that reads a list of calculations from a file, one per line, and runs them in parallel, keeping as many calculations running as there are parallel tasks. Once a calculation finishes, the worker asks for another one, and this repeats until all calculations are done.

This is our preferred way to run independent serial calculations that take varying amounts of time to finish, as long as there are many more calculations than job tasks, since this chains the calculations one after another and fills the resources better. If the runtime of each calculation is roughly known, listing the calculations in descending order of runtime, longest first, provides the best packing onto the job tasks.

For the basic documentation, an example, and the source code, see the submit GitHub page.

The submit program reads an input file called job.list, whose syntax is as follows:
  • first line - the number of serial jobs to run
  • other lines - the command line for each serial job (including program arguments). Make sure there is only a single space between the program arguments - more than a single space will break the command line.

For example (for testing purposes), you can make job.list as:

4
/bin/hostname -i
/bin/hostname -i
/bin/hostname -i
/bin/hostname -i

This will run 4 serial jobs, each executing the hostname -i command, which returns the network address of the node it ran on.

NOTE - since submit launches the items in job.list directly, it does not use the user environment. Therefore we need to specify the full path to each command, or run a shell script (with the full path to the script in job.list), where the shell script initializes a new shell session with the user's default environment.
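For example, a wrapper script along these lines (the script name, paths, and module name are placeholders for your own setup) can be listed in job.list with its full path:

```
#!/bin/bash -l
# run_case.sh - hypothetical wrapper for one calculation; "-l" starts a
# login shell, so the default user environment and modules setup are loaded
module load R                        # load whatever your calculation needs
Rscript /path_to/simulation.R "$1"   # $1 differentiates the calculations
```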

The differentiation between different calculations can be built into the job.list through program arguments, as shown in the example below.
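For instance, a job.list that differentiates four runs of a hypothetical wrapper script through an index argument could look like:

```
4
/path_to/run_case.sh 1
/path_to/run_case.sh 2
/path_to/run_case.sh 3
/path_to/run_case.sh 4
```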

A complete example using SLURM and a set of serial R calculations, similar to the srun --multi-prog example shown above, can be found on the submit github page or at /uufs/chpc.utah.edu/sys/installdir/submit/std/examples/R.
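As a sketch, a SLURM script for submit could look like the following; the exact module name and launch command are assumptions here, so check the submit GitHub page or the complete example above for the details:

```
#!/bin/bash
#SBATCH -n 16              # 1 master + 15 workers
#SBATCH -t 12:00:00
#SBATCH -p CLUSTER         # partition name
#SBATCH -A chpc            # account name

# Assumption: submit is an MPI program provided by a "submit" module and
# reads job.list from the current directory
module load submit
mpirun -np $SLURM_NTASKS submit
```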

Running Multiple Tasks on a Single Node with Xargs

If you intend to run multiple independent tasks in parallel on a single node, the Linux shell command xargs is a great tool. In its simplest form xargs reads values from the standard input and applies a command to each value that was read. For example, to compress all the text files in the current directory and its subdirectories:

$ find . -name "*.txt" -print | xargs gzip

To run these commands in parallel, simply add the -P option, specifying the number of processes to run concurrently:

$ find . -name "*.txt" -print | xargs -P 5 gzip

This works nicely with SLURM, since the SLURM_NTASKS environment variable is set automatically when the job starts. This variable can be used to define the number of concurrent processes:

$ find . -name "*.txt" -print | xargs -P $SLURM_NTASKS gzip

The command passed to xargs can even be a shell function, assuming your SLURM script uses bash, and the function has been exported using "export -f". An alternative to reading the command arguments from the standard input is to specify a file of commands using the -a or --arg-file options. Consult the manual page on xargs for more details.
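As a sketch of the exported-function approach (the demo directory, files, and compress function are placeholders for your own data and commands):

```shell
#!/bin/bash
# Create some demo files to compress; replace with your own data.
DEMO=$(mktemp -d)
echo "hello" > "$DEMO/a.txt"
echo "world" > "$DEMO/b.txt"

: "${SLURM_NTASKS:=2}"     # set automatically inside a SLURM job

compress() { gzip -f "$1"; }
export -f compress         # make the function visible to "bash -c" below

# xargs cannot call a shell function directly, so wrap it in "bash -c"
find "$DEMO" -name "*.txt" -print | \
    xargs -P "$SLURM_NTASKS" -I {} bash -c 'compress "$1"' _ {}
```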

Launcher

Launcher is a utility from the Texas Advanced Computing Center (TACC) that simplifies running multiple independent tasks within a single multi-node SLURM job. To use launcher, you must enter your commands into a file, create a SLURM script that starts launcher, and submit your SLURM script using sbatch. Here is an example launcher SLURM script:

#!/bin/bash 
#
# Simple SLURM script for submitting multiple serial jobs (e.g.  
# parametric studies) using launcher to launch the jobs.
#  
#SBATCH -J Parametric  
#SBATCH -N 4
#SBATCH -n 8
#SBATCH -o Parametric.o%j
#SBATCH -t 00:05:00
#SBATCH --account=your_cluster_account
#SBATCH --partition=your_cluster_partition

# The script must load the launcher module:
module load launcher
# The script must set the LAUNCHER_JOB_FILE environment variable to the name of your command file:
export LAUNCHER_JOB_FILE=helloworldmulti
# Finally, your script must call the "paramrun" command:
paramrun

Finally, submit your SLURM script with the sbatch command. In this example, the job will use 4 nodes (-N 4) and will run your commands in parallel, 2 on each node, for a total of 8 concurrent processes (-n 8). The output will be written to the file Parametric.o#####, where ##### is the SLURM job id number. Here is a snippet of the command file from this example:

echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
...
Calculating the number of cores per task

If the commands in your command file need to specify the number of cores to use for each task (for example, if your command file consists of "mpirun -np $cores_per_task ..." commands) then you need to calculate the number of cores yourself; unfortunately, there is no single launcher or SLURM variable that contains this information. However, the core count per task can be calculated in your SLURM script using values provided by SLURM. Depending upon the shell you use:

# In bash:
export cores_per_task=$(( $SLURM_CPUS_ON_NODE * $SLURM_NNODES / $SLURM_NTASKS ))
# In csh / tcsh ("@" sets a shell variable; setenv is needed to export it):
@ cores_per_task = $SLURM_CPUS_ON_NODE * $SLURM_NNODES / $SLURM_NTASKS
setenv cores_per_task $cores_per_task

Notice that the variable "cores_per_task" is exported to the environment by the SLURM script - this is required for the value to be available in the commands started by launcher. For example, with 4 nodes of 16 cores each (SLURM_CPUS_ON_NODE=16, SLURM_NNODES=4) and 8 tasks, each task gets (16 * 4) / 8 = 8 cores.

Launcher is described in more detail here: https://www.tacc.utexas.edu/research-development/tacc-software/the-launcher and here: https://github.com/TACC/launcher .

Last Updated: 11/1/18