Running independent serial calculations
Some data analyses require running a lot of similar independent serial calculations. Due to logistics in setting up these calculations, and to CHPC cluster scheduling policies, it is adviseable not to run each single calculation as a separate job. Depending on the characteristics of the calculations, strategies on how to implement them may differ. In this page we list and detail available strategies.
If the calculations take about the same time, there are many ways how to pack them into a single job. If the job runs on a single node, we can combine execution in the background with the wait statement. On multiple nodes, similar thing may be achieved with GNU Parallel, however, calculation distribution is easier with SLURM's srun --multi-prog option.
If the calculations have variable runtime, they need to be somehow scheduled inside of the job in order to efficiently use all available CPU cores on the allocated nodes. GNU Parallel is the easiest choice for running within a single node. For multiple nodes, we have developed a mini-scheduler, called submit, for this purpose.
If the number of calculations is larger (> ~100), we recommend to either split the srun --multi-prog into multiple jobs, or chain the calculations one after with GNU Parallel (on a single node), or with using submit (on multiple nodes). The reason for this is that it takes longer to allocate a larger job, and, in case of using owner-guest queue, the chances of preemption increase as well.
Finally, if there are a lot of serial calculations and unless they are very short, the Open Science Grid (OSG) may be the best choice due to the vast amount resources the OSG provides.
Below is a table that summarizes the best strategies for different types of serial calculations:
|
About the same runtime | Variable runtime |
Single calculation runtime < 15 min |
srun --multi-prog for < ~100 calculations shell script loop - only on a single node submit for chaining > ~100 calculations |
|
Single calculation runtime > 15 min |
srun --multi-prog, multiple jobs if > ~100 calculations |
If the calculation is thread-parallelized, it is also possible to use SLURM job arrays to submit multiple jobs at once, each occupying one full node.
This is the simplest way to run multiple programs within a single job, however, it works only on a single node.
If we only run on a single node, the process is very simple, e.g. in bash:
#!/bin/bash
for (( i=0; i < $SLURM_NTASKS ; i++ )); do
/path_to/myprogram $i &
done
wait
We differentiate between the calculations inside of myprogram through the loop index $i. The & will put the processes in the background, thus allowing to launch all $SLURM_NTASKS of them. The wait statement will cause the script to wait till all of the background processes finish.
The advantage of this approach is the simplicity, the drawback is that it only works on a single node, and for calculations that roughly take the same amount of time.
The --multi-prog
option allows to assign each task in the job a different option. This allows to differentiate
serial runs from each other and run them inside a single parallel Slurm job. This
is our preferred way to launch independent serial calculations that take about the
same time. A basic Slurm job script can look like this:
#!/bin/sh
#SBATCH -n 16
#SBATCH -N 1
#SBATCH -t 1-03:00:00 # 1 day and 3 hours
#SBATCH -p CLUSTER # partition name
#SBATCH -A chpc # account name
#SBATCH -J my_job_name # job name
srun --multi-prog my.conf
Here we submit a job on one node with 16 tasks, and then run the srun with the --multi-prog
option, which is followed by configuration file for the multiple programs. This file
has the following three fields per line, separated by spaces:
- task number
- executable file
- arguments to the executable file
The executable arguments may be augmented by expression "%t" which gets replaced by the task number, and "%o" which gets replaced with task's offset within this range.
Please, note that if the executable is not in the default PATH (as defined when new shell is opened), the full path to this executable has to be specified. The same is true if the executable is a script that is then calling a program. Due to our modules setup, running this script will reset the module environment and as such program modules need to be loaded again inside of this script.
For example, to run quantum chemistry program Mopac, we have mopac.conf as follows:
0-11 ./example.sh %t
Where example.sh script contains:
#!/bin/tcsh
module load mopac
mopac example-$1.mop
A complete example for running multiple serial R simulations using --multi-prog is described on our R documentation page.
We have also developed a simpler multiple serial program launch script which can be obtained here. This script runs as many serial tasks as specified in the #SLURM -n line, each of which uses one entry from the WORKDIR and PROGRAM arrays listed, copies data from WORKDIR to unique scratch directory for each serial task and runs PROGRAM which can be the same or unique for each serial task
GNU Parallel is an easy way to run the same program or script with different arguments (e.g. input files). It also schedules the calculations in case there are more calculations than the job tasks, thus allowing to "queue" the calculations inside of the job and process them as the previous calculations finish. Running GNU Parallel in a single node job is very simple. For example, to compress all the text files in the current directory and its subdirectories:
$ find . -name "*.txt" -print | parallel -j $SLURM_NTASKS gzip
A good short tutorial on a single node GNU parallel use is here.
GNU Parallel is also capable of running on multiple nodes, but the easy way using the --sshloginfile option to list the nodes to run on is not very efficient since all the jobs are sent to the remote nodes at the start. There is a tutorial to set up a GNU parallel queue for each node, however, that load ballance the calculations only within each node, not across all the nodes. Submit or Launcher would be a better choice for a multi-node job.
The submit program allows to run many serial calculations inside of a parallel cluster job using master-worker model. The program is a simple MPI based scheduler which reads a list of calculations to do from a file, one per line, and runs them in parallel, filling as many calculations as there are parallel tasks. Once one calculation is finished, the worker asks for another calculation, which keeps repeating until all calculations are done.
This is our preferred way to run independent serial calculations that may take different amount of time to finish, as long as there are many more calculations than job tasks, as this allows to chain the calculations one after another and fill in the resources better. If one roughly knows the runtime of each calculation, listing them with respect to the calculation time in the descending order, the longest first, will provide the best packing of the calculations on the job tasks.
For the basic documentation, example and source code see the submitGitHub page.
The submit program reads in input file called job.list, which syntax is as follows:
first line - # of serial jobs to run
other lines - command line for these serial jobs (including program arguments). Make
sure there is only single space between the program arguments - more that single space
will break the command line.
For example (for testing purpose), you can make job.list as:
4
/bin/hostname -i
/bin/hostname -i
/bin/hostname -i
/bin/hostname -i
This will run 4 serial jobs, executing the hostname command - which returns name of the node this command ran on.
NOTE - since submit launches the items in job.list directly, it does not use the environment. Therefore we need to specify full path to the command, or, run a shell script (with a full path to the shell script in job.list, where the shell script initializes a new shell session with user default environment).
The differentiation between different calculations can be built into the job.list through program arguments, as shown in the example below.
A complete example using SLURM and a set of serial R calculations, similar to the srun --multi-prog example shown above, can be found on the submit github page or at /uufs/chpc.utah.edu/sys/installdir/submit/std/examples/R.
If you intend to run multiple independent tasks in parallel on a single node, the Linux shell command xargs is a great tool. In its simplest form xargs reads values from the standard input and applies a command to each value that was read. For example, to compress all the text files in the current directory and its subdirectories:
$ find . -name "*.txt" -print | xargs gzip
To run these commands in parallel, simply add the -P option, specifying the number of processes to run concurrently:
$ find . -name "*.txt" -print | xargs -P 5 gzip
This works nicely with SLURM, since the SLURM_NTASKS environment variable is set automatically when the job starts. This variable can be used to define the number of cores to be used for the task:
$ find . -name "*.txt" -print | xargs -P $SLURM_NTASKS gzip
The command passed to xargs can even be a shell function, assuming your SLURM script uses bash, and the function has been exported using "export -f". An alternative to reading the command arguments from the standard input is to specify a file of commands using the -a or --arg-file options. Consult the manual page on xargs for more details.
Launcher is a utility from the Texas Advanced Computing Center (TACC) that simplifies the task of running multiple parallel tasks within a single multi-node SLURM job. To use launcher you must enter your commands into a file, create a SLURM script to start launcher, and submit your SLURM script using sbatch. Here is an example launcher SLURM script:
#!/bin/bash
#
# Simple SLURM script for submitting multiple serial jobs (e.g.
# parametric studies) using launcher to launch the jobs.
#
#SBATCH -J Parametric
#SBATCH -N 4
#SBATCH -n 8
#SBATCH -o Parametric.o%j
#SBATCH -t 00:05:00
#SBATCH --account=your_cluster_account
#SBATCH --partition=your_cluster_partition
# The script must load the launcher module:
module load launcher
# The script must set the LAUNCHER_JOB_FILE environment variable to the name of your command file:
export LAUNCHER_JOB_FILE=helloworldmulti
# Finally, your script must call the "paramrun" command:
paramrun
Finally, submit your SLURM script with the sbatch command. In this example, the job will use 4 nodes (-N 4) and will run your commands in parallel, 2 on each node, for a total of 8 concurrent processes (-n 8). The output will be written to the file Parametric.o#####, where ##### is the SLURM job id number. Here is a snippet of the command file from this example:
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
echo "Hello, World! from job $LAUNCHER_JID running on task $LAUNCHER_TSK_ID, host `hostname`"
...
If the commands in your command file need to specify the number of cores to use for each task (for example, if your command file consists of "mpirun -np $cores_per_task ..." commands) then you need to calculate the number of cores yourself; unfortunately, there is no single launcher or SLURM variable that contains this information. However, the core count per task can be calculated in your SLURM script using values provided by SLURM. Depending upon the shell you use:
# In bash:
export cores_per_task=$(( $SLURM_CPUS_ON_NODE * $SLURM_NNODES / $SLURM_NTASKS ))
# In csh / tcsh:
@ cores_per_task = $SLURM_CPUS_ON_NODE * $SLURM_NNODES / $SLURM_NTASKS
Notice that the variable "cores_per_task" is exported to the environment by the SLURM script - this is required for the value to be available in the commands started by launcher.
Launcher is described in more detail here: https://www.tacc.utexas.edu/research-development/tacc-software/the-launcher and here: https://github.com/TACC/launcher .