
Matlab

CHPC administers a joint license with the College of Mines and Earth Sciences (CMES), which includes 56 Matlab seats and the toolboxes listed in FAQ 5 here. We also have a 160-processor license for the Distributed Computing Server (DCS), which allows one to run in parallel on multiple nodes. The Linux version of Matlab for CHPC and CMES Linux desktops and clusters is installed in /uufs/chpc.utah.edu/sys/installdir/matlab/std. We also install Matlab on CHPC and CMES Windows and Mac machines on demand.

Researchers affiliated with CHPC or CMES who have desktop admin rights can contact issues@chpc.utah.edu for information on how to install Matlab. Other CHPC users should purchase a Matlab license from OSL.

Matlab on Linux machines

Matlab, including many toolboxes and DCS, is installed on all clusters and Linux desktops in /uufs/chpc.utah.edu/sys/installdir/matlab. Several versions of Matlab are available, which can be accessed by loading the appropriate version module. If the module version is not specified, the latest version is loaded.

To run Matlab, first load the Matlab module to set it up in your environment:
module load matlab

Single instance Matlab including Parallel Computing Toolbox on one node

Although Matlab will run on the interactive nodes, please note that we don't allow running executables for longer than ca. 15 minutes on the interactive nodes, due to the load they put on them and the inconvenience to other users. For that reason, we recommend running Matlab through an interactive Slurm session.

Note that running a single Matlab instance in a job, even if just on one node, may not efficiently utilize the multi-core node. Matlab uses internal multi-threading by default, running as many threads as there are available cores on the node, but not all internal Matlab functions are multi-threaded. This Mathworks document has some information on multi-threading. To evaluate the speedup from internal multi-threading, use the maxNumCompThreads function as described here.
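As a quick illustration, one can time a representative computation at one thread and at the full thread count; the sketch below does this with a dense linear solve (the matrix size and the operation are only placeholder examples):

```matlab
% Sketch: measure internal multi-threading speedup with maxNumCompThreads.
% The matrix size and the backslash solve are placeholders; substitute a
% computation representative of your own workload.
N = maxNumCompThreads;        % save the default thread count (= number of cores)
A = rand(4000); b = rand(4000,1);

maxNumCompThreads(1);         % restrict Matlab to a single thread
t1 = timeit(@() A\b);

maxNumCompThreads(N);         % restore the default thread count
tN = timeit(@() A\b);

fprintf('Speedup with %d threads: %.2fx\n', N, t1/tN);
```

If the reported speedup is close to 1, the functions your program relies on are likely not multi-threaded and a single core per Matlab instance may suffice.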

To run one Matlab instance in a cluster job, follow these steps:

  • Start an interactive Slurm session, with X forwarding, e.g.:

    srun -t 2:00:00 -n 1 -N 1 -p cluster -A account --pty /bin/tcsh -l

  • Load the Matlab environment and run Matlab:

    module load matlab
    matlab

The above method has one serious limitation: it requires running an interactive Slurm job on the compute nodes, which can sometimes be difficult to obtain quickly due to the load on the cluster. It is therefore recommended, once you have made sure your Matlab program runs as intended, to run it non-interactively through Slurm scripts.

Our preferred way is to create a wrapper Matlab script that runs the program of choice and to run this wrapper right after Matlab launch via the -r Matlab start option. The best way to implement this is to create a launch script with the following three lines:

addpath path_to_my_matlab_script
my_matlab_script
exit

This script adds the directory of the program we want to run to the Matlab path, runs the program, and then exits Matlab. The exit is important: without it, Matlab will hang until the job runs out of walltime. See file run_matlab.m for an example of the wrapper Matlab script.
Once the script is in place, in your Slurm script file just cd to the directory with your data files and run Matlab as:
matlab -nodisplay -r my_launch_script -logfile my_log.out

Here we are telling Matlab to start without the GUI (as we don't need it in the batch session), run the launch script my_launch_script.m, and log the Matlab output to my_log.out. See file run_matlab.slr for an example of a Slurm script that launches Matlab with the run_matlab.m script.
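A minimal Slurm script along these lines might look like the following sketch; the partition, account, and file names are placeholders to be adjusted for your job:

```shell
#!/bin/bash
#SBATCH --time=2:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=cluster     # placeholder partition name
#SBATCH --account=account       # placeholder account name

module load matlab
cd $SLURM_SUBMIT_DIR            # run in the directory the job was submitted from
matlab -nodisplay -r my_launch_script -logfile my_log.out
```

Submit it with sbatch and check my_log.out for the Matlab output once the job finishes.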

Alternatively, consider compiling the Matlab programs using the Matlab Compiler and running them as a standalone executable. In this case, you don't call Matlab in the Slurm script; call the compiled executable itself (that is, just replace the matlab -r .... line with the name of the compiled executable). The advantage of this approach is calling a single executable instead of the whole Matlab environment. The disadvantage is less flexibility in editing the Matlab source between runs, since that requires recompiling the executable. The compilation itself is an extra step, which can be complicated if the Matlab program is large.

Compiling a Matlab program is usually fairly simple. First make sure that all your Matlab programs are functions, not scripts. A function is code that starts with a function statement. Suppose we have functions main.m, f1.m, and f2.m, where main.m is the main function. To compile these three into an executable, do:
mcc -m main f1 f2
This will produce an executable named main. There are some limitations in the compilation. For this and other details, consult the Matlab Compiler help page.

Note that if you are running more than one compiled Matlab executable simultaneously, set the MCR_CACHE_ROOT environment variable to a unique location for each run. This variable specifies the Matlab Runtime cache location. By default it is ~/.mcrCache, which is shared by all the runs and may lead to cache corruption. When running multiple SLURM jobs, set MCR_CACHE_ROOT=/scratch/local/$SLURM_JOB_ID.
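Putting the compilation and the cache setting together, the workflow might look like this sketch, using the example function names main, f1, and f2 from above:

```shell
# Compile once, outside the job; this produces an executable named main.
mcc -m main f1 f2

# Inside each Slurm script: point the Matlab Runtime cache at a
# job-unique location before running the compiled executable.
export MCR_CACHE_ROOT=/scratch/local/$SLURM_JOB_ID
./main
```

Because each job gets its own $SLURM_JOB_ID, concurrent runs no longer share the ~/.mcrCache directory.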

When running a single instance of Matlab, the parallelization is limited to the threads internal to Matlab. In our experience, some Matlab routines thread quite well, some not much, and some are not threaded at all. It is a good idea to run the top command to monitor Matlab's CPU usage; ideally, the MATLAB process should use up to 100% times the number of CPU cores on the node.

To run multiple parallel Matlab workers, use the Parallel Computing Toolbox as described in the "Local parallel Matlab on a desktop or a compute node" section below, or, if you need more than the ca. 20 workers that can be accommodated by a single node, use the Matlab Distributed Computing Server.

Local parallel Matlab on a desktop or a compute node

The easiest way to run Matlab in parallel is to use the Parallel Computing Toolbox (PCT) directly on a single node. To start PCT, simply use the command parpool with the arguments being the 'local' parallel profile and the number of processors (called labs by Matlab), e.g. poolobj=parpool('local',8). Using the 'local' profile ensures that the parallel pool runs on the local machine. When you are done, please exit the parallel pool with the delete(poolobj) command; this frees the PCT license for other users. We recommend embedding these two commands in your Matlab code: open the parallel pool at the start of your program and close it at the end.
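The recommended pattern, with the pool opened at the start of the program and deleted at the end, looks like this sketch (the parfor body is only a placeholder for your actual per-iteration work):

```matlab
% Open the pool on the local machine with 8 workers (labs).
poolobj = parpool('local', 8);

s = zeros(1, 1000);
parfor i = 1:1000
    s(i) = sin(i);   % placeholder for the actual per-iteration work
end
total = sum(s);

% Delete the pool when done; this frees the PCT license for other users.
delete(poolobj)
```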

For details on Parallel Computing Toolbox see the Mathworks PCT page.

Please note that if you are running more than one parallel Matlab session on a shared file system (e.g. running multiple jobs on our clusters), there is a chance of a race condition in the file system I/O that results in errors when starting the parallel pool. To work around this, define a unique Job Storage Location, as described on this Harvard FASRC help page.
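One possible way to do this is to point the 'local' profile at a job-unique storage directory before opening the pool; in the sketch below, the scratch path is a placeholder and the directory is keyed on the SLURM job ID:

```matlab
% Create a job-unique storage directory, e.g. keyed on the SLURM job ID.
storage = fullfile('/scratch/local', getenv('SLURM_JOB_ID'), 'matlab_storage');
mkdir(storage);

% Point the local cluster profile at it to avoid the race on the
% shared default job storage location, then open the pool as usual.
cl = parcluster('local');
cl.JobStorageLocation = storage;
poolobj = parpool(cl, 8);
```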

As of Matlab R2014a, the Parallel Computing Toolbox maximum worker limit has been removed, so we recommend using as many workers as there are physical CPU cores on the system.

Matlab Distributed Computing Server (MDCS)

Matlab's DCS allows running parallel Matlab workers on more than one node. The job launch requires Matlab running on the interactive node and launching the parallel job from within Matlab. Matlab then submits a job to the cluster scheduler and keeps track of the job's progress.

Configuring MDCS and jobs

First-time users of MDCS, and users setting up a new Matlab version on a CHPC cluster, have to configure Matlab to run parallel jobs on that cluster with the configCluster command. Note that the configCluster command needs to be run only once per cluster, not before every job.

Then, prior to submitting a job, other specific parameters need to be defined, some of which may be unique for the job (such as walltime), and some of which stay the same and so need to be defined only once (such as the user's e-mail address that the SLURM scheduler uses to send e-mails about the job status). All this information is set with the ClusterInfo class and is persistent between Matlab sessions. Some basic understanding of SLURM scheduling is needed to enter the job parameters. Please see our SLURM help page for more information. Below are several important ClusterInfo commands, which also support tab completion:

ClusterInfo.state display current configuration
ClusterInfo.setEmailAddress('test@foo.com') specify e-mail address for job notifications
ClusterInfo.setQueueName('partition') set partition used for the jobs
ClusterInfo.setAccount('account_name') set account used for the job
ClusterInfo.setWallTime('00:20:00') set job walltime
ClusterInfo.setUseGpu(true) request use of GPUs
ClusterInfo.setGpusPerNode(2) specify how many GPUs per node to use
ClusterInfo.setGpuType('k80') request a particular GPU type
ClusterInfo.clear clear all configurations
ClusterInfo.setEmailAddress('') clear a configuration item that takes a string as input
ClusterInfo.setUserDefinedOptions('-C c20') set additional sbatch options, in this case a constraint to use only 20-core nodes ('-C c20')


Certain cluster configuration parameters can also be modified through the Matlab GUI parallel options menu. The ClusterInfo object is unique for each cluster. In order to zero out a numerical value, feed in empty quotes (''), e.g. ClusterInfo.setGpusPerNode('').

Running independent jobs

Independent serial Matlab jobs can be submitted through the MDCS interface. However, please keep in mind that if node sharing is not enabled (currently it is not, but there are plans to enable it in the future), only one SLURM task, and thus one Matlab instance, will run on each node, likely not efficiently utilizing all CPU cores on that node. Still, running independent Matlab jobs is a good way to test the functionality of MDCS. Additionally, since the MDCS license comes with all Matlab toolboxes, functions from toolboxes that we don't license can be accessed this way.

To submit an independent job to the cluster, use the batch command. This command returns a handle to the job, which can then be used to query the job and fetch the results.

c = parcluster; % get a handle to the cluster
j = c.batch(@pwd, 1, {}); % submit a job, pwd queries where Matlab is running on a cluster, j is a handle to the job
j.State % query the state of the job (e.g. idle, running, finished)
j.fetchOutputs{:} % will display the results if the job is finished.
j.delete % deletes the job
jobs = c.Jobs % displays all jobs that have not been deleted (queued, running or finished)
c.getDebugLog(j.Tasks(1)) % if the job gives an error, view the error log file

Note that fetchOutputs is used to retrieve function output arguments. If using batch with a script, use load instead. Data that has been written to files needs to be retrieved directly from the file system.
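For example, if my_script.m is a Matlab script (not a function), its workspace variables can be retrieved with load once the job finishes; my_script here is a placeholder name:

```matlab
c = parcluster;            % get a handle to the cluster
j = c.batch('my_script');  % submit a script (note: a name string, not a function handle)
j.wait;                    % block until the job finishes
load(j);                   % load the script's workspace variables into the client session
```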

Running parallel jobs

Parallel Matlab jobs use the Parallel Computing Toolbox to provide concurrent execution. The most common way to achieve this is through the parfor loop statement.

For example, suppose we have a program parallel_example.m:

function t = parallel_example
t0 = tic;
parfor idx = 1:16
    A(idx) = idx;
    pause(2);
end
t = toc(t0);

We can submit a parallel job on the cluster as:

c = parcluster;  % Get a handle to a cluster
j = c.batch(@parallel_example, 1, {}, 'Pool', 4); % Submit a batch pool job using 4 workers
j.State % View the job status
j.fetchOutputs{:} % Fetch the job results, after finished state is retrieved
id = j.ID % retrieve the Matlab job ID (MDCS has its own job tracking)
clear j; % clear the handle to the job (handle also gets cleared when quitting Matlab)

Notice that MDCS requests the number of workers + 1 SLURM tasks. This is because one extra worker is required to manage the batch job and the pool of workers. Note also that communication overhead may slow down the parallel program if too many communicating workers are used. We therefore recommend timing the runtime of your application with varying worker counts to find the optimal number.

As Matlab logs information on the jobs run through MDCS, past job information can be retrieved:

c = parcluster; % Get a handle to a cluster
j = c.findJob('ID',4); % Find old job #4
j.State % Retrieve the state of this job
c.getDebugLog(j) % Retrieve output/error log file

A general approach for developing and running a Matlab parallel program is to develop the parallel program in the Matlab GUI with the Parallel Computing Toolbox, and then scale it up to multiple cluster nodes using MDCS by calling the batch command with the parallel program as the function that batch calls.

Note that if you run a program with parfor or another parallelization command without explicit submission via the batch command, Matlab will create a cluster job automatically with the default job parameters (1 worker and 3 days walltime). This cluster job will continue running after the program finishes, until the 30-minute Matlab idle timeout is reached. To get a handle to the parallel pool created by this program and to delete the pool, which deletes the cluster job, do:

poolobj = gcp('nocreate');
delete(poolobj)

Difference between parpool() and batch()

A parallel worker pool can be initiated with either the parpool() or the batch() command.
In a program with parpool(), serial sections of the code are executed in the Matlab instance that runs the code (e.g. if Matlab runs on the interactive node, the serial sections of the code will run there). Parallel sections are offloaded to the cluster batch job (if the parallel profile defaults to the cluster profile, or it is specified explicitly).
The batch() command starts a cluster batch job at the start of the function specified in the batch command, thus executing both the serial and parallel sections of the code inside the cluster batch job, i.e. on the compute nodes rather than on the interactive nodes.

Therefore, in order to minimize performance impact on the interactive nodes, users need to submit their parallel Matlab jobs using the batch() command.

The only exception to this rule is running a Matlab job inside a single compute node, as described in the section "Local parallel Matlab on a desktop or a compute node".

CHPC MDCS installation notes

MDCS uses MPI for worker communication. Our setup uses Intel MPI, instead of the stock-supplied MPICH, in order to use the InfiniBand network on the clusters that have it. Intel MPI is picked up automatically.

The MDCS integration scripts provided by Mathworks are located in /uufs/chpc.utah.edu/sys/installdir/matlab/VERSION/toolbox/local/mdcs_slurm and are added to the user path by default.

MDCS licensing

CHPC has a 160-worker license for MDCS, which means that up to 160 workers can run concurrently. However, keep in mind that this license is shared among all users and clusters. The SLURM scheduler can keep track of license usage per cluster, but not across clusters. We are running MDCS with SLURM license support, so SLURM should manage the jobs in such a way that the license count of the running jobs does not exceed 160, but this is the case only for a single cluster. If some MDCS jobs run on one cluster and others on another, there is a chance that the MDCS license count will be exceeded, resulting in an out-of-licenses message. Therefore, we recommend checking the current MDCS license usage on the other clusters to get an idea of the overall license usage.

The SLURM command to check the license usage is scontrol show lic, e.g.:

[user@ember2 ~/]$ scontrol show lic
LicenseName=matlab_distrib_comp_engine@slurmdb
    Total=160 Used=5 Free=155 Remote=yes

One can also query the license server for the current license use, which lists the total license usage across all CHPC clusters.

[user@ember2 ~/]$ $MATLAB_ROOT/etc/glnxa64/lmutil lmstat -S MLM -a |grep MATLAB_Distrib_Comp_Engine
Users of MATLAB_Distrib_Comp_Engine: (Total of 160 licenses issued; Total of 5 licenses in use)


For more information on MDCS, see the Mathworks DCS page.

Last Updated: 5/18/17