
**NOTE:** The `matlab-default` module is no longer needed to run parallel MATLAB applications; use the standard `matlab` modules instead. Please update your workflows to reflect this change.

## Introduction

This page is intended to help you with running parallel MATLAB codes on the Odyssey cluster. The latest software modules supporting parallel computing with MATLAB available on the cluster are:

- `matlab/R2018b-fasrc01`
- `matlab/R2018a-fasrc01`
- `matlab/R2017b-fasrc01`
- `matlab/R2017a-fasrc02`
- `matlab/R2016b-fasrc02`
- `matlab/R2016a-fasrc02`

Parallel processing with MATLAB is performed with the help of two products, **Parallel Computing Toolbox** (PCT) and **Distributed Computing Server** (DCS).

## Parallel Computing Toolbox

Currently, PCT provides up to 32 workers (MATLAB computational engines) to execute applications locally on a multicore machine. This means that with the toolbox you can run parallel MATLAB code on a single compute node and use up to 32 cores.

### Parallel FOR loops (parfor)

Below is a simple code illustrating the use of PCT to calculate PI via a parallel Monte Carlo method. This example also illustrates the use of **parfor** (parallel FOR) loops. In this scheme, suitable FOR loops (loops whose iterations are independent of each other) can simply be replaced by parallel FOR loops without other changes to the code:

```matlab
%============================================================================
% Parallel Monte Carlo calculation of PI
%============================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
R = 1;
darts = 1e7;
count = 0;
tic
parfor i = 1:darts
    % Compute the X and Y coordinates of where the dart hit the
    % square using Uniform distribution
    x = R*rand(1);
    y = R*rand(1);
    if x^2 + y^2 <= R^2
        % Increment the count of darts that fell inside of the
        % circle
        count = count + 1; % Count is a reduction variable.
    end
end
% Compute pi
myPI = 4*count/darts;
T = toc;
fprintf('The computed value of pi is %8.7f.\n', myPI);
fprintf('The parallel Monte-Carlo method is executed in %8.2f seconds.\n', T);
delete(gcp);
exit;
```

**Important: when using `parpool` in MATLAB, you need to include the statement `parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))` in your code.** This statement tells MATLAB to start `SLURM_CPUS_PER_TASK` workers on the local machine (the compute node where your job lands). When the parallel computation is done, the MATLAB workers are released with the statement `delete(gcp)`.

If the above code is named, e.g., `pfor.m`, it can be sent to the queue with the batch-job submission script below, which starts a MATLAB parallel job with 8 workers:

```bash
#!/bin/bash
#SBATCH -J pfor
#SBATCH -o pfor.out
#SBATCH -e pfor.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=32G

module load matlab/R2018b-fasrc01
srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "pfor"
```

The `#SBATCH -N 1` and `#SBATCH -c 8` directives ensure that there are 8 processing cores for the calculation and that they all reside on the same compute node.
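The `parpool` statement above relies on `SLURM_CPUS_PER_TASK`, which SLURM sets only when the job requests cores with `-c` (as `#SBATCH -c 8` does). If you also want to run the same script in a session where that variable is unset, for example an interactive session started without `-c`, a defensive variant such as the sketch below may help. It is an illustration, not part of the cluster's documented setup: the fallback of one worker is an arbitrary assumption, and `gcp('nocreate')` is used so that an already-open pool is reused rather than triggering an error from a second `parpool` call.

```matlab
% Size the local pool from SLURM_CPUS_PER_TASK, with a fallback for sessions
% started outside a SLURM allocation with -c (where the variable is unset).
nWorkers = str2double(getenv('SLURM_CPUS_PER_TASK'));  % NaN when the variable is unset
if isnan(nWorkers)
    nWorkers = 1;   % assumed fallback: a single worker; adjust as needed
end

pool = gcp('nocreate');            % the existing pool, or [] if none is open
if isempty(pool)
    pool = parpool('local', nWorkers);
end
fprintf('Parallel pool has %d workers.\n', pool.NumWorkers);
```

The rest of the script, including the final `delete(gcp)`, stays the same.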

If the submission script is named `pfor.run`, it is submitted to the queue by typing:

```
$ sbatch pfor.run
Submitted batch job 1885302
```

When the job has completed, the `pfor.out` output file is generated:

```
< M A T L A B (R) >
Copyright 1984-2018 The MathWorks, Inc.
R2018b (9.5.0.944444) 64-bit (glnxa64)
August 28, 2018

To get started, type doc.
For product information, visit www.mathworks.com.

Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.

ans = 

 Pool with properties: 

            Connected: true
           NumWorkers: 8
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

The computed value of pi is 3.1410644.
The parallel Monte-Carlo method is executed in 2.14 seconds.
```

Any runtime errors are written to the file `pfor.err`.
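To make the "replace FOR with PARFOR" idea concrete, the sketch below runs the same dart-throwing loop twice, once with `for` and once with `parfor`, changing nothing but the loop keyword. It is a simplified variant of `pfor.m` (unit radius, fewer darts) and assumes a pool is already open, for example via the `parpool` statement above. With a loop body this small, the two timings may be similar, since the overhead of distributing iterations to workers can outweigh the work per iteration.

```matlab
% Same Monte Carlo loop written with FOR and with PARFOR.
% Only the loop keyword changes; COUNT is a reduction variable in the parfor loop.
% Assumes a parallel pool is already open (e.g., started with parpool as above).
darts = 1e6;

count = 0;
tic
for i = 1:darts
    x = rand(1); y = rand(1);
    if x^2 + y^2 <= 1
        count = count + 1;
    end
end
piSerial = 4*count/darts;
tSerial = toc;

count = 0;
tic
parfor i = 1:darts
    x = rand(1); y = rand(1);
    if x^2 + y^2 <= 1
        count = count + 1;
    end
end
piParallel = 4*count/darts;
tParallel = toc;

fprintf('for:    pi ~ %.5f in %.2f s\n', piSerial, tSerial);
fprintf('parfor: pi ~ %.5f in %.2f s\n', piParallel, tParallel);
```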

### Single Program Multiple Data (SPMD)

In addition, MATLAB provides a **single program multiple data** (SPMD) parallel programming model, which allows for greater control over the parallelization: tasks can be distributed and assigned to parallel processes (labs or workers, in MATLAB's terminology) depending on their ranks. The code below provides a simple illustration; it prints the rank of each MATLAB lab:

```matlab
%====================================================================
% Illustration of SPMD Parallel Programming model with MATLAB
%====================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))

% Start of parallel region
spmd
    nproc = numlabs;   % get total number of workers
    iproc = labindex;  % get lab ID
    if ( iproc == 1 )
        fprintf ( 1, ' Running with %d labs.\n', nproc );
    end
    for i = 1: nproc
        if iproc == i
            fprintf ( 1, ' Rank %d out of %d.\n', iproc, nproc );
        end
    end
% End of parallel region
end
delete(gcp);
exit;
```

If the code is named `spmd_test.m`, it can be sent to the queue with this script:

```bash
#!/bin/bash
#SBATCH -J spmd_test
#SBATCH -o spmd_test.out
#SBATCH -e spmd_test.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=4000

module load matlab/R2018b-fasrc01
srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "spmd_test"
```

If the batch-job submission script is named `spmd_test.run`, then it is sent to the queue with:

```
$ sbatch spmd_test.run
Submitted batch job 1896986
```

The output is printed to the file `spmd_test.out`:

```
< M A T L A B (R) >
Copyright 1984-2018 The MathWorks, Inc.
R2018b (9.5.0.944444) 64-bit (glnxa64)
August 28, 2018

To get started, type doc.
For product information, visit www.mathworks.com.

Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.

ans = 

 Pool with properties: 

            Connected: true
           NumWorkers: 8
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Lab 1: 
   Running with 8 labs.
   Rank 1 out of 8.
Lab 2: 
   Rank 2 out of 8.
Lab 3: 
   Rank 3 out of 8.
Lab 4: 
   Rank 4 out of 8.
Lab 5: 
   Rank 5 out of 8.
Lab 6: 
   Rank 6 out of 8.
Lab 7: 
   Rank 7 out of 8.
Lab 8: 
   Rank 8 out of 8.
Parallel pool using the 'local' profile is shutting down.
```
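The SPMD example above only prints ranks. As a sketch of actually assigning work by rank, the code below has each lab sum a strided slice of 1..N and then combines the partial sums with `gplus`, the global-addition function available in the R2018b-era releases listed on this page (newer releases may provide it under a different name). It assumes a parallel pool is already open, e.g. via the `parpool` statement used in `spmd_test.m`, so that the variables coming out of the `spmd` block are Composites.

```matlab
% Assign work by rank inside SPMD. Assumes a parallel pool is already open.
N = 100;
spmd
    nl     = numlabs;                    % number of labs in the pool
    myPart = sum(labindex:numlabs:N);    % lab k sums k, k+nl, k+2*nl, ... up to N
    total  = gplus(myPart);              % global sum; every lab receives the same value
end
% Outside spmd, nl and total are Composites; {1} reads the value held by lab 1.
fprintf('Sum of 1..%d computed by %d labs: %d\n', N, nl{1}, total{1});
```

Because each lab takes every `numlabs`-th number, the split stays balanced for any pool size, and a lab whose starting index exceeds N simply contributes 0.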

## Distributed Computing Server

The DCS allows a larger number of MATLAB workers to be used on a single node and/or across several compute nodes. The current DCS license on the cluster allows up to 256 MATLAB workers. DCS is integrated with SLURM and works with MATLAB versions **R2017a**, **R2017b**, **R2018a** and **R2018b**, available with modules **matlab/R2017a-fasrc02**, **matlab/R2017b-fasrc01**, **matlab/R2018a-fasrc01** and **matlab/R2018b-fasrc01**. The example steps below describe how to set up and use DCS on the Research Computing cluster:

(1) Log on to the cluster and start an interactive / test bash shell.

```
$ srun -p test -N 1 -c 4 -t 0-06:00 --pty --mem=16G bash
```

(2) Start MATLAB on the command line and configure DCS to run parallel jobs on Odyssey by calling `configCluster`. This command needs to be run only once for each MATLAB version.

- From the interactive shell, load a MATLAB module and start MATLAB:

```
# Load a MATLAB software module, e.g.,
$ module load matlab/R2018b-fasrc01
# Start MATLAB interactively without a GUI
$ matlab -nosplash -nodesktop -nodisplay
```

- Run `configCluster` in the MATLAB shell:

```
>> configCluster
Must set WallTime and QueueName before submitting jobs to ODYSSEY.  E.g.

>> c = parcluster('odyssey');
>> % 5 hour walltime
>> c.AdditionalProperties.WallTime = '05:00:00';
>> c.AdditionalProperties.QueueName = 'test-queue';
>> c.saveProfile
```

(3) Set up job parameters, e.g., wall time, queue / partition, memory per CPU, etc. The example below illustrates how this can be done interactively. Once these parameters are saved with `c.saveProfile`, their values become the defaults until changed.

```
>> c = parcluster('odyssey');                    % Define a cluster object
>> c.AdditionalProperties.WallTime = '05:00:00'; % Time limit
>> c.AdditionalProperties.QueueName = 'shared';  % Partition
>> c.AdditionalProperties.MemUsage = '4000';     % Memory per CPU in MB
>> c.saveProfile                                 % Save cluster profile. This becomes default until changed
```

(4) Display the parallel cluster configuration with `c.AdditionalProperties`.

NOTE: This lists the available cluster options and their current values. These options can be set as desired.

```
>> c.AdditionalProperties

ans = 

  AdditionalProperties with properties:

              AccountName: ''
     AdditionalSubmitArgs: ''
               Constraint: ''
    DebugMessagesTurnedOn: 0
              GpusPerNode: 0
                 MemUsage: '4000'
             ProcsPerNode: 0
                QueueName: 'shared'
                 WallTime: '05:00:00'
```
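Because `configCluster` warns that `WallTime` and `QueueName` must be set before submitting, a short check like the sketch below can catch an incomplete profile early. It only reads the `AdditionalProperties` fields listed above; the error wording and the idea of running this check before each submission are suggestions, not part of the cluster setup.

```matlab
% Check the saved 'odyssey' profile before submitting jobs.
c = parcluster('odyssey');
props = c.AdditionalProperties;

missing = {};
if isempty(props.WallTime),  missing{end+1} = 'WallTime';  end
if isempty(props.QueueName), missing{end+1} = 'QueueName'; end
if ~isempty(missing)
    error('Set %s on the odyssey profile before submitting jobs.', strjoin(missing, ' and '));
end
fprintf('Submitting to partition %s with a %s time limit.\n', props.QueueName, props.WallTime);
```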

(5) Submit parallel DCS jobs. There are two ways to submit parallel DCS jobs: from within MATLAB, and directly through SLURM.

### Submitting DCS jobs from within MATLAB

We will illustrate submitting DCS jobs from within MATLAB with a specific example. Below is a simple function evaluating the integer sum from 1 through N in parallel:

```matlab
%==========================================================
% Function: parallel_sum( N )
% Calculates integer sum from 1 to N in parallel
%==========================================================
function s = parallel_sum(N)
    s = 0;
    parfor i = 1:N
        s = s + i;
    end
    fprintf('Sum of numbers from 1 to %d is %d.\n', N, s);
end
```
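Before submitting `parallel_sum` to the cluster, it can be worth checking it locally, for example in the interactive test session from step (1). The sketch below assumes `parallel_sum.m` is on the MATLAB path and compares its result with the closed form N(N+1)/2. Depending on your parallel preferences, the `parfor` loop will use an open local pool, start one automatically, or run serially; the result is the same in each case.

```matlab
% Quick local check of parallel_sum.m before submitting it with batch.
N = 100;
s = parallel_sum(N);   % also prints the sum from inside parallel_sum
assert(s == N*(N+1)/2, 'parallel_sum disagrees with the closed form N(N+1)/2');
```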

Use the `batch` command to submit parallel jobs to the cluster. The `batch` command returns a job object, which is used to access the output of the submitted job. See the example below, and refer to the official MATLAB documentation for more help on **batch**. The example assumes that the MATLAB function is saved as `parallel_sum.m`. Note that these jobs always request n+1 CPU cores, since one worker is required to manage the batch job and the pool of workers. For example, a job that needs 8 workers will consume 9 CPU cores.

```
>> % Define a cluster object
>> c = parcluster('odyssey');
>> % Define a job object using batch
>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
```

As the submission output below shows, the job starts one more MATLAB worker than the pool size (9 instead of 8), because one parallel instance is required to manage the pool of workers:

```
>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);

additionalSubmitArgs =

    '--ntasks=9 -c 1 --ntasks-per-core=1 -p shared -t 05:00:00 --mem-per-cpu=4000 --licenses=MATLAB_Distrib_Comp_Engine:9'
```
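The n+1 rule determines what a `batch` job with a pool requests from SLURM. The sketch below reproduces the numbers in the `additionalSubmitArgs` line above (9 tasks, 9 DCS licenses, 4000 MB per CPU). It also derives the largest pool a single job could use under the 256-worker license; that figure is an estimate that assumes no other DCS jobs hold licenses at the time.

```matlab
% Resource arithmetic for c.batch(..., 'pool', poolSize), using the values above.
poolSize  = 8;        % 'pool' argument passed to batch
memPerCpu = 4000;     % MemUsage, in MB per CPU

nTasks     = poolSize + 1;          % one extra worker manages the pool
totalMemMB = nTasks * memPerCpu;    % memory requested across all tasks
fprintf('ntasks=%d, licenses=%d, total memory=%d MB\n', nTasks, nTasks, totalMemMB);
% prints: ntasks=9, licenses=9, total memory=36000 MB

% Each job also needs poolSize+1 worker licenses, so with a 256-worker
% license the largest single pool is one less than the license count.
licenseLimit = 256;
maxPool = licenseLimit - 1;
fprintf('largest pool for one job: %d workers\n', maxPool);
% prints: largest pool for one job: 255 workers
```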

You can query the job status with `j.State`:

```
>> j.State

ans =

    'finished'
```

Once the job completes, we can retrieve the job results by calling the function `fetchOutputs`:

```
>> j.fetchOutputs{:}

ans =

        5050
```

NOTE: `fetchOutputs` is used to retrieve function output arguments. Data that has been written to files on the cluster needs to be retrieved directly from the filesystem.

If needed, one may also access the job log files, which is particularly useful for debugging. This is done with the `c.getDebugLog(j)` command, e.g.,

```
>> c.getDebugLog(j)
LOG FILE OUTPUT:
Node list: holy7c[03205-03206]
mpiexec.hydra -l -n 9 /n/sw/helmod/apps/centos7/Core/matlab/R2018b-fasrc01/bin/worker -parallel
[3]
[3] < M A T L A B (R) >
[3] Copyright 1984-2018 The MathWorks, Inc.
[3] R2018b (9.5.0.944444) 64-bit (glnxa64)
[3] August 28, 2018
[3]
[4]
[4] < M A T L A B (R) >
[4] Copyright 1984-2018 The MathWorks, Inc.
[4] R2018b (9.5.0.944444) 64-bit (glnxa64)
[4] August 28, 2018
[4]
[5]
[5] < M A T L A B (R) >
[5] Copyright 1984-2018 The MathWorks, Inc.
[5] R2018b (9.5.0.944444) 64-bit (glnxa64)
[5] August 28, 2018
[5]
[6]
[6] < M A T L A B (R) >
[6] Copyright 1984-2018 The MathWorks, Inc.
[6] R2018b (9.5.0.944444) 64-bit (glnxa64)
[6] August 28, 2018
[6]
[7]
[7] < M A T L A B (R) >
[7] Copyright 1984-2018 The MathWorks, Inc.
[7] R2018b (9.5.0.944444) 64-bit (glnxa64)
[7] August 28, 2018
[7]
[8]
[8] < M A T L A B (R) >
[8] Copyright 1984-2018 The MathWorks, Inc.
[8] R2018b (9.5.0.944444) 64-bit (glnxa64)
[8] August 28, 2018
[8]
[3]
[4]
[6]
[7]
[5]
[8]
[3] To get started, type doc.
[4] To get started, type doc.
[5] To get started, type doc.
[6] To get started, type doc.
[8] To get started, type doc.
[3] For product information, visit www.mathworks.com.
[5] For product information, visit www.mathworks.com.
[5]
[8] For product information, visit www.mathworks.com.
[8]
[4] For product information, visit www.mathworks.com.
[4]
[6] For product information, visit www.mathworks.com.
[6]
[3]
[7] To get started, type doc.
[7] For product information, visit www.mathworks.com.
[7]
[0]
[0] < M A T L A B (R) >
[0] Copyright 1984-2018 The MathWorks, Inc.
[0] R2018b (9.5.0.944444) 64-bit (glnxa64)
[0] August 28, 2018
[0]
[1]
[1] < M A T L A B (R) >
[1] Copyright 1984-2018 The MathWorks, Inc.
[1] R2018b (9.5.0.944444) 64-bit (glnxa64)
[1] August 28, 2018
[1]
[2]
[2] < M A T L A B (R) >
[2] Copyright 1984-2018 The MathWorks, Inc.
[2] R2018b (9.5.0.944444) 64-bit (glnxa64)
[2] August 28, 2018
[2]
[0]
[1]
[2]
[0] To get started, type doc.
[1] To get started, type doc.
[2] To get started, type doc.
[1] For product information, visit www.mathworks.com.
[2] For product information, visit www.mathworks.com.
[0] For product information, visit www.mathworks.com.
[1]
[0]
[2]
[0] Sending a stop signal to all the labs...
[0] 2019-02-26 15:30:18 | About to exit MATLAB normally
[0] 2019-02-26 15:30:19 | About to exit with code: 0
Exiting with code: 0
```

When the results are no longer needed, the job can be deleted:

```
>> % Delete the job after the results are no longer needed
>> j.delete
```
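Putting the steps of this subsection together, here is a minimal sketch of a submit-and-collect script. It uses `wait` to block until the job ends instead of polling `j.State`, and wraps `fetchOutputs` in `try`/`catch` so that the debug log is shown if the job did not produce outputs. The profile name `odyssey`, the function `parallel_sum`, and the `batch` arguments are the ones used above; the overall structure is a suggestion, not a prescribed workflow.

```matlab
% Submit, wait, fetch, and clean up in one script.
c = parcluster('odyssey');
j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);

wait(j);                          % block until the job has finished (or failed)
fprintf('Job %d ended in state: %s\n', j.ID, j.State);
try
    out = fetchOutputs(j);        % one cell per requested output argument
    fprintf('parallel_sum returned %d\n', out{1});
catch err
    % fetchOutputs errors if the task failed; the debug log usually shows why.
    fprintf('Could not fetch outputs: %s\n', err.message);
    c.getDebugLog(j)
end
delete(j);                        % remove the job's data once it is no longer needed
```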

### Submitting DCS jobs directly through SLURM

Parallel DCS jobs can also be submitted directly from the Unix command line through SLURM. For this, in addition to the MATLAB source, one needs to prepare a MATLAB submission script with the job specifications. An example is shown below:

```matlab
%==========================================================
% MATLAB job submission script: parallel_batch.m
%==========================================================
c = parcluster('odyssey');
c.AdditionalProperties.QueueName = 'shared';
c.AdditionalProperties.WallTime = '05:00:00';
c.AdditionalProperties.MemUsage = '4000';
j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
exit;
```

If this script is named, for instance, `parallel_batch.m`, it is submitted to the queue with the help of the following SLURM batch-job submission script:

```bash
#!/bin/bash
#SBATCH -J parallel_sum_DCS
#SBATCH -o parallel_sum_DCS.out
#SBATCH -e parallel_sum_DCS.err
#SBATCH -p shared
#SBATCH -c 1
#SBATCH -t 0-00:20
#SBATCH --mem=4000

module load matlab/R2018b-fasrc01
srun -c 1 matlab -nosplash -nodesktop -r "parallel_batch"
```

Assuming the above script is named, for instance, `parallel_sum_DCS.run`, the job is submitted as usual with:

```
$ sbatch parallel_sum_DCS.run
```

NOTE: This scheme dispatches two jobs: a serial job that submits the DCS parallel job, and the DCS parallel job itself.

Once submitted, the DCS parallel job can be monitored and managed directly through SLURM.

```
$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1916487      parallel_+     shared   rc_admin          1  COMPLETED      0:0
1916487.bat+      batch              rc_admin          1  COMPLETED      0:0
1916487.ext+     extern              rc_admin          1  COMPLETED      0:0
1916487.0        matlab              rc_admin          1  COMPLETED      0:0
1916831            Job3     shared   rc_admin          9  COMPLETED      0:0
1916831.bat+      batch              rc_admin          8  COMPLETED      0:0
1916831.ext+     extern              rc_admin          9  COMPLETED      0:0
1916831.0     pmi_proxy              rc_admin          2  COMPLETED      0:0
```

After the job completes, one can fetch the results and delete the job object from within MATLAB. If the program writes its results directly to disk, fetching is not necessary.

```
>> j.fetchOutputs{:};
>> j.delete;
```
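Because `parallel_batch.m` calls `exit` right after submitting, the job object `j` does not exist in a new MATLAB session; it has to be looked up again from the cluster profile. One way is sketched below using `findJob`. It assumes you know the MATLAB job ID: `c.Jobs` lists the jobs stored for the profile, and the SLURM job name shown by `sacct` (`Job3` above) appears to encode that ID. Adding a line such as `fprintf('Submitted job %d\n', j.ID);` before `exit;` in `parallel_batch.m` would also record it in `parallel_sum_DCS.out`. The value `jobID = 3` below is a hypothetical placeholder.

```matlab
% Reconnect to a job submitted by parallel_batch.m from a new MATLAB session.
c = parcluster('odyssey');
c.Jobs                                  % list jobs stored for this profile (ID, state, ...)

jobID = 3;                              % hypothetical: replace with your job's ID
j = findJob(c, 'ID', jobID);
if isempty(j)
    error('No job with ID %d in this cluster profile.', jobID);
end
wait(j);                                % returns once the job has finished (or failed)
out = fetchOutputs(j);
fprintf('parallel_sum returned %d\n', out{1});
delete(j);                              % clean up once the results are saved
```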


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.