
MATLAB Parallel – PCT and DCS

NOTE: The matlab-default module is no longer needed to run parallel MATLAB applications; the regular matlab modules listed below are used again. Please update your workflows accordingly.

Introduction

This page is intended to help you run parallel MATLAB code on the Odyssey cluster. The latest software modules supporting parallel computing with MATLAB available on the cluster are:

matlab/R2018b-fasrc01
matlab/R2018a-fasrc01
matlab/R2017b-fasrc01
matlab/R2017a-fasrc02
matlab/R2016b-fasrc02
matlab/R2016a-fasrc02

Parallel processing with MATLAB is performed with the help of two products, Parallel Computing Toolbox (PCT) and Distributed Computing Server (DCS).

Parallel Computing Toolbox

Currently, PCT provides up to 32 workers (MATLAB computational engines) to execute applications locally on a multicore machine. This means that with the toolbox you can run parallel MATLAB code locally on a single compute node and use up to 32 of its cores.
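Because a local pool cannot exceed the PCT limit stated above, a job script might want to cap the worker count before MATLAB starts. The following is a minimal sketch under that assumption: the helper name pct_worker_count is hypothetical, and falling back to 1 worker when SLURM_CPUS_PER_TASK is unset (for example, outside a SLURM job) is an assumption, not site policy.

```shell
#!/bin/bash
# Hypothetical helper: how many local PCT workers to start.
# Reads SLURM_CPUS_PER_TASK (falls back to 1 if unset or not a positive integer)
# and caps the result at the 32-worker PCT limit described above.
pct_worker_count() {
  local max_workers=32
  local n="${SLURM_CPUS_PER_TASK:-1}"
  if [[ "$n" =~ ^[0-9]+$ ]]; then
    n=$(( 10#$n ))          # force base 10 so values such as "08" are accepted
  else
    n=1
  fi
  (( n < 1 )) && n=1
  (( n > max_workers )) && n=$max_workers
  echo "$n"
}
```

For example, with -c 48 the helper prints 32. The value could then be exported under a hypothetical variable name (e.g., PCT_WORKERS) and read by the parpool call instead of SLURM_CPUS_PER_TASK.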

Parallel FOR loops (parfor)

Below is a simple code that uses PCT to calculate PI via a parallel Monte Carlo method. The example also illustrates the use of parfor (parallel FOR) loops: a FOR loop whose iterations are independent of one another can often be parallelized simply by replacing for with parfor, without other changes to the code:

%============================================================================
% Parallel Monte Carlo calculation of PI
%============================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
R = 1;
darts = 1e7;
count = 0;
tic
parfor i = 1:darts
   % Compute the X and Y coordinates of where the dart hit the...............
   % square using Uniform distribution.......................................
   x = R*rand(1);
   y = R*rand(1);
   if x^2 + y^2 <= R^2
      % Increment the count of darts that fell inside of the.................
      % circle...............................................................
      count = count + 1; % count is a reduction variable
   end
end
% Compute pi.................................................................
myPI = 4*count/darts;
T = toc;
fprintf('The computed value of pi is %8.7f.\n', myPI);
fprintf('The parallel Monte-Carlo method is executed in %8.2f seconds.\n', T);
delete(gcp);
exit;

Important: When using parpool in MATLAB, you need to include the statement parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK'))) in your code. This statement tells MATLAB to start SLURM_CPUS_PER_TASK workers on the local machine (the compute node where your job lands). When the parallel computation is done, the MATLAB workers are released with the statement delete(gcp). If the above code is named, e.g., pfor.m, it can be sent to the queue with the batch-job submission script below, which starts a MATLAB parallel job with 8 workers:

#!/bin/bash
#SBATCH -J pfor
#SBATCH -o pfor.out
#SBATCH -e pfor.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=32G
 
module load matlab/R2018b-fasrc01
srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "pfor"

The SBATCH directives -N 1 and -c 8 ensure that 8 processing cores are allocated for the calculation and that they all reside on the same compute node.
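If -c is left out, SLURM typically does not set SLURM_CPUS_PER_TASK, and the parpool call then receives an empty value; if more than one node is allocated, a local pool still only uses the cores of the node where MATLAB runs. A minimal pre-flight check along these lines is sketched below. The function name is hypothetical, and treating an unset SLURM_JOB_NUM_NODES as a single node is an assumption.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a local-pool (PCT) MATLAB job.
# Succeeds only if SLURM_CPUS_PER_TASK is a positive integer (needed by the
# parpool('local', ...) call) and the job spans exactly one node (a local
# pool can only use cores on the node where MATLAB runs).
check_local_pool_allocation() {
  local cpus="${SLURM_CPUS_PER_TASK:-}"
  local nodes="${SLURM_JOB_NUM_NODES:-1}"   # assumption: unset means one node
  if ! [[ "$cpus" =~ ^[1-9][0-9]*$ ]]; then
    echo "SLURM_CPUS_PER_TASK is not a positive integer; add '#SBATCH -c N'." >&2
    return 1
  fi
  if [[ "$nodes" != "1" ]]; then
    echo "Job spans $nodes nodes; a local pool needs '#SBATCH -N 1'." >&2
    return 1
  fi
  return 0
}
```

In pfor.run, the function definition and a line such as check_local_pool_allocation || exit 1 would go just before the srun line, so that a misconfigured job stops before MATLAB starts.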

If the submission script is named pfor.run, it is submitted to the queue by typing in:

$ sbatch pfor.run
Submitted batch job 1885302
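The job ID in this message is what sacct -j and similar commands expect. The following is a minimal sketch for capturing it from the default message; the helper name is hypothetical. Alternatively, sbatch --parsable prints only the job ID (followed by ;cluster on multi-cluster setups).

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from sbatch's default message,
# e.g. "Submitted batch job 1885302" -> 1885302. Reads from stdin.
job_id_from_sbatch() {
  awk '/^Submitted batch job [0-9]+$/ { print $4 }'
}
```

Usage: jobid=$(sbatch pfor.run | job_id_from_sbatch), followed later by sacct -j "$jobid".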

When the job has completed, its output is found in the file pfor.out:

                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018b (9.5.0.944444) 64-bit (glnxa64)
                              August 28, 2018


To get started, type doc.
For product information, visit www.mathworks.com.

Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.

ans =

 Pool with properties:

            Connected: true
           NumWorkers: 8
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

The computed value of pi is 3.1410644.
The parallel Monte-Carlo method is executed in     2.14 seconds.

Any runtime errors are written to the file pfor.err.
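After a job finishes, a quick way to see whether anything was written to the error file is to test whether it is non-empty. The following is a minimal sketch with a hypothetical helper name. Note that a non-empty file does not necessarily mean the job failed, since warnings may also go there.

```shell
#!/bin/bash
# Hypothetical helper: report whether a job's error file (e.g. pfor.err) has content.
# Returns 0 if the file is missing or empty, 1 otherwise (and prints its contents).
check_err_file() {
  local f="$1"
  if [[ -s "$f" ]]; then
    echo "Errors found in $f:"
    cat -- "$f"
    return 1
  fi
  echo "No errors in $f."
  return 0
}
```

Usage: check_err_file pfor.err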

Single Program Multiple Data (SPMD)

In addition, MATLAB provides a single program multiple data (SPMD) parallel programming model, which allows greater control over the parallelization: tasks can be distributed and assigned to parallel processes (labs, or workers, in MATLAB's terminology) depending on their ranks. The code below provides a simple illustration; it prints out the rank of each MATLAB lab:

%====================================================================
% Illustration of SPMD Parallel Programming model with MATLAB
%====================================================================
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
% Start of parallel region...........................................
spmd
  nproc = numlabs;  % get total number of workers
  iproc = labindex; % get lab ID
  if ( iproc == 1 )
     fprintf ( 1, ' Running with  %d labs.\n', nproc );
  end
  for i = 1: nproc
     if iproc == i
        fprintf ( 1, ' Rank %d out of  %d.\n', iproc, nproc );
     end
  end
% End of parallel region.............................................
end
delete(gcp);
exit;

If the code is named spmd_test.m, it can be sent to the queue with this script:

#!/bin/bash
#SBATCH -J spmd_test
#SBATCH -o spmd_test.out
#SBATCH -e spmd_test.err
#SBATCH -N 1
#SBATCH -c 8
#SBATCH -t 0-00:30
#SBATCH -p shared
#SBATCH --mem=4000
 
module load matlab/R2018b-fasrc01
srun -c $SLURM_CPUS_PER_TASK matlab -nosplash -nodesktop -r "spmd_test"

If the batch-job submission script is named spmd_test.run, then it is sent to the queue with

$ sbatch spmd_test.run
Submitted batch job 1896986

The output is printed out to the file spmd_test.out:

                            < M A T L A B (R) >
                  Copyright 1984-2018 The MathWorks, Inc.
                   R2018b (9.5.0.944444) 64-bit (glnxa64)
                              August 28, 2018


To get started, type doc.
For product information, visit www.mathworks.com.

Starting parallel pool (parpool) using the 'local' profile ...
connected to 8 workers.

ans =

 Pool with properties:

            Connected: true
           NumWorkers: 8
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Lab 1:
   Running with  8 labs.
   Rank 1 out of  8.
Lab 2:
   Rank 2 out of  8.
Lab 3:
   Rank 3 out of  8.
Lab 4:
   Rank 4 out of  8.
Lab 5:
   Rank 5 out of  8.
Lab 6:
   Rank 6 out of  8.
Lab 7:
   Rank 7 out of  8.
Lab 8:
   Rank 8 out of  8.
Parallel pool using the 'local' profile is shutting down.
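The SPMD example above only prints ranks. In practice, labindex and numlabs are used to give each lab its own share of the work. The following is a language-neutral sketch of a contiguous block split of N iterations over numlabs labs, using 1-based ranks as labindex does. It is written in bash to match the job scripts on this page, and the helper block_range is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper illustrating a block distribution of n iterations over
# nproc labs, with 1-based ranks like MATLAB's labindex/numlabs.
# Prints "start end"; end < start means the rank owns no iterations.
block_range() {
  local rank=$1 nproc=$2 n=$3
  local base=$(( n / nproc ))
  local extra=$(( n % nproc ))
  # The first `extra` ranks each get one additional iteration.
  local count=$(( base + (rank <= extra ? 1 : 0) ))
  local start=$(( (rank - 1) * base + (rank - 1 < extra ? rank - 1 : extra) + 1 ))
  echo "$start $(( start + count - 1 ))"
}
```

For N = 10 and 4 labs, labs 1 to 4 would own iterations 1-3, 4-6, 7-8 and 9-10. In MATLAB, the same integer arithmetic would be evaluated inside the spmd block with labindex and numlabs.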

Distributed Computing Server

DCS allows a larger number of MATLAB workers to be used on a single node and/or across several compute nodes. The current DCS license on the cluster allows up to 256 MATLAB workers. DCS is integrated with SLURM and works with MATLAB versions R2017a, R2017b, R2018a and R2018b, available through the modules matlab/R2017a-fasrc02, matlab/R2017b-fasrc01, matlab/R2018a-fasrc01 and matlab/R2018b-fasrc01. The steps below describe how to set up and use DCS on the Research Computing cluster:

(1) Log on to the cluster and start an interactive bash shell, e.g., on the test partition:

$ srun -p test -N 1 -c 4 -t 0-06:00 --pty --mem=16G bash

(2) Start MATLAB on the command line and configure DCS to run parallel jobs on Odyssey by calling configCluster. This command needs to be run only once for each MATLAB version.

  • In the interactive shell, load a MATLAB module and start MATLAB without a GUI:
# Load a MATLAB software module, e.g.,
$ module load matlab/R2018b-fasrc01
# Start MATLAB interactively without a GUI
$ matlab -nosplash -nodesktop -nodisplay
  • Run configCluster in the MATLAB shell:
>> configCluster

    Must set WallTime and QueueName before submitting jobs to ODYSSEY.  E.g.

    >> c = parcluster('odyssey');
    >> % 5 hour walltime
    >> c.AdditionalProperties.WallTime = '05:00:00';
    >> c.AdditionalProperties.QueueName = 'test-queue';
    >> c.saveProfile

(3) Set up job parameters, e.g., wall time, queue / partition, memory per CPU, etc. The example below shows how this can be done interactively. Once these parameters are saved with c.saveProfile, their values become the defaults until changed.

>> c = parcluster('odyssey');                    % Define a cluster object
>> c.AdditionalProperties.WallTime = '05:00:00'; % Time limit
>> c.AdditionalProperties.QueueName = 'shared';  % Partition
>> c.AdditionalProperties.MemUsage = '4000';     % Memory per CPU in MB
>> c.saveProfile                                 % Save cluster profile. This becomes default until changed
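Because MemUsage is a per-CPU value, the total memory of a job grows with the pool size. Each batch job runs one single-core task per worker plus one managing worker, as the additionalSubmitArgs in step (5) show for a pool of 8 (--ntasks=9 -c 1 --mem-per-cpu=4000). The arithmetic is sketched below under that assumption; the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: total memory (MB) requested by a DCS batch job, assuming
# one CPU per task, MemUsage applied per CPU, and pool + 1 tasks.
dcs_total_mem_mb() {
  local mem_per_cpu_mb=$1 pool=$2
  echo $(( mem_per_cpu_mb * (pool + 1) ))
}
```

For example, dcs_total_mem_mb 4000 8 prints 36000, i.e. MemUsage = '4000' with a pool of 8 requests 36000 MB in total across whichever nodes SLURM assigns.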

(4) Display parallel cluster configuration with c.AdditionalProperties.

NOTE: This lists the available cluster options and their current values. These options could be set up as desired.

>> c.AdditionalProperties

ans = 

  AdditionalProperties with properties:

              AccountName: ''
     AdditionalSubmitArgs: ''
               Constraint: ''
    DebugMessagesTurnedOn: 0
              GpusPerNode: 0
                 MemUsage: '4000'
             ProcsPerNode: 0
                QueueName: 'shared'
                 WallTime: '05:00:00'

(5) Submit parallel DCS jobs. There are two ways to do this: from within MATLAB, or directly through SLURM.

Submitting DCS jobs from within MATLAB

We will illustrate submitting DCS jobs from within MATLAB with a specific example. Below is a simple function evaluating the integer sum from 1 through N in parallel:

%==========================================================
% Function: parallel_sum( N )
%           Calculates integer sum from 1 to N in parallel
%==========================================================
function s = parallel_sum(N)
  s = 0;
  parfor i = 1:N
    s = s + i;
  end
  fprintf('Sum of numbers from 1 to %d is %d.n', N, s);
end
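In parallel_sum, s is a reduction variable. Each worker accumulates a partial sum over the iterations it is given, and MATLAB then combines the partial sums. Because integer addition is associative and commutative, the result does not depend on how iterations are assigned. The following illustration-only bash sketch emulates this with a cyclic assignment (MATLAB chooses its own assignment) and can be checked against the closed form N(N+1)/2. The helper name is hypothetical.

```shell
#!/bin/bash
# Illustration only: emulate a parfor-style sum reduction over 1..n with
# nworkers partial sums. Iterations are dealt out cyclically here; MATLAB
# chooses its own assignment, which does not change the result.
reduce_sum() {
  local n=$1 nworkers=$2
  local -a partial
  local w i total=0
  for (( w = 0; w < nworkers; w++ )); do partial[w]=0; done
  for (( i = 1; i <= n; i++ )); do
    w=$(( (i - 1) % nworkers ))
    partial[w]=$(( partial[w] + i ))
  done
  # Combine the partial sums (the reduction step).
  for (( w = 0; w < nworkers; w++ )); do
    total=$(( total + partial[w] ))
  done
  echo "$total"
}
```

reduce_sum 100 8 prints 5050, the same value parallel_sum(100) returns below.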

Use the batch command to submit parallel jobs to the cluster. batch returns a job object, which is used to access the output of the submitted job. See the example below, and refer to the official MATLAB documentation for more help on batch. The example assumes that the MATLAB function is saved as parallel_sum.m. Note that these jobs always request n+1 CPU cores, since one additional worker is required to manage the batch job and the pool of workers. For example, a job that needs 8 workers will consume 9 CPU cores.

% Define a cluster object
>> c = parcluster('odyssey');
% Define a job object using batch
>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);

As the output below shows, this starts a job with one more MATLAB worker than the pool size (9 instead of 8), because one additional instance is required to manage the pool of workers:

>> j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);

additionalSubmitArgs =

    '--ntasks=9 -c 1 --ntasks-per-core=1 -p shared -t 05:00:00 --mem-per-cpu=4000 --licenses=MATLAB_Distrib_Comp_Engine:9'
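The request above contains 9 tasks and 9 MATLAB_Distrib_Comp_Engine licenses for a pool of 8. Assuming every batch job counts against the cluster's 256-worker DCS license in this way, the largest pool a single job could request would be 255. The arithmetic is sketched below with a hypothetical helper. Since the license is shared cluster-wide, fewer licenses may be free at any given time.

```shell
#!/bin/bash
# Hypothetical helper: number of cores / DCS licenses a batch job with a pool
# of n workers requests (n workers + 1 managing worker), rejecting requests
# that exceed the 256-worker license limit stated for this cluster.
dcs_slots() {
  local pool=$1 license_limit=256
  local slots=$(( pool + 1 ))
  if (( slots > license_limit )); then
    echo "pool of $pool needs $slots licenses; limit is $license_limit" >&2
    return 1
  fi
  echo "$slots"
}
```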

You can query the job status with j.State:

>> j.State

ans =

    'finished'

Once the job completes, we can retrieve the job results. This is done by calling the function fetchOutputs.

>> j.fetchOutputs{:} 

ans =

        5050

NOTE: fetchOutputs is used to retrieve function output arguments. Data that has been written to files on the cluster needs to be retrieved directly from the filesystem.

If needed, one may also access job log files. This is particularly useful for debugging. This is done with the c.getDebugLog(j) command, e.g.,

>> c.getDebugLog(j)
LOG FILE OUTPUT:
Node list: holy7c[03205-03206]
mpiexec.hydra -l -n 9 /n/sw/helmod/apps/centos7/Core/matlab/R2018b-fasrc01/bin/worker -parallel
[3] 
[3]                             < M A T L A B (R) >
[3]                   Copyright 1984-2018 The MathWorks, Inc.
[3]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[3]                               August 28, 2018
[3] 
[4] 
[4]                             < M A T L A B (R) >
[4]                   Copyright 1984-2018 The MathWorks, Inc.
[4]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[4]                               August 28, 2018
[4] 
[5] 
[5]                             < M A T L A B (R) >
[5]                   Copyright 1984-2018 The MathWorks, Inc.
[5]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[5]                               August 28, 2018
[5] 
[6] 
[6]                             < M A T L A B (R) >
[6]                   Copyright 1984-2018 The MathWorks, Inc.
[6]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[6]                               August 28, 2018
[6] 
[7] 
[7]                             < M A T L A B (R) >
[7]                   Copyright 1984-2018 The MathWorks, Inc.
[7]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[7]                               August 28, 2018
[7] 
[8] 
[8]                             < M A T L A B (R) >
[8]                   Copyright 1984-2018 The MathWorks, Inc.
[8]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[8]                               August 28, 2018
[8] 
[3]  
[4]  
[6]  
[7]  
[5]  
[8]  
[3] To get started, type doc.
[4] To get started, type doc.
[5] To get started, type doc.
[6] To get started, type doc.
[8] To get started, type doc.
[3] For product information, visit www.mathworks.com.
[5] For product information, visit www.mathworks.com.
[5]  
[8] For product information, visit www.mathworks.com.
[8]  
[4] For product information, visit www.mathworks.com.
[4]  
[6] For product information, visit www.mathworks.com.
[6]  
[3]  
[7] To get started, type doc.
[7] For product information, visit www.mathworks.com.
[7]  
[0] 
[0]                             < M A T L A B (R) >
[0]                   Copyright 1984-2018 The MathWorks, Inc.
[0]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[0]                               August 28, 2018
[0] 
[1] 
[1]                             < M A T L A B (R) >
[1]                   Copyright 1984-2018 The MathWorks, Inc.
[1]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[1]                               August 28, 2018
[1] 
[2] 
[2]                             < M A T L A B (R) >
[2]                   Copyright 1984-2018 The MathWorks, Inc.
[2]                    R2018b (9.5.0.944444) 64-bit (glnxa64)
[2]                               August 28, 2018
[2] 
[0]  
[1]  
[2]  
[0] To get started, type doc.
[1] To get started, type doc.
[2] To get started, type doc.
[1] For product information, visit www.mathworks.com.
[2] For product information, visit www.mathworks.com.
[0] For product information, visit www.mathworks.com.
[1]  
[0]  
[2]  
[0] Sending a stop signal to all the labs...
[0] 2019-02-26 15:30:18 | About to exit MATLAB normally
[0] 2019-02-26 15:30:19 | About to exit with code: 0

Exiting with code: 0
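Lines in this log are prefixed with the 0-based rank that wrote them (the mpiexec.hydra -l option labels output this way), and output from different ranks is interleaved. A minimal sketch for extracting one rank's lines is shown below. It assumes the output of c.getDebugLog(j) has been saved to a text file, for example with MATLAB's diary command, and the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the lines written by one rank from a saved debug log,
# with the "[rank] " prefix removed.
rank_lines() {
  local file=$1 rank=$2
  grep -E "^\[${rank}\]" "$file" | sed -E 's/^\[[0-9]+\] ?//'
}
```

Usage: rank_lines debug.log 0 would show only the messages from rank 0, such as the stop signal and exit status.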

When the results are no longer needed, the job can be deleted:

% Delete the job after the results are no longer needed
>> j.delete

Submitting DCS jobs directly through SLURM

Parallel DCS jobs can also be submitted directly from the Unix command line through SLURM. For this, in addition to the MATLAB source, you need to prepare a MATLAB submission script with the job specifications. An example is shown below:

%==========================================================
% MATLAB job submission script: parallel_batch.m
%==========================================================
c = parcluster('odyssey'); 
c.AdditionalProperties.QueueName = 'shared';
c.AdditionalProperties.WallTime = '05:00:00';
c.AdditionalProperties.MemUsage = '4000';
j = c.batch(@parallel_sum, 1, {100}, 'pool', 8);
exit;

If this script is named, for instance, parallel_batch.m, it is submitted to the queue with the help of the following SLURM batch-job submission script:

#!/bin/bash
#SBATCH -J parallel_sum_DCS
#SBATCH -o parallel_sum_DCS.out
#SBATCH -e parallel_sum_DCS.err
#SBATCH -p shared
#SBATCH -c 1
#SBATCH -t 0-00:20
#SBATCH --mem=4000
 
module load matlab/R2018b-fasrc01
srun -c 1 matlab -nosplash -nodesktop -r "parallel_batch"

Assuming the above script is named, for instance, parallel_sum_DCS.run, the job is submitted as usual with:

$ sbatch parallel_sum_DCS.run

NOTE: This scheme dispatches two jobs: a serial job that runs parallel_batch.m and submits the DCS job, and the actual parallel DCS job.

Once submitted, the DCS parallel job can be monitored and managed directly through SLURM.

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- --------
1916487      parallel_+     shared   rc_admin          1  COMPLETED      0:0 
1916487.bat+      batch              rc_admin          1  COMPLETED      0:0 
1916487.ext+     extern              rc_admin          1  COMPLETED      0:0 
1916487.0        matlab              rc_admin          1  COMPLETED      0:0 
1916831            Job3     shared   rc_admin          9  COMPLETED      0:0 
1916831.bat+      batch              rc_admin          8  COMPLETED      0:0 
1916831.ext+     extern              rc_admin          9  COMPLETED      0:0 
1916831.0     pmi_proxy              rc_admin          2  COMPLETED      0:0 
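Before fetching outputs, you may want to wait until the DCS job (here 1916831, named Job3) has finished. The following is a minimal polling sketch; the function names and the 60-second default interval are hypothetical. It uses the sacct options -n (no header), -X (job allocations only, no steps), -j (job ID) and -o State.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a SLURM job state is terminal.
# "CANCELLED by <uid>" is matched by the CANCELLED* pattern.
is_terminal_state() {
  case "$1" in
    COMPLETED|FAILED|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|PREEMPTED|BOOT_FAIL|DEADLINE|CANCELLED*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: poll sacct until the given job reaches a terminal state,
# then print that state. Optional second argument: polling interval in seconds.
wait_for_job() {
  local id=$1 interval=${2:-60} state
  while true; do
    # First line only; whitespace removed (e.g. "CANCELLED by 123" -> "CANCELLEDby123").
    state=$(sacct -n -X -j "$id" -o State | head -n 1 | tr -d '[:space:]')
    if is_terminal_state "$state"; then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
}
```

Usage: wait_for_job 1916831 30 blocks until the job leaves the PENDING/RUNNING states and prints, e.g., COMPLETED.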

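After the job completes, you can fetch the results and delete the job object from within MATLAB. Because the job was submitted from a separate MATLAB session, first obtain the job object from the cluster, e.g., by the ID in the job name Job3 that sacct reports. If the program writes its results directly to disk, fetching is not necessary.

>> c = parcluster('odyssey');
>> j = c.findJob('ID', 3);   % 3 from the job name Job3 shown by sacct
>> j.fetchOutputs{:}
>> j.delete;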


CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.