Using the MATLAB Distributed/Parallel Computing Toolbox without modifying your cluster

This post shows how I set up MATLAB to perform parallel calculations on a multi-core server that is part of an existing cluster managed by Torque and Moab. I did this without making any changes to the cluster submission system. The installation documentation for the MATLAB Parallel Computing Toolbox and Distributed Computing Server is poor: the instructions make many assumptions that don’t fit an existing “production” cluster environment. For example, they assume that a GUI is available to configure the cluster (my method does not require this step). Further, the instructions give the impression that an administrator must “install” MATLAB components on both the head node and the compute nodes. THIS IS FALSE! Ordinary users can run parallel (multi-core) MATLAB jobs without administrative privileges!

The following method also allows you to run MATLAB with more than 8 or 12 cores. In fact, you can use 16 cores or more on a single machine! It may also be possible to distribute tasks across multiple nodes (hosts), but that’s for an upcoming post. See my repository of PBS submit script examples on GitHub.

Customize the script that launches the MATLAB Distributed Computing Server

Make a copy of the script

MATLAB/R2011a/toolbox/distcomp/bin/mdce_def.sh

and modify it to suit your cluster. In my case, I only had to set the variables PIDBASE, LOCKBASE, LOGBASE, and CHECKPOINTBASE to point to locations in the /tmp directory of each compute node so that ordinary users can write files to those locations. My full script will appear on GitHub.
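
A minimal sketch of what such edits could look like, not the actual file from this post. It assumes the submit script exports MATLABPARTEMP (as shown later in the post); the helper variable MDCE_TMP, the /tmp fallback, and the subdirectory names are hypothetical.

```shell
# Sketch of the edited lines in a copy of mdce_def.sh (not the actual file).
# Assumption: MATLABPARTEMP is exported by the submit script; MDCE_TMP, the
# /tmp fallback, and the subdirectory names below are hypothetical.
MDCE_TMP="${MATLABPARTEMP:-/tmp/mdce_${USER:-user}}"

PIDBASE="$MDCE_TMP/pid"                # PID files of the mdce service
LOCKBASE="$MDCE_TMP/lock"              # lock files
LOGBASE="$MDCE_TMP/log"                # daemon log files
CHECKPOINTBASE="$MDCE_TMP/checkpoint"  # job manager and worker checkpoints
```

Keeping all four locations under one per-job directory means a single recursive delete removes everything when the job ends.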

Create a submit script that runs MATLAB

I created a Torque submit script that automatically starts a MATLAB DCE server, a job manager, and a set of worker processes on the node. The full script will be available on GitHub; I will only highlight a few snippets below. First, start the MATLAB distributed computing server (mdce) on the compute node using the launch script that I created:

export MATLABPARTEMP=/tmp/tmp$PBS_JOBID
mdce start -clean -mdcedef /apps/CompVis/MATLAB/R2011a/toolbox/distcomp/bin/mdce_def_STOKES.sh

Note that I have exported an environment variable that points to a directory within /tmp that will hold all our temporary files; we will clean up this directory when the job is done. My custom mdce launch script needs this environment variable.

Next, start the job manager on the compute node:

startjobmanager -clean -name MyJobManager -v

Start worker daemons on the compute node:

# NP is the number of worker processes to start; set it earlier in the submit script
for (( i=1; i<=NP; i++ )); do
    echo "Starting worker process $i"
    startworker -jobmanagerhost localhost -jobmanager MyJobManager -name "worker$i"
done
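
The loop above relies on NP, which the snippets in this post do not define. One possible way to set it, sketched under the assumption that Torque writes one line per allocated core to the file named by $PBS_NODEFILE and that you want one worker per core; count_cores is a hypothetical helper name.

```shell
#!/bin/bash
# Count the cores Torque allocated to this job.
# Assumption: the file named by $PBS_NODEFILE has one line per allocated core.
# count_cores is a hypothetical helper, not part of Torque or MATLAB.
count_cores() {
    local nodefile="$1"
    if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
        # $(( )) strips the padding that some wc implementations add
        echo $(( $(wc -l < "$nodefile") ))
    else
        echo 1   # fallback when not running under Torque
    fi
}

NP=$(count_cores "${PBS_NODEFILE:-}")
echo "Will start $NP worker(s)"
```

Deriving NP from the node file keeps the worker count in step with whatever the job requested, so the submit script needs no edits when the core count changes.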

Now, start MATLAB and run your code.
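
The post does not show this step, so the following is only a sketch of one way it could be done. It assumes the R2011a-era calls findResource and matlabpool (check them against the documentation for your release), that NP matches the number of workers started above, and that my_analysis is a placeholder name for your own code.

```shell
#!/bin/bash
# Sketch: write a small MATLAB driver and run it without a display.
# Assumptions: NP matches the number of workers started earlier; my_analysis
# is a hypothetical name for your own function; findResource and matlabpool
# are the R2011a-era calls and may differ in other releases.
NP="${NP:-1}"
DRIVER_DIR="${MATLABPARTEMP:-$(mktemp -d)}"
mkdir -p "$DRIVER_DIR"
DRIVER="$DRIVER_DIR/run_parallel.m"

cat > "$DRIVER" <<EOF
jm = findResource('scheduler', 'type', 'jobmanager', ...
                  'name', 'MyJobManager', 'LookupURL', 'localhost');
matlabpool(jm, $NP);   % open a pool on the workers started earlier
my_analysis();         % hypothetical: your parallel code (parfor, spmd, ...)
matlabpool close;
EOF

if command -v matlab >/dev/null 2>&1; then
    # Print any error and always exit, so the job does not hang at a prompt
    matlab -nodisplay -nosplash -r "try, run('$DRIVER'); catch err, disp(getReport(err)); end; exit"
else
    echo "matlab not found on PATH; driver written to $DRIVER" >&2
fi
```

Wrapping the call in try/catch followed by exit matters in a batch job: without it, an error in the MATLAB code would leave the session waiting at a prompt until the walltime runs out.
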
When the job is done, we need to shut down all the daemons in reverse order:

## Cleanup
sleep 3
for (( i=1; i<=NP; i++ )); do
    echo "Stopping worker process $i"
    stopworker -clean -name "worker$i"
done
stopjobmanager -clean -name MyJobManager
mdce stop -clean

Finally, let’s clean up the /tmp directory on the compute node:

rm -rfv "$MATLABPARTEMP"
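
If MATLAB fails partway through, the cleanup commands above may never run. A hedged sketch of one way to guard against that with a shell trap; stop_daemons and run_job are hypothetical helper names, and stop_daemons is left as a stub standing in for the stopworker, stopjobmanager, and mdce stop commands.

```shell
#!/bin/bash
# Sketch: register cleanup once so it runs however the script exits.
# Assumption: stop_daemons is a hypothetical wrapper around the stopworker,
# stopjobmanager, and "mdce stop" commands from the cleanup section.
stop_daemons() {
    :   # place the "## Cleanup" commands here
}

cleanup() {
    stop_daemons
    # ${VAR:?} aborts rather than expanding to an empty path
    rm -rf -- "${MATLABPARTEMP:?MATLABPARTEMP is not set}"
}

# run_job (hypothetical) creates the temporary directory and arms the trap.
run_job() {
    export MATLABPARTEMP="$1"
    mkdir -p "$MATLABPARTEMP"
    trap cleanup EXIT
    # ... start the daemons and run MATLAB here ...
}
```

In a submit script this would be called as something like run_job "/tmp/tmp$PBS_JOBID" near the top, after which the trap handles teardown on both normal and abnormal exits.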

Optional: Run MATLAB interactively

If you have used qsub -I … to obtain an interactive session on a compute node, you may manually run the commands above and run parallel commands in an interactive MATLAB session on the compute node. This approach is very helpful for debugging!

8 thoughts on “Using the MATLAB Distributed/Parallel Computing Toolbox without modifying your cluster”

  1. Dario Paccagnan

    Hello Craig, your post has been really useful as I need to interface MATLAB and Torque… Is it possible to have the source code you used?

          1. dinesh

            Craig, I am doing a project on a switched reluctance motor. To reduce the torque ripple using a neural network, I need a torque distribution function for dividing two inputs, so I need a torque distribution function as a toolbox in MATLAB.

        1. Craig (post author)

          Dinesh, this post is not about mechanical torque. This post is about a software tool called Torque, which has nothing to do with the torque of a motor.

  2. Jay

    Hi Craig,

    Thank you for this post, I found the information very useful.

    I am curious about how the cluster profile was configured in this setup?

    Best,
    Jay
