Category Archives: Scientific computing

Scientific computing, mathematical modeling, and computer simulation

Preventing “soft” failures due to memory fragmentation in Linux

A previous post documented that a Linux server running a pre-2.6.24 kernel can fail to allocate large chunks of memory after its memory has been fragmented by a “thrashing” incident. In this post, I will point out some ways to prevent this problem.

Use a Newer Kernel

We have some servers running RHEL 5.9 with the kernel updated to 2.6.34.14. After a thrashing incident, these servers do not experience the same problem with allocating large blocks of memory. I believe the fix is documented in the release notes for kernel 2.6.24: Section 2.4 describes the “anti-fragmentation patches” and links to this article about Linux memory management, which in turn links to this thorough documentation of the patches. (By the way, here is the full list of 2.6 kernel changelogs.)

My plan is to deploy RHEL 5.9 with the updated kernel to all the compute nodes in our cluster. However, this still doesn’t solve the problem of a user who requests some portion of the RAM on a node and then consumes more memory than requested, which is unfair to another user whose job is running on the same node.

Limit RAM Used By a Process

There are ways to prevent servers from thrashing in the first place. My discussion will be specific to HPC compute nodes, not the more general case of web servers, mail servers, and so on. I’ll start by saying that ulimit is not the solution because, in part, its limits apply per process rather than to a whole process tree, so a parent and its children can together exceed the intended cap. Read this thorough discussion of the limitations of ulimit, and check out this script to limit the time and memory used by a Linux program. I haven’t evaluated that script yet, and its approach (polling the run time and memory usage of the process and all of its children, grandchildren, etc.) seems a bit brute-force. I hope that process groups and Control Groups (cgroups) can be used instead. Also check out this Red Hat documentation on the memory subsystem in Linux.
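As a starting point, here is a minimal sketch of capping a job’s memory with the memory cgroup controller. It assumes the controller is mounted at /cgroup/memory (as with libcgroup on Red Hat systems) and that you may create groups there; the group name and the 4 GB limit are illustrative.

# create a cgroup for the job and cap its total memory at 4 GB
mkdir /cgroup/memory/job1234
echo 4G > /cgroup/memory/job1234/memory.limit_in_bytes

# move the current shell into the group; every process it spawns
# (children, grandchildren, ...) inherits the limit automatically
echo $$ > /cgroup/memory/job1234/tasks
./run_simulation

Unlike the polling approach, the kernel enforces the limit on the whole group, so nothing escapes by forking.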

It is conceivable that a compute node could have processes owned by multiple users, so any limiting scheme should be applied per job rather than per node.

Memory fragmentation degrades performance in Linux kernels < 2.6.24 (RHEL 5.4)

I have data from the STOKES High Performance Compute Cluster which definitively shows that kernel versions prior to 2.6.24 can suffer significant performance degradation due to memory fragmentation. I noticed the problem on servers running Red Hat Enterprise Linux (RHEL) 5.4 with kernel version 2.6.18-164.el5. This post will document my findings. The graphs were taken from our Ganglia monitoring system.

This node has 24GB of RAM. As long as processes do not request more than 23GB of RAM, the node operates normally. Processes can use 23GB of RAM all day long:

[Ganglia graph: ec133 working normally]
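As an aside, one direct way to watch external fragmentation is /proc/buddyinfo, which reports how many free blocks of each order the buddy allocator has; this check is my addition, not part of the Ganglia data. A healthy node shows free blocks at the high orders, while a badly fragmented one has almost nothing beyond order 0 or 1.

# columns are free block counts for orders 0..10, where an order-N
# block is 2^N contiguous pages; watch the high orders collapse
watch -n 5 'cat /proc/buddyinfo'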

Continue reading

Insert an EPS file into an Asymptote Vector Graphics document

It’s surprisingly difficult to find out whether it is possible to include an image from an Encapsulated PostScript (EPS) file in an Asymptote vector graphics document. It turns out to be easy, but the answer is hard to find in the Asymptote docs (I finally found it, via Google, in the FAQ): you use the label function, together with graphic, to insert an image into an Asymptote document. Here is a snippet of code that I used to assemble a multi-part figure from several EPS documents:

// one user coordinate unit = 1 inch; target figure width = 3.33 inches
unitsize(1inch);
size(3.33inches);

pen label_pen = black + fontsize(16pt) + Helvetica();

// panel labels and subfigures, each placed to the upper right (NE)
// of the given anchor point
label("(A)", (0.0, 0.3), align=NE, p=label_pen);
label(graphic("subfig1.eps"), (0.35, 0.0), align=NE);

label("(B)", (0.0, 1.1), align=NE, p=label_pen);
label(graphic("subfig2.eps"), (0.35, 0.75), align=NE);

// "+" marks show the extents of the canvas (not part of the real figure)
label("+", (0.0, 0.0), align=NE);
label("+", (3.33, 2.0), align=SW);

The alignment is tricky! Note that I used an alignment of NE, which means that the label (text or graphic) is placed to the UPPER RIGHT (NorthEast) of the specified point. This allows me to use the left boundary of the image as 0.0 and insert everything to the right. I added two + signs, which would not normally be present, to show the extents of the image. If you create something that extends beyond the limits, Asymptote will silently enlarge the canvas to accommodate it, despite the “size” directive! This is dangerous when you are working within a journal’s style guide. Thus, I used SW as the alignment for the upper-right + sign, so that the + lies entirely within the desired dimensions. You can check the actual dimensions of the final EPS file by opening it (I used Document Viewer in Linux) and checking the Properties.
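You can also read the extents straight from the EPS header on the command line; this check is my addition, with figure.eps standing in for your output file.

# %%BoundingBox lists llx lly urx ury in PostScript points (1/72 inch),
# so a 3.33-inch-wide figure should span about 240 points horizontally
grep '%%BoundingBox' figure.eps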

Using the MATLAB Distributed/Parallel Computing Toolbox without modifying your cluster

This post shows how I set up MATLAB to perform parallel calculations on a multi-core server that is part of an existing cluster managed using Torque and Moab. I did this without making any changes to the cluster submission system. The installation documentation for the MATLAB Parallel Computing Toolbox and Distributed Computing Server is poor. The instructions make a lot of assumptions that don’t fit an existing “production” cluster environment. For example, they assume that a GUI is available to configure the cluster (this step is not required for my method!). Further, the instructions give the impression that MATLAB components must be “installed” on both the head node and the compute nodes by an administrator. THIS IS FALSE! Ordinary users can run concurrent (multicore) MATLAB jobs without administrative privileges!

The following method also allows you to run MATLAB with more than 8 or 12 cores; in fact, you can use 16 cores or more on a single machine! It may also be possible to distribute tasks across multiple nodes (hosts), but that’s for an upcoming post. See my repository of PBS submit script examples on GitHub.
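To give the flavor of the approach, here is a minimal sketch of a Torque submit script for a single-node parallel MATLAB job. The resource requests, the worker count, and the script name run_sim.m are illustrative; matlabpool was replaced by parpool in releases after R2013a.

#!/bin/bash
# request one node with 16 cores from Torque/Moab
#PBS -l nodes=1:ppn=16
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR

# the 'local' profile starts its workers on the cores of the allocated
# node only, so the cluster scheduler sees an ordinary 16-core job
matlab -nodisplay -nosplash -r "matlabpool('open','local',16); run_sim; matlabpool('close'); exit"

Because the workers never leave the node that Torque assigned, nothing about the cluster configuration has to change.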

Continue reading

Opportunity for postdoctoral research associate in high performance computing

My current employer, the STOKES Advanced Research Computing Center (STOKES ARCC), is hiring a postdoctoral research associate to conduct research in high performance computing with an emphasis on next-generation networking technologies. The ARCC has internal funding that will be used to upgrade our research network to the Internet2 Innovation Platform standard. We are also seeking external funding to extend the research network across the UCF campus. We are looking for a candidate with an interest in topics such as defining a “Science DMZ,” Internet2, GENI, perfSONAR, software-defined networks, etc. Please use the link above to apply for the position. Feel free to contact me if you have questions; my contact information is on the about page.

Updated GROMACS tutorials

I have published up-to-date versions of two classic GROMACS tutorials on GitHub. The Getting Started section of the GROMACS online documentation contains some helpful tutorials. Unfortunately, these tutorials have not been updated in a while, and they don’t explain how to set up an efficient workflow for running large molecular dynamics simulations on a shared cluster with a resource manager such as Torque. I have created a set of files that implement the speptide tutorial from the GROMACS documentation. You can use my files and follow along with the explanations in the GROMACS manual.
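For a sense of what such a workflow looks like, here is a minimal sketch of a Torque submit script for the production run of the speptide example. The file names, module name, and core count are illustrative, and the commands assume a pre-5.0 GROMACS (no gmx prefix).

#!/bin/bash
# request one node with 8 cores for the production MD run
#PBS -l nodes=1:ppn=8
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR
module load gromacs    # module name is illustrative

# preprocess topology, coordinates, and run parameters into a .tpr file
grompp -f full.mdp -c after_pr.gro -p speptide.top -o full.tpr

# run the simulation on the 8 allocated cores
mdrun -nt 8 -deffnm full

Continue reading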

Installing Lumerical FDTD on a Linux cluster

Most of the time, RPM (especially in conjunction with yum) is a decent package management solution. However, I can think of two common circumstances when you don’t want to let RPM install a package:

  • You don’t have root permissions on a system such as a shared cluster
  • You are an administrator on a shared cluster and can’t risk having a package overwrite system-critical files
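In either case, one standard workaround is to unpack the RPM payload yourself instead of letting RPM install it. Here is a minimal sketch; the package file name and the target prefix are illustrative.

# extract the RPM payload into a private directory tree; nothing is
# written to the RPM database or to system-critical paths
mkdir -p /apps/lumerical
cd /apps/lumerical
rpm2cpio /tmp/fdtd-solutions.rpm | cpio -idmv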

Continue reading

OpenMPI, Intel Compilers and RedHat 5: cannot find -lnuma

I found an interesting quirk when trying to build an OpenMPI application on a visualization node with a “stock” version of Red Hat Enterprise Linux 5.8. I used mpicc to compile the application and got the following error:

$ mpicc hello_world_mpi.c -o hello_world
/usr/bin/ld: cannot find -lnuma
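The linker is complaining that it cannot find libnuma.so, the development symlink for the NUMA policy library. As context, here is a sketch of the kind of fix this usually calls for; the package name and paths are assumptions about this particular node, not necessarily the solution in the full post.

# with root, the development package provides /usr/lib64/libnuma.so
# (numactl-devel is the RHEL package containing it)
yum install numactl-devel

# without root, point the link step at the runtime library instead;
# the ~/lib location is illustrative
mkdir -p ~/lib
ln -s /usr/lib64/libnuma.so.1 ~/lib/libnuma.so
mpicc hello_world_mpi.c -o hello_world -L$HOME/lib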

Continue reading

Building NumPy and SciPy with Intel Composer 2013 and the MKL

Since Python is widely used as a high-productivity language for scientific computing, Intel has created a page showing how to build NumPy with the Intel compilers and the Math Kernel Library (MKL). I would like to clarify a few items regarding building NumPy on a 64-bit Red Hat Enterprise Linux 5.4 system. Since this is a production system, I don’t want to replace the Python 2.4 binary that ships with RHEL 5.4. Instead, I created a directory called

/apps/python/python-2.7.3-intel-composer-2013

and set

PYTHONPATH=/apps/python/python-2.7.3-intel-composer-2013
LD_LIBRARY_PATH=/apps/python/python-2.7.3-intel-composer-2013/lib:$LD_LIBRARY_PATH
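For completeness, here is a minimal sketch of the NumPy build step itself. It assumes the Intel Composer environment has been sourced, a site.cfg pointing at the MKL has been written as Intel’s page describes, and the new python 2.7.3 is first in $PATH; the NumPy version number is illustrative.

# with python 2.7.3 first in $PATH, NumPy installs into the private
# tree above rather than into the system Python 2.4
cd numpy-1.7.1
python setup.py config --compiler=intelem build_clib --compiler=intelem \
    build_ext --compiler=intelem install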

Continue reading