Modeling, simulation, and engineering

Profiling memory usage of Python code

April 17th, 2009 Posted in Linux, Python, Scientific computing, Software development | No Comments »

In a previous post, I explained how to use the Python profiler.  The profile is great for finding out which parts of the code run the slowest, or are called most often.  However, the profiler doesn’t give any information about how much RAM is being consumed, or where it’s being consumed.  If your program needs so much memory that it starts swapping to disk, its speed can be reduced by orders of magnitude.  On the positive side, your code may run much faster if it fits entirely in the processor cache.  In this post, I will introduce two tools that can help you understand the RAM usage of your Python code.

Read the rest of this entry »

How to make two mice work with xwindows (x.org)

April 10th, 2009 Posted in Linux | No Comments »

It’s a real pain to surf the Web when the batteries die in your wireless mouse or trackball.  I use some old rechargeables that are no longer fit for digital camera service, so this happens to me fairly often.   My batteries just died again, so I had to figure out how to configure X.org to use a backup mouse.  It’s actually not hard to have an old 3-button mouse plugged into the PS/2 port and tell X.org to use it as a backup.  First, define two input devices:

Section "InputDevice"
	Identifier  "Mouse0"
	Driver      "mouse"
	Option	    "Protocol" "auto"
	Option	    "Device" "/dev/input/mouse0"
	Option      "Buttons" "7"
	Option	    "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
	Identifier  "BackupMouse"
	Driver      "mouse"
	Option	    "Protocol" "auto"
	Option	    "Device" "/dev/input/mouse1"
	Option      "Buttons" "5"
	Option	    "ZAxisMapping" "4 5"
EndSection

The first mouse is my wireless Logitech trackball (which is currently a fancy paperweight while the batteries charge).  The second mouse is an old Microsoft two-button mouse with a scroll wheel that functions as buttons 3,4, and 5.  The trick is to define the ServerLayout as follows:

Section "ServerLayout"
	Identifier     "X.org Configured"
	Screen      0  "Screen0" 0 0
	InputDevice    "Mouse0" "CorePointer"
	InputDevice    "BackupMouse" "SendCoreEvents"
	InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Note that there can only be ONE “CorePointer” so set that to be your primary mouse.  Set the backup mouse to “SendCoreEvents” so that it receives mouse events, but X.org won’t freak out if it’s not present.

Lookup tables and spline fitting in Python

April 8th, 2009 Posted in Python, Scientific computing, Software development | No Comments »

Lookup tables and spline fitting are widely used by scientific programmers.  A particular function may not have an analytic solution–in other words, it can’t be expressed as an equation of elementary functions.  This might happen if the function were empirically determined from experimental data, or if the equation can’t be algebraically solved for one variable.  If the analytical function is available, but takes a long time to evaluate, a lookup table or spline approximation can be considerably faster.  In a previous post, I showed how to use the function interp1d from scipy.interpolate as a lookup table.  In a later post, I showed that interp1d is actually rather slow, and scipy.interpolate.UnivariateSpline is much faster.  Now, I will show some benchmark results, and explain a potential pitfall when using UnivariateSpline.

The two functions I will use for this demonstration are not very complicated.  Each function is a piecewise approximation of a more complex function.  Here is one of the functions.  For small z, the function uses one approximation, and for large z, it uses a far-field approximation.  For intermediate values, a polynomial is used to smoothly match the two solutions.

def M_wall_par(z):
    h = z - 1.0
    if h < 0.15:
        return 1./(0.9588+8./15.*log(1./h) + 64./375.*h*log(1./h))
    elif h < 0.6:
        return 0.33672 + 1.29936*h - 1.75005*h**2 + 0.89186*h**3
    else:
        return 1.-9./(16.*z)+0.125/z**3

I won’t go into the other function, which is similar.  Here is a code fragment showing how to create a lookup table for this function:

M_par_spline = UnivariateSpline(z, M_parallel, s=0)
M_par_fit = empty(len(z), dtype=float)
for i in range(len(z)):
    M_perp_fit[i] = M_perp_spline(z[i])
    M_par_fit[i] = M_par_spline(z[i])

The keyword parameter s=0 is critical, because it forces UnivariateSpline to act as a lookup table, with linear interpolation between points.  If s is set to anything higher than zero, UnivariateSpline fits a spline to the data, which can introduce fitting errors.  If you are going to use a spline fit, look at a plot of the fitted curve to make sure it’s valid across the entire domain of interest.  I benchmarked the speed of the 100,000 function calls and compared it to the same number of calls to the interpolating object. Here are the results:

  • Function calls: 1.8789 seconds average, standard deviation of 0.005
  • UnivariateSpline calls: 0.9662 seconds average, standard deviation of 0.005
  • interp1d calls: 24.8 seconds average, standard deviation of 0.1

The interpolation runs almost twice as fast as the function call, but interp1d is an order of magnitude slower!  This is because UnivariateSpline wraps a function from FITPACK (an optimized, compiled Fortran library), while interp1d is a pure Python function.  My benchmark excluded overhead such as creating the UnivariateSpline object, creating Numpy arrays, etc.  Here is a plot showing that the interpolated curve tracks perfectly with the function data (note there are two different functions plotted on this graph).

1D interpolation results (no smoothing)

1D interpolation results (no smoothing)

In conclusion, the UnivariateSpline class from Scipy is the way to go for creating fast lookup tables, if you set the keyword parameter s=0.  If you set s to a value greater than zero, the same class will fit a spline of the specified order, although you’d better check to make sure it is actually a valid approximation of your data.

3D Plotting Software for Python::Part 1::PyX

March 20th, 2009 Posted in Python, Scientific computing, Software development | 1 Comment »

There are lots of good open-source tools that you can use to make high-resolution, publication-quality 2D plots.  Personally, I like to use Python, numpy, and matplotlib.  Unfortunately, it is much harder to find a good tools to make 3D plots.  Older versions of matplotlib had rudimentary 3D support, but this was removed in version 0.98.  In this post, I will review a Python 3D plotting library called PyX.

Preparing the data

Figuring out how to store the data to be plotted was actually the hardest part of learning to use PyX.  The data format for 3D plots is not well documented.  PyX requires a list of (x,y,z) lists like this:

[
[x0  y0  z[0,0]]
[x0  y1  z[0,1]]
[x0  y2  z[0,2]]
...
[x1  y0  z[1,0]]
[x1  y1  z[1,1]]
[x1  y2  z[1,2]]
...
]

Unfortunately, PyX is designed to work with lists rather than Numpy arrays, which can be a real disadvantage for large datasets.  Here is a code snippet that shows how I transformed some Numpy data into a list of lists that can be plotted in PyX.  The 1D Numpy arrays z_values and theta_average contain x and y values, and the z values are stored in the 2D Numpy array called B_h_theta.


# Reshape data for plotting with pyx
values = []
for i in range(B_h_theta.shape[0]):
    for j in range(B_h_theta.shape[1]):
        values.append([z_values[i], theta_average[j], B_h_theta[i][j]])

Plotting

First create a graph.data.points object.  The one required argument is a list of values to plot, in the form described above.  There are other types of graph.data objects for reading data from text files, plotting functions, etc.  Then, create a graph.graphxyz object which defines the visual appearance of the graph.  Call the plot() method of graph.graphxyz and pass in the data object as an argument.  Call the dodata() to finish the plot, and then call writeEPSfile() or writePDFfile() to write the plot to a file on disk.  There is no way (that I found) to just show the plot in a graphical window.


from pyx import *
v = graph.data.points(values, title="B(h,theta)", addlinenumbers=0, x=0, y=1, z=2)
g = graph.graphxyz(size=4, projector=graph.graphxyz.central(10, 150, 30), \
        x=graph.axis.linear(title="z"), y=graph.axis.linear(title=r'$\theta$'),\
        z=graph.axis.linear(title="B(h,theta)"), x2=None, y2=None, z2=None)
g.plot(v, [graph.style.grid()])
g.dodata()
g.writeEPSfile("phi_plot")

Results

3D Plot from PyX

3D Plot from PyX

Advantages

  • Clean, object-oriented interface
  • Produces nice surface and wireframe plots in .pdf and .eps formats
  • Appears to be flexible and full-featured

Disadvantages

  • Does not seem to be actively developed or maintained (although it is very useful in its present state)
  • Does not integrate easily with Numpy–all data needs to be stored in text files or Python lists
  • No GUI output–writes images straight to .pdf or .eps files
  • TeX labels have poor resolution (at least on my system)

The Python configparser: a way to read simple data files

February 3rd, 2009 Posted in Linux, Software development | 1 Comment »

My simulation library, which is written mostly in Python, needs a lot of data and parameters in order to run. In some cases, I just hard-code the values in the script that calls the library, and in other cases I load a pickle file containing a Python objext. What if I want to read in data or configuration parameters from a human-editable text file? If the information is extensive or complex, XML might be a good choice, but XML is overkill for simple configuration or data files. Fortunately, a standard Python library called ConfigParser has already defined a configuration file format, and provides methods to interact with such files. Here is a sample of the format used in a config file:

[0.01]         ; this is a comment
1.0: 27.221
2.0: 4.214
5.0: 0.6973
10.0: 0.1523
30.0: -0.00120
[0.05]
1.0: 325.586
2.0: 30.858
5.0: 2.376
10.0: 0.458
30.0: -0.0159

Reading such a file is easy:


parser = ConfigParser.SafeConfigParser()
parser.read("filename")
section_names = parser.sections() # returns a list of strings of section names
items_in_section = parser.items(section_names[i]) # get a list of tuples from a section
pair = parser.items(section_names[i])[j]  # get one key/value pair
key = pair[0]
value = pair[1]     # values are always strings--remember to convert if needed

There are a couple of interesting “features” you should know about. First, if you open a file using the parser.read(”filename”) method, no exception will be raised if the file does not exist! If reading the file is not optional, first open the file with the fp=open(”filename”) and then call parser.readfp(fp). The second caveat is that the sections or key/value pairs returned from the parser are not necessarily returned in the order that they exist in the file. If you are trying to read data into arrays, be aware that the arrays may not be in order, so you may need to sort them.

Why reinvent the wheel when trying to do simple things? Unless you are doing it as an exercise, just look through the Python library and you’ll probably find that someone has already done the work for you.

Updated Python class for writing Paraview (VTK) (.vtu) files

January 23rd, 2009 Posted in Python, Scientific computing | No Comments »

I have released a new version of my Python class that generates VTK data files in the .vtu format, which is compatible with Paraview and other VTK applications.  If you have downloaded the old one, please get the latest version, which incorporates some bug fixes and has been more thoroughly tested.

How to put formatted, highlighted code in a Wordpress post

January 17th, 2009 Posted in Software development, Web design | No Comments »

I found two complementary plugins that enable me to put highlighted formatted code in a Wordpress page or post. Here’s an example of what they do:

# Plot flux at continuum boundary
pylab.figure()
pylab.hold(True)

pylab.plot(nd_times, Jl_BD, 'b-', label="Flux from BD simulation")
pylab.plot(nd_times, Jl_BD+Jl_BD_std, 'b.', label="Flux from BD simulation")
pylab.plot(nd_times, Jl_BD-Jl_BD_std, 'b.', label="Flux from BD simulation")

Test successful! Note that Javascript must be enabled to see the highlighting. Get the plugins here:

Deploying Python applications on Windows

January 15th, 2009 Posted in Linux, Python, Software development | 1 Comment »

Writing applications in Python on a Linux system is almost too easy.  Deploying Python apps on other Linux systems is not hard, because most Linux systems already have Python, with its core libraries and tools, installed.  Most Linux systems also have package managers that make it easy to find and install required components.  But, what happens when your co-workers who use Windows need to use your app?  When you tell them to “go to the command line and…” you’ve pretty much lost them at “command line.” How do you package Python in a way that’s easy for a Windows user to install?

Here is a process that worked for me:

  • Python distutils: You need this to deploy on all types of systems, so it’s best to go ahead and learn it now.  If you’re going to deploy on Windows, make a source distribution in a .zip archive.
  • py2exe is an amazing tool that will bundle your Python application, and all its dependencies, in one place.  Copy the .zip archive created by distutils to a Windows computer.  Install Python and all the required libraries on the Windows PC.  Test the app and make sure it works.  Then, use the py2exe tutorial and the application notes to see how to create a self-contained directory that contains a Windows executable.  You can just zip up this directory and distribute it, and all the user has to do is unzip the archive and double-click the right icon.
  • Optional–create an installer with a tool like NSIS.  I haven’t actually done this, because my app isn’t that complex.  It looks like NSIS will allow you to easily create a professional-looking, wizard-based installer for Windows.  I’d be interested to know how well it works.

Scipy.integrate ODEPACK import error solved!

January 14th, 2009 Posted in Linux, Python, Scientific computing | No Comments »

I recently found a solution to a problem that had been vexing me for about a year. In order to successfully import anything from scipy.integrate, I had edit the file scipy/integrate/__init__.py and comment out the line

from odepack import *

If not, I would get various import errors such as

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/scipy/integrate/__init__.py", line 10,
in <module>
    from odepack import *
  File "/usr/lib/python2.5/site-packages/scipy/integrate/odepack.py", line 7,
in <module>
    import _odepack
ImportError: /usr/lib/python2.5/site-packages/scipy/integrate/_odepack.so:
undefined symbol: daxpy_

I found the solution in Gentoo bug 251165. The problem only occurs when Scipy is built with non-reference versions of BLAS and CBLAS. The solution is to install the reference implementation of BLAS and CBLAS, rebuild Scipy, and then use whichever implementation of BLAS or CBLAS you want. I don’t know why this happens, and I don’t know if it affects distributions other than Gentoo.

By the way, Gentoo has a really useful system tool called eselect, which has various modules that are used to choose between different versions or implementations of tools on your system (BLAS, Java virtual machine, kernel sources, OpenGL, etc.)

Optimizing Python code for fast math

January 9th, 2009 Posted in Python, Scientific computing, Software development | 1 Comment »

I spent some time today profiling a Brownian dynamics simulation written in Python to see how I could make it faster before starting some long runs on a Linux cluster. In the sections below, I have attempted to quantify the speed improvements due to various changes. Keep in mind that the actual speed improvement in your code will vary, depending on where the actual bottlenecks occur. See my post about profiling Python code. Another caveat: I am running Python 2.4.4 because it’s installed on our cluster.

Use scipy.interpolate.UnivariateSpline instead of interp1d

In a previous post I described how to use interp1d from scipy to implement a fast lookup table. Well, it turns out that is actually a slow lookup table. The function scipy.interpolate.UnivariateSpline() is much faster because it wraps a compiled library (FITPACK), and it’s pretty much a drop-in replacement for interp1d. The one thing it can’t do is return a default value for out-of-range x values.

For scalar math, Python functions are faster than numpy functions

In one method, I realized that I had accidentally imported the sqrt() function from numpy, but I was only using it to compute the square root of scalar floating point numbers. Finding roots is computationally intensive, and from previous experience I suspected that numpy might be slower than the build-in Python math functions, so I knew this might be an easy place to find some speed. To quantify how much speed, I wrote this little test script:

from math import sqrt
from time import time
num_repeats = 10000000

start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "math.sqrt() Elapsed time: " + str(time()-start_time)

from numpy import sqrt
start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "numpy.sqrt() Elapsed time: " + str(time()-start_time)

It turns out that the sqrt() function from the standard Python math module is about seven times faster than the corresponding sqrt() function from numpy. As a side note, I learned that it is slightly faster (5-10%) to use the form “from math import sqrt” than it is to use “import math” and “math.sqrt()”. Because of this, you may not want to use “from numpy import *” if you are really concerned about speed.

Use Numpy for generating random numbers

Another significant improvement in speed came when I switched from the gauss() function of the built-in random library to the normal() function from numpy.random. Using a similar script to the one above, I found that the numpy version is about four times faster than random.gauss(). However, generating a random number from a uniform distribution (random.uniform) is very fast. In the source code of random, you will note that drawing one random number from a Gaussian distribution requires computing a square root, a log, a cosine, and a sine in Python. Numpy is faster because it does the math in C.

Function calls are expensive

A minor, but still useful, speed improvement came when I eliminated some unnecessary calls to the built-in functions max(), min(), and abs(). max(float1, float2) can be replaced with

if num1 > num2:
    a = num1
else:
    a = num2

which is about 35% faster. min() and abs() can be similarly replaced with if/then/else statements.

Eliminate unnecessary import statements

When you are wrapping up development of a module, go back and delete all the “from some_module import some_unnecessary_function” statements. This is marginally helpful with speed, but it’s a good programming practice.

Redirecting standard output from Python: another example

January 7th, 2009 Posted in Python, Software development | No Comments »

I wrote a previous post about how to redirect standard output from a Python script to a GUI window.  In this post, I will give an even simpler example of to redirect standard output to a log file.  During the early development and debugging of Python programs, I use print statements to keep me informed of what’s happening.  However, printing to the terminal is not always practical–for example, when I run numerical code on a parallel cluster, there is no way to determine which output came from which instance of the program.  Here is a class that you can use to redirect standard output to a log file:

class Logger:
    """The Logger class can be used to redirect standard output to a log file.
    Usage: Create a Logger object and redirect standard output to the Logger
    object.  For example:
    output = Logger(file_handle, True)
    import sys
    sys.stdout = output
    """

    def __init__(self, logFile, echo):
        """Arguments:
        logFile     a file object that is available for writing
        echo        Boolean.  If True, output is sent to standard output in
                    addition to the log file.
        """
        import sys
        self.out = sys.stdout
        self.logFile = logFile
        self.echo = echo

    def write(self, s):
        """Required method that replaces stdout. You don't have to call this
        directly--all print statements will be redirected here."""
        self.logFile.write(s)
        if self.echo:
            self.out.write(s)

I create a Logger object and pass it as an optional keyword parameter to an object that produces output that I want to redirect.  The __init__ method of that object looks something like this:

    def __init__(... , output=None):
        if output is not None:
            print "Redirecting output..."
            import sys
            sys.stdout = output

You should be able to make use of this class “as is.”  It would be a little more robust if it checked whether the file object was available for writing–feel free to improve upon it.

Tools for Python software development

January 6th, 2009 Posted in Linux, Python, Software development | No Comments »

I have found a few tools over the years that I find extremely useful for developing software. Python is my language of choice at the moment, but I’m sure these tools will be handy for any language.

  • Subversion is an open-source version-control system. Version control was designed to allow multiple programmers to work on the same project at the same time without stepping all over one another. However, even though I am a solo developer, I find version control to be extremely helpful.
    1. When I commit changes to the repository, I can document what I’ve changed and why. This is a great help when I introduce a bug and have to go back and find it.
    2. The repository is stored on a remote server that is backed up nightly.
    3. It’s easy to make an “unstable” branch for implementing new features. When I make changes that don’t work, it’s easy to revert to a previous version that works.
    4. It is easy to deploy my code to the Linux cluster and make sure that the cluster is running the latest version of my software.
  • Go read the documentation on the Subversion web site to find out what it can do for you.

  • RapidSVN is a GUI client for a Subversion server. By default, Subversion comes with a command-line client that does everything you need. However, sometimes it’s easier to stay organized when everything is presented visually. Here is a screenshot of RapidSVN:

    RapidSVN screenshot

    RapidSVN screenshot

  • gvim is my usual editing tool.  I don’t use a lot of the advanced features, but I love the way that my finger movement is minimized while typing.
  • Meld is a graphical diff tool. If you find the command-line tool “diff” handy, but not easy to use, download Meld. It’s like the diff mode in gvim, but even better. Here is a screenshot of a complex diff that illustrates why Meld is so cool:

    Meld screenshot

    Meld screenshot

A lookup table for fast Python math

December 11th, 2008 Posted in Python, Scientific computing | 2 Comments »

EDIT: the UnivariateSpline class from Scipy is much faster!

Numerical programming frequently requires the use of look-up tables. A look-up table is a collection of pre-computed values. When given an “x” value, the table returns a pre-computed “y” value. Look-up tables can be used to speed up numerical codes, when it is faster to look up a value in the table than it is to compute the value. They are also used when the data in the table cannot be computed–for example, experimental data or averaged results from an ensemble of Monte Carlo simulations. Another application is to compute a value when a function cannot be solved algebraically. Assume that you have a formula for a function q(h). You need the value of h for a given value of q, but the formula cannot be algebraically solved to get h(q). Instead, choose a range of h values, compute the function q(h), and store each value in a look-up table. Now you can get h(q) for any value stored in the table. The major limitation of a look-up table is that it cannot return valid results for any value of q which is outside the range of those stored in the table. Depending on its implementation, the table may be able to interpolate to return values between known points.

Scipy includes a handy lookup table class, but it’s sort of hidden. Here is an example of how to create a lookup table using interp1d:

from scipy.interpolate import interp1
from numpy import *

ddeltah = L/1000.
h_sampled = arange(0., L+deltah/10, deltah)
q_sampled = q(h_sampled,L,D,1e-4)
h_interp = interp1d(q_sampled, h_sampled, kind='linear')
print h_interp(0.514234)

First, create a Numpy array h_sampled to store the h-values for the lookup table. The function q(h,L,D,dt) is not shown, but it takes an array of h values and returns a floating-point array of q values in the range from 0 to 1. These two arrays are then used to construct the lookup table. The table uses linear interpolation to compute values between the known points. Higher-order interpolations can be used, but I don’t need them in this case. Higher-order interpolations can introduce distortions for certain types of data, so they are best avoided unless you really understand the data being fitted. Below is a plot illustrating the results of a look-up table:

The solid line is the function h(q), the black dots are the values stored in the lookup table, and the red triangles are values interpolated from the lookup table. The function h(q) was approximated with a lookup table that was created by computing q(h) with an evenly spaced array of h values. Notice how the points stored in the table are spaced more closely together for smaller values of q. This is a consequence of using a linear “h” spacing with a nonlinear function. You could work around this by using a non-uniform array of “h” values to create the lookup table. This is another reason to understand the function you are approximating, instead of blindly placing values into the table.

If you want to understand more about how lookup tables work, or don’t want to install SciPy, check out this interpolated lookup table class. I wouldn’t use it for serious math, since it doesn’t use Numpy and is probably not very fast.

Update 2: building 64-bit Numpy with Intel compilers and MKL

December 9th, 2008 Posted in Linux, Python, Scientific computing | No Comments »

In a previous post I described how I built Numpy with Intel compilers and the Math Kernel Library on a 64-bit cluster. Today I upgraded to Numpy-1.2.1 and I made a few improvements to my install process. Please read the previous post, since I will not duplicate some important information, and then read on.

This time, I made use of a site.cfg file. Copy the file site.cfg.example to site.cfg and edit. At the end of the file, uncomment the [mkl] section and set the path to your library. Mine looks like:

[mkl]
library_dirs = /opt/intel/mkl/10.0.1.014/lib/em64t/
lapack_libs = mkl_lapack
mkl_libs = mkl, guide

Configure the build with the following command. I’m not 100% sure that you need to specify the compilers at this point, but it never hurts to be consistent:

python setup.py config --compiler=intel --fcompiler=intel

After configuration, build the library:

python setup.py build --compiler=intel --fcompiler=intel --verbose > install_output.txt

Finally, install in my home directory since I don’t have admin privileges on this cluster:

python setup.py install --home=~

This seems like a better method than the one I used before. It doesn’t require typing such a long command line. It helps that the site.cfg.example file from numpy-1.2.1 is a little easier to follow.

Using Python to generate XML files for visualization in Paraview

November 13th, 2008 Posted in Linux, Python, Software development | No Comments »

VTK is an open-source software system for “3D computer graphics, image processing, and visualization” developed by by Kitware. VTK is the foundation of Paraview, an industrial-strength CFD visualization tool that I have found to be very useful. I generate “second generation” XML-based files from my Python code and import them into Paraview for visualization. I am in the process of creating some Python classes to do, and I hope to publish them soon. Until then, I want to share some useful resources. The VTK file formats are specified in this document. It’s a pretty good specification, but it lacks some examples. Soon I will post an example of a valid unstructured, serial .vtu file. Each VTK file includes data from only one time step, so you have to keep track of time yourself (the filename is an easy solution). Paraview can read in data from multiple time steps, but you have to specify them in a .pvd file. This is also an XML file, with the following format: (reference)

<?xml version="1.0"?>
<VTKFile type="Collection" version="0.1" byte_order="LittleEndian">
<Collection>
<DataSet timestep="131" group="" part="0" file="particles_step131.vtu"/>
<DataSet timestep="417" group="" part="0" file="particles_step417.vtu"/>
<DataSet timestep="923" group="" part="0" file="particles_step923.vtu"/>
<DataSet timestep="1744" group="" part="0" file="particles_step1744.vtu"/>
<DataSet timestep="5113" group="" part="0" file="particles_step5113.vtu"/>
<DataSet timestep="17613" group="" part="0" file="particles_step17613.vtu"/>
<DataSet timestep="29999" group="" part="0" file="particles_step29999.vtu"/>
</Collection>
</VTKFile>

You can read in a single .pvd file from Paraview, and all the timestep files will automatically be read in.

In the near future, I plan to publish a Python class that handles the details of generating valid XML, which you can use as an example or as a foundation to create more advanced VTK output.

Unexpected integer/float math behavior in Python

November 6th, 2008 Posted in Python, Software development | No Comments »

I wasted some time today tracking down a bug in one of my programs.  It turned out to be “unexpected behavior” rather than a bug.  I was aware of this aspect of the language, but I made an assumption and got bit.  Read on for a valuable lesson.

Python handles integer math differently than floating point math.  If you type a number without a decimal point, Python treats it as an integer.  All math performed only with integers results in integers. For example, 1/2 evaluates to 0 while 1./2. evaluates to 0.5.  If you mix integers and floats, Python will  produce a floating point result (1/2.=0.5), but you must be very careful.  For example, you might expect the expression 4/3*3.14159 to yield a floating point result.  It does yield a floating point number, but not the one you were expecting!  4/3*3.14159 yields 3.14159.  What happened?  Python works from left to right.  4/3 evaluates to the integer “1″.  1*3.14159 evaluates to  3.14159.  For comparison, 4./3.*3.14159 evaluates to 4.1887866.  Here’s the problem with this particular aspect of Python: according to the rules of math, 4/3*3.14159 is exactly the same expression as 4*3.14159/3, but in Python they yield different results if you forget the decimal points!  4*3.14159 evaluates to a floating point, so (4*3.14159)/3 yields the “correct” floating point value.

Lesson Learned: be explicit about specifying all floats if you are doing floating-point math!  Sometimes I get lazy and leave a trailing decimal point off of a number when doing a floating point calculation, knowing that the results are “upcast” into floats. Not any more!

Note: this unexpected behavior goes away in Python 3.0

Profiling Python code

October 24th, 2008 Posted in Python | 1 Comment »

“Speed” is a complicated term when used in the context of software.  Does it mean raw speed of execution, or reducing the amount of time until a correct result is obtained?  Python is not the first language that comes to mind when people think of “fast software.”  It is true that pure Python will usually not execute as quickly as the same algorithm directly coded in C or Fortran.  However, when you define speed as “least amount of time until you get the right answer,” then Python is pretty fast.  It is so easy to develop correct code in Python, when compared to low-level compiled languages, that Python is often the fastet route to a correct answer, even if the execution time is longer.  Having said that, there are times when code has to execute quickly, and that’s why I will introduce you to profiling Python code.

Like all things Python, profiling is easier than you think.  Python 2.4 has the profile module, and Python 2.5 has both profile and cProfile.  cProfile is written in C for lower overhead, and it’s the recommended version.  I am stuck with profile, because the cluster that I am working with still uses Python 2.4.  You can read the docs for more details, but I will quickly outline what I find to be the most helpful usage.  For example, say I want to profile the file run_sim.py.  I use the following command line:

python -m profile -o profile_file run_sim.py

The argument -m tells Python to run the library module profile as a script.  The argument -o is an option that tells profile to write the profile data to the specified file name.  The last argument, of course, is the name of the script to analyze.  Once the run is finished,  you can analyze the results.  I prefer to do this interactively, and I highly recommend the iPython shell.  The following is an example of an interactive session in iPython.

In [1]: import pstats
In [2]: p = pstats.Stats('profile_file')
In [3]: p.sort_stats('cumulative').print_stats(10)
Fri Oct 24 17:18:37 2008    profile_file

         278914972 function calls (277313577 primitive calls) in 4618.950 CPU seconds

   Ordered by: cumulative time
   List reduced from 550 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000 4618.950 4618.950 profile:0(execfile('run_sim.py'))
        1   12.740   12.740 4618.950 4618.950 run_sim.py:3(?)
        1    0.000    0.000 4618.950 4618.950 :0(execfile)
    129/1    0.000    0.000 4618.950 4618.950 <string>:1(?)
   100000 1416.240    0.014 4513.100    0.045 ~/BrownianDynamics.py:212(timeStep)
 12130610 1144.580    0.000 1620.700    0.000 ~/Brownian_Dynamics/trunk/BrownianDynamics.py:105(hasXYZCollision)
 36484167  497.980    0.000  779.550    0.000 /usr/lib64/python2.4/random.py:506(gauss)
   200000   22.760    0.000  673.340    0.003 ~/lib64/python/scipy/integrate/quadrature.py:182(simps)
   300000  545.730    0.002  637.640    0.002 ~/lib64/python/scipy/integrate/quadrature.py:152(_basic_simps)

This is the profile of a Brownian Dynamics simulation.  I have sorted the results by cumulative time, to see what parts of the code take longest to run.  The first five “calls” are not that useful–they show the profiler calling the “master” script, which in turn calls a “slave” script that actually does the work.  The fifth call is where things get interesting–it shows the runtime of the script that actually does the computations.  The sixth call is to the method “hasXYZCollision” within that script.  Clearly, this is the most time-consuming part of the code, which is why I have spend the past few posts optimizing it!  The next call is to a random number generator–not much I can do about that right now.  The final two calls demonstrate that you have to read the profile carefully.  The function “simps” calls the function “_basic_simps”, so the total time spend doing numerical integration is actually 673 seconds, not 673+637 seconds.

I have found the Python profiler to be extremely useful.  Since “premature optimization is the root of all evil,” the profiler is an necessary tool to find and improve those bits of code that are truly inefficient.

Updated: building 64-bit Numpy with Intel compilers (icc)

October 17th, 2008 Posted in Linux, Python | 1 Comment »

I had to re-build Numpy because our cluster was upgraded and the Intel compilers and libraries were moved to a different directory.  This turned out to be a half-day affair of trial-and-error.  I learned a few important things, which I will try to list here:

  • Delete the numpy-1.0.4/build directory after every build attempt.  Doing “python setup.py clean” is not effective.  I kept getting errors about undefined symbols when I tried to “import numpy” on the Python command line.  It was looking for symbols in the old locations, even though I had just rebuilt the code using the new library locations.  It turned out that I needed to delete the build directory in order to force a complete bottom-up rebuild.
  • The use of “setup.py” from distutils is not well documented online.  The best thing to do is run “python setup.py –help-commands” to get a list of available commands.  Then run “python setup.py <cmd> –help” to get help for that specific command.  You can string commands together on the command line, as I will show in the example below.
  • When you test the new numpy, make sure you are not in the numpy-1.0.4 directory!  If you are in the numpy source directory, when you import numpy, you will get the message “Running from numpy source directory.” and you will not be able to load any symbols from numpy.
  • On 64-bit architectures, you need to compile position-independent library code.  For some reason, distutils does not do this automatically, and the compilation will fail with an error similar to the following:
    relocation R_X86_64_PC32 against `_various_library_symbols'
    can not be used when making a shared object; recompile with -fPIC

    Edit the file numpy-1.0.4/numpy/distutils/intelccompiler.py.  Change the line cc_exe=”icc” to cc_exe=”icc -fPIC” (line 11 in my version of numpy).

  • Make sure your LD_LIBRARY_PATH is pointing at the right location, or the compiled code won’t be able to find shared libraries.

Here is the command line I used to build numpy:

python setup.py config –library-dirs=/apps/intel/mkl/10.0.1.014/lib/em64t/ –library-dirs=/apps/intel/cce/10.1.008/lib/ –compiler=intel –fcompiler=intel build –verbose install –home=~ > install_output.txt

Explanation: “config” is a command that accepts arguments –library-dirs to set the path where icc searches for shared libraries, –compiler to set the C compiler, and –fcompiler to set the Fortran compiler.  Technically, I think you can omit the “build” command.  The “install” command installs numpy, and the –home=~ installs it in the user directory (I don’t have admin privileges on this machine).  Finally, > install_output.txt redirects the lengthy output to a text file for debugging purposes.

I hope this saves somebody some time!

Even faster collision detection in Python using Numpy

October 16th, 2008 Posted in Python, Scientific computing, Software development | No Comments »

Last night, in the shower, I realized that my collision detection routine could be even faster. Here is a representative snippet of code from my previous post:

    d2 = (x-self.x[0:i])*(x-self.x[0:i]) + (y-self.y[0:i])*(y-self.y[0:i]) + (z-self.z[0:i])*(z-self.z[0:i])

For some reason, I used the code (x-self.x)*(x-self.x) instead of (x-self.x)**2. Upon further reflection, I realized that (x-self.x)*(x-self.x) computes the difference between array elements twice, and then multiplies the results. Using a “power function” should enable the interpreter to compute the difference only once, and then multiply each element times itself. Here is the updated code, using Python’s power operator:

d2 = (x-self.x[0:i])**2 + (y-self.y[0:i])**2 + (z-self.z[0:i])**2

The previous version of the code ran in 46.3 seconds on a representative problem–the updated version ran in only 35.8 seconds on the same problem. That’s a pretty good improvement. Out of curiosity, I decided to try Numpy’s power() function in place of Python’s power operator:

two = i
d2 = power(x-self.x[0:i],two) + power(y-self.y[0:i],two) + power(z-self.z[0:i],two)nt_(2)

This version took 55.5 seconds to execute. Using a Numpy integer scalar instead of a Python integer made no difference. Numpy’s power() function is quite sophisticated–it can take two array arguments and compute element-by-element powers. I suspect that this flexibility results in some overhead that slows it down compared to Python’s built-in power operator.

Haze damages LCD projectors

October 14th, 2008 Posted in Stage Lighting | No Comments »

We recently learned the hard way that the use of haze can lead to reliability problems for some LCD projectors. We have two rather large LCD projectors (don’t know the specs offhand) permanently mounted to the ceiling, projecting onto the front of screens located on both sides of the stage.  A third LCD projector hits a large rear-projection screen that basically forms the back wall of the stage.  We also have an oil-based hazer mounted above the stage.  We had some old, tired projectors, so we assumed that they were just old and unreliable.  However, after replacing them with new, more powerful models, the reliability issues continued.  After sending the new projectors out for service for the second time in less than a year, the service center informed us that a film of oil had been deposited on the LCD.  You can buy expensive sealed-optics projectors, which should work reliably in dirty environments, but we saved money and bought standard models.  We don’t have the budget to replace the “new” projectors, so we’ve had to stop using haze altogether.  The stage doesn’t look nearly as good without it, and our moving lights are much less useful.

On a side note: you should be aware that haze can also damage moving light fixtures that use a fan to cool the power supply or motors.  The fan sucks in haze and coats the internal components with a film of oil.  Eventually, something overheats and the fixture can actually catch fire.  Fortunately, we have older High End Studiospots and Trackspots that apparently don’t use forced-air cooling, so apparently the haze doesn’t really get inside.  Be warned!