Modeling, simulation, and engineering

Constrained least-squares fitting with Python

March 5th, 2010 Posted in Python, Scientific computing | No Comments »

Scipy contains a good least-squares fitting routine, leastsq(), which implements a modified Levenberg-Marquardt algorithm.  I just learned that it also has a constrained least-squared routine called fmin_slsqp().   I am using simple upper and lower bound constraints, but it’s also possible to specify more complex functional constraints.

What I did not realize, at first, is that fmin_slsqp requires a different type of objective function than leastsq.   leastsq requires you to write a function that returns a vector of residuals, and leastsq automatically squares and sums the residuals.  fmin_slsqp is actually more flexible, in that it can use any objective function that returns a single scalar value.  To implement least-squares curve fitting, your objective function will need to find the residual at each data point, square the values, and sum them up.  Hopefully this tip will save you some time.

Check out the scipy optimization tutorial for more examples.  Here is the original paper by Dieter Kraft which introduces the algorithm used by fmin_slsqp.

A preliminary review of the Lehigh Rendition console

January 17th, 2010 Posted in Stage Lighting | No Comments »

In my last post, I explained why I don’t think the ETC Element is a good replacement for the ETC Express.  When I was at LDI 2009, I ran across the Rendition series of consoles from Lehigh Lighting.   Based on the literature I picked up, and the extensive demonstration that I received from the Lehigh representative, it looks like the Rendition might be the true heir to the Express.  It is targeted at a similar market: small theaters, churches, and schools.  The physical layout will be familiar to anyone with theater console experience: 24 or 48 submasters on the left,  dual cue playback controls in the center, and hardkeys on the right, with channel faders (48 or 96) along the upper right portion of the desk.  The submasters are traditional theater-style subs.  Each sub records a fixed look rather than an independent cuelist.  Shows can be saved to a flash drive via a USB port.
Lehigh Rendition 24/48
While the console hardware is finalized, several additional features will be added in an upcoming version of the operation system.  The console currently supports one external monitor, but dual monitors will soon be supported.  An Ethernet port is already included on the console.  In a future software release, it will be possible to add wings with additional submasters or channel faders.  The advantage of Ethernet over USB is that a wing can be placed hundreds of feet from the console, and connected using existing Ethernet lines.  This feature would allow a wing to to be used as remote focus unit.  Using a wireless Ethernet bridge opens up even more possibilities.

The main difference between the two models, the 24/48 and 48/96, is the number of hardware faders.  The software capabilities are identical.   The price of the console depends on how many channels of conventional dimming are “unlocked.”  You can choose from 125, 250, or 512 conventional channels.  All models support 1024 channels of moving-light control.  The two hardware DMX ports each support 512 channels, but two additional universes can be accessed via Ethernet (I’m not sure of the details on how this works).  The console seems to support all the basic and advanced cue and effect functions that you’d expect: multiple cue lists, flexible cue timing options, macros, subroutines, and effects.  I can’t really comment further about that, since I would need to spend a lot of time with the console to accurately gauge its reliability and ease of use.  Moving lights are also supported, with fixture profiles for easy patching and a trackball for focusing.  Once again, it looks promising, but the only way to really evaluate these features is to try to use them and see how intuitive the process is.  Download the offline editing software and try it for yourself.

A more sophisticated console called the Rendition Pro is expected to be available in the second quarter of 2010.  The Pro is aimed at more experienced users, and appears to compete with the ETC Eos and Ion.  It will run the same software as the Rendition, so many of the features that are expected to be added to the Rendition are actually being developed for the Pro.  The layout is similar to the Rendition, with playback masters on the left side, traditional cue playback controls in the lower center, and keys to the lower right. Two LCD displays with encoder wheels and softkeys occupy the space above the keypad, with no hard channel faders.  Further, the submasters  have LCD displays and additional buttons–probably “Go” and “Pause”  buttons to support a cue stack on each sub.

From my limited experience, both of these consoles series are worthy of further investigation.  They seem to be solidly built and well-engineered.  The Lehigh personnel that I spoke to were helpful and friendly, and I appreciate their assistance in answering my questions.

Storing large Numpy arrays on disk: Python Pickle vs. HDF5

January 10th, 2010 Posted in Python, Scientific computing, Software development | No Comments »

In a previous post, I described how Python’s Pickle module is fast and convenient for storing all sorts of data on disk. More recently, I showed how to profile the memory usage of Python code.  In recent weeks, I’ve uncovered a serious limitation in the Pickle module when storing large amounts of data: Pickle requires a large amount of memory to save a data structure to disk. Fortunately, there is an open standard called HDF, which defines a binary file format that is designed to efficiently store large scientific data sets. I will demonstrate both approaches, and profile them to see how much memory is required. I am writing the HDF file using the PyTables interface. Here’s the little test program I’ve been using:

#!/usr/bin/env python

from numpy import *

array_len = 10000000

a = zeros(array_len, dtype=float)

#import pickle
#f = open('test.pickle', 'wb')
#pickle.dump(a, f, pickle.HIGHEST_PROTOCOL)
#f.close()

import tables
h5file = tables.openFile('test.h5', mode='w', title="Test Array")
root = h5file.root
h5file.createArray(root, "test", a)
h5file.close()

I first ran the program with both the pickle and the HDF code commented out, and profiled RAM usage with Valgrind and Massif (see my post about profiling memory usage of Python code). Massif shows that the program uses about 80MB of RAM. I then uncommented the Pickle code, and profiled the program again. Look at how the memory usage almost triples!

 MB
232.2^                                                                  #
  |                                                                  #
  |                                                                  #
  |                                                                  #
  |                                                                  #
  |                                                                  #
  |                                                                  #
  |                                                         @        #
  |                                                         @        #
  |                                                         @        #
  |                                                         @        #
  |                                                         @        #
  |                                                         @        #
  |                                                ,        @        #  .
  |                                                @        @        #  :
  |                                                @        @        #  :
  |                                                @        @        #  :
  |                                                @        @        #  :
  |                                                @        @        #  :
  |                                                @        @        #  :
0 +----------------------------------------------------------------------->Mi
  0                                                                   161.9

I then commented out the Pickle code and uncommented the HDF code, and ran the profile again. Notice how efficient the HDF library is:

    MB
81.62^                              ,.., .....,.. .,...,... .,......,..,:  .
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |                              @::@ :::::@:: :@:::@::: :@::::::@::#:  :
   |              .,..,. .....    @::@ :::::@:: :@:::@::: :@::::::@::#:  :::
 0 +----------------------------------------------------------------------->Mi
   0                                                                   246.6

I didn’t benchmark the speed because, for my application, it doesn’t really matter, because the data gets dumped to disk relatively infrequently. It’s interesting to note that the data files on disk are almost identical in size: 76.29MB.

Why does Pickle consume so much more memory? The reason is that HDF is a binary data pipe, while Pickle is an object serialization protocol. Pickle actually consists of a simple virtual machine (VM) that translates an object into a series of opcodes and writes them to disk. To unpickle something, the VM reads and interprets the opcodes and reconstructs an object. The downside of this approach is that the VM has to construct a complete copy of the object in memory before it writes it to disk. I’m not sure why the pickle operation in my example code seems to require three times as much memory as the array itself. 99% of the time, memory usage isn’t a problem, but it becomes a real issue when you have hundreds of megabytes of data in a single array. Fortunately, HDF exists to efficiently handle large binary data sets, and the Pytables package makes it easy to access in a very Pythonic way.

My article published in Technologies for Worship Magazine

December 14th, 2009 Posted in Stage Lighting | No Comments »

Head over to Technologies for Worship Magazine and check out my recently-published article about backlighting.  It’s my first publication in print (outside of technical journals).  I’m looking forward to writing more articles for a general audience!

First impression: ETC Element Console Review

December 8th, 2009 Posted in Stage Lighting | No Comments »

We recently got an ETC Element to replace an ETC Express 72/144 that had become unreliable due to a hardware problem.  I’d like to point out that ETC equipment is generally very durable, but this particular console only lasted about 8 years due to abuse by the high-school students who use the console during the week.  The Express was a great console.  It had a hardware fader for each of the channels that we commonly used, and a bank of 20 submasters.  Recording a submaster was as simple as creating a look onstage with the channel faders, then pressing “Record” and hitting the submaster bump button.  It had one external LCD screen for output, which had a very simple character-based interface.  Just about anyone could learn to use the Express.

The Element is, in many ways, a much more sophisticated console.  While the Express didn’t have any real capability to run moving lights, the Element has many features to facilitate the use of moving lights and programmable LED fixtures.  It supports two external LCD screens with a full graphical interface, a keyboard, and a mouse. Unfortunately, the Element has lost all of the simplicity that made the Express so successful in its target market.  Let’s start with the physical design.

The greatest disadvantage of the Element, especially for untrained users, is that all of the forty faders (60 in some models) are multi-function.  With the bank switch in position 1, the faders represent channels 1-40.  Position 2 changes the fader function to channels 41-80, and Position 3 changes the faders to channels 81-120.  Position 4 changes the fader function to submasters.  While it’s nice to have 40 subs, it’s also extremely confusing to figure out which channel fader or submaster is currently controlling the output.  Here’s an example.

Let’ s say that you use faders 2, 17, and 31 to set a look on Bank 1.  If you switch to Bank 2, faders 2, 17, and 31 are still up, but they are not controlling anything.  Bring those faders down, and continue setting up the look on channels 41-80 using Bank 2.  Now switch back to Bank 1 to tweak the look on channels 1-40…but the faders are no longer up!  The LEDs on the bump buttons will flash to show which channel levels no longer match the fader positions.  To take control of these channels, bring the fader up until it matches the current channel level.  Then you can control the channel again.  Are you confused yet?  Did I mention that the faders have no numbers?  I find this slows me down when creating a look, and makes it harder to edit a look “on the fly.”  To top it off, the bank selector is a little plastic twist switch, and it’s hard to see which bank it’s pointing at.

With many consoles, you can work around the limited number of faders by adding an expansion wing with additional controls.  Unfortunately, the Element does not support expansions–you have to upgrade to the Ion to get that feature!  I’m not thrilled with the layout of the keys on the Element keyboard, either.  It’s difficult to guess where a key is going to be–they don’t seem to be grouped in any particular way.  Compare to the Hog 1000, in which the keys are grouped as “verbs” and “nouns” for command-line programming.  One final gripe about the hardware: it’s difficult to run my finger down the row of bump buttons to  find a particular light–this was easy to do on the old Express.

So, after all my complaining, I want to leave you with a great feature.  I love the “Exclusive” submaster setting that prevents the output from a sub from being recorded in subsequent cues and submasters.  I use this all the time to keep some lights up onstage or in the house without having them recorded into cues.  I’m sure there are more little gems built into the Express.

Here is my conclusion about the Element: it is NOT a replacement for the Express.  Do not expect this console to be a “drop-in” replacement for any traditional theater console.  The learning curve will be very steep for brand-new users, and slightly less steep for Express users.  In the end, the Element will be far more capable than the Express–but that’s not the target market for the Express.  If you are buying a console for a school, small church, or community theater where the users are generally inexperienced, there are probably better choices than the Element.

LDI 2009 Report

November 22nd, 2009 Posted in Stage Lighting | No Comments »

I attended LDI (Live Design International) last Friday in Orlando, FL.  This was my second time at the show (read my report from 2007).  Overall, I’d say this year’s show was scaled back from 2007, which is not surprising considering how the economic recession has affected the entertainment industry.  There were fewer booths this year, and they weren’t as fancy.  A few manufacturers, such as Chauvet and Coemar, brought pretty spectacular booths.  Other companies brought large booths (such as ETC and Martin), but they were not as flashy.  I’m not sure if Friday is normally a slow day for this conference, but there wasn’t much of a crowd.  I didn’t have to wait in line to see anything.

In 2007, I went to LDI to try out control consoles to replace our Hog 1000.  That plan fell through when the economy tanked, so this year I was just looking around.  Apparently a lot of people were, because the sales people were rather aggressive about jumping onto any apparent interest in their product.  One girl tried to scan my badge as I walked by–I don’t even know what she was selling!  Actually, that’s a common occurrence at LDI–it’s hard to tell what the actual product is in the booth.  All the larger booths seemed to consist of truss, truss warmers, banners, drapes, moving lights, and video.  I often had to stop and look for a minute to figure out which one was actually their product!

I will post some photos when I get a chance.

Building and linking to a shared Fortran library

October 26th, 2009 Posted in Fortran, Linux, Scientific computing | No Comments »

I’m using GNU Fortran (gfortran) to build several shared libraries, and then dynamically linking to them from a Fortran program.  The process is a little different than what I’m used to for C libraries, so I thought I’d explain it.  Unlike C, there is no need to #include header files when compiling code that relies on functions defined in an external library.  Likewise, there is no need to use -l or -L linking flags to tell the linker about s hared libraries (at least when they’re in the same directory).  In fact, the whole process requires a lot less command-line options than I had expected.

Building the Fortran shared libraries

I want to test a library called my_library.  However, this library relies upon another library, which is called cfdrc_user_access.  I want to compile a unit-test program that calls my library and makes sure it is working correctly.

gfortran -shared -fPIC -o cfdrc_user_access.so cfdrc_user_access.f90
gfortran -shared -fPIC -o cfdrc_user.so my_library.f90

Linking the shared libraries with a Fortran program

First, compile the program to an object file (.o), using the -c flag as shown on the first line.  Then use gfortran to link the object file with the shared libraries created in the previous step:

gfortran -c blocking.f90
gfortran -o blocking blocking.o cfdrc_user.so cfdrc_user_access.so

Remember, unless the shared libraries are in a directory that’s already on your  dynamic linker path, you  will probably need to modify your LD_LIBRARY_PATH in order to run the program.

export LD_LIBRARY_PATH=/home/yourname/current_working_directory/
./blocking

That worked for me–hopefully it helps you.

X.org configuration for a Gentoo guest on VMWare Fusion

October 21st, 2009 Posted in Linux, Mac | No Comments »

Here is an xorg.conf for a Gentoo guest running in VMWare Fusion on MacOS 10.6 (Snow Leopard).  It’s surprisingly simple to set up X when the hardware is fake ;)  Gentoo runs great on a Macbook Pro 13″.  I was concerned at first about the lack of 3D acceleration, but I tried a few 3D applications and found the performance to be acceptable.  Blender ran really smoothly with a simple model (once I found out that command-click is equivalent to middle click).  Rotating a field of several hundred spheres in Paraview was a little slow, but still very acceptable.

I also bought a Logitech wireless mouse, and found that the middle button works just as you’d expect (paste) in Gentoo running in Fusion.  I didn’t have to install the Logitech drivers in OS X.

Kernel configuration for a Gentoo guest in VMWare Fusion

October 9th, 2009 Posted in Linux, Mac | No Comments »

I recently installed Gentoo Linux (amd64) as a guest on my Mac (OS 10.6 Snow Leopard) using VMWare Fusion.  I thought I’d post the kernel config that I am using, since I didn’t find any out there that I trusted.  If you can use this as a starting point, then it should save you some time and trouble.  It’s a pretty minimal configuration–I think I removed all the extra drivers and stuff.  You could lean it out a little more by removing the audio and a few other extras that I thought I might use.  Let me know if you have any trouble with it.

Gentoo amd64 kernel config for VMWare Fusion

So far I am very pleased with its performance.  My only disappointment is that VMWare doesn’t support Linux graphics hardware acceleration.

f2py: binding Fortran and Python

September 23rd, 2009 Posted in Fortran, Python, Software development | No Comments »

I  have recently started using f2py to call Fortran from Python.  I have found this useful for two reasons: speeding up Python scripts by calling compiled Fortran code, and using Python as a unit testing framework for Fortran modules.   Unfortunately, the documentation for f2py is rather sparse, and may not be completely up to date.   In this note, I will hopefully prevent you from wasting a lot of time figuring out how to pass array arguments, and return array results.

Array arguments

Passing array arguments is a critical when working with numerical algorithms.  F2py handles this well, but it is difficult to figure it out how to do it correctly.  Here is a  simple example with some Fortran 90 routines:

module test

contains

subroutine foo (a)
    implicit none

    integer, intent(in) :: a
    print*, "Hello from Fortran!"
    print*, "a=",a
end subroutine foo

function bar (len_a, a)
    implicit none
    integer, intent(in) :: len_a
    real, dimension(len_a), intent(in) :: a

    real, dimension(len_a) :: bar
    !f2py depend(len_a) a, bar

    integer :: i
    real, dimension(len_a) :: b

    do i=1,len_a
        b(i) = 2.0*a(i)
    end do

    bar = b
end function bar

subroutine sub (a, len_a, a_out)
    implicit none

    real, dimension(len_a), intent(in) :: a
    integer, intent(in) :: len_a
    real, dimension(len_a), intent(out) :: a_out

    integer :: i

    do i=1,len_a
        a_out(i) = 2.0*a(i)
    end do

end subroutine sub

end module test

Function “foo” is quite straightforward. Function “bar” is a little more complex, because it accepts an array argument and returns an array. If you’re new to Fortran, it will seem strange to pass the length of the array along with the array, but you need that information to declare the output array. f2py needs to know that the input array a and the output array bar both depend on the argument len_a.  The special comment line

!f2py depend(len_a) a, bar

is mandatory!  It tells f2py that a depends on len_a.  If you omit this comment or the corresponding one for, the function will not work correctly.  You will get strange errors like

ValueError: failed to create intent(cache|hide)|optional array-- must have defined
dimensions but got (0,)

Fortran subroutines

Now look at the subroutine called sub.  On the Fortran side, it has three arguments: two inputs and an output.  However, Python only has functions,  and all non-array arguments are passed by value.  How do you reconcile this?  When the Fortran subroutine is called from Python, the intent(out) variables are returned as a function result, or a tuple of results if there are more than one.  Compare the Fortran code above with the Python call below to see what I mean.

Building

This code can be compiled using the “fast and smart” method described in the docs:

f2py -m -c hello hello.f90

Here is the Python code that calls the Fortran routines:

#!/usr/bin/env python

import hello
from numpy import *
a = arange(0.0, 10.0, 1.0)
len_a = len(a)

print "foo:"
hello.test.foo(len_a)

print "bar:"
a_out = hello.test.bar(len_a, a)
print a_out

print "sub:"
a_out = hello.test.sub(a, len_a)
print a_out

You might also want to check out this rather complicated f2py example.

Profiling memory usage of Python code

April 17th, 2009 Posted in Linux, Python, Scientific computing, Software development | 2 Comments »

In a previous post, I explained how to use the Python profiler.  The profile is great for finding out which parts of the code run the slowest, or are called most often.  However, the profiler doesn’t give any information about how much RAM is being consumed, or where it’s being consumed.  If your program needs so much memory that it starts swapping to disk, its speed can be reduced by orders of magnitude.  On the positive side, your code may run much faster if it fits entirely in the processor cache.  In this post, I will introduce two tools that can help you understand the RAM usage of your Python code.

Read the rest of this entry »

How to make two mice work with xwindows (x.org)

April 10th, 2009 Posted in Linux | No Comments »

It’s a real pain to surf the Web when the batteries die in your wireless mouse or trackball.  I use some old rechargeables that are no longer fit for digital camera service, so this happens to me fairly often.   My batteries just died again, so I had to figure out how to configure X.org to use a backup mouse.  It’s actually not hard to have an old 3-button mouse plugged into the PS/2 port and tell X.org to use it as a backup.  First, define two input devices:

Section "InputDevice"
	Identifier  "Mouse0"
	Driver      "mouse"
	Option	    "Protocol" "auto"
	Option	    "Device" "/dev/input/mouse0"
	Option      "Buttons" "7"
	Option	    "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
	Identifier  "BackupMouse"
	Driver      "mouse"
	Option	    "Protocol" "auto"
	Option	    "Device" "/dev/input/mouse1"
	Option      "Buttons" "5"
	Option	    "ZAxisMapping" "4 5"
EndSection

The first mouse is my wireless Logitech trackball (which is currently a fancy paperweight while the batteries charge).  The second mouse is an old Microsoft two-button mouse with a scroll wheel that functions as buttons 3,4, and 5.  The trick is to define the ServerLayout as follows:

Section "ServerLayout"
	Identifier     "X.org Configured"
	Screen      0  "Screen0" 0 0
	InputDevice    "Mouse0" "CorePointer"
	InputDevice    "BackupMouse" "SendCoreEvents"
	InputDevice    "Keyboard0" "CoreKeyboard"
EndSection

Note that there can only be ONE “CorePointer” so set that to be your primary mouse.  Set the backup mouse to “SendCoreEvents” so that it receives mouse events, but X.org won’t freak out if it’s not present.

Lookup tables and spline fitting in Python

April 8th, 2009 Posted in Python, Scientific computing, Software development | No Comments »

Lookup tables and spline fitting are widely used by scientific programmers.  A particular function may not have an analytic solution–in other words, it can’t be expressed as an equation of elementary functions.  This might happen if the function were empirically determined from experimental data, or if the equation can’t be algebraically solved for one variable.  If the analytical function is available, but takes a long time to evaluate, a lookup table or spline approximation can be considerably faster.  In a previous post, I showed how to use the function interp1d from scipy.interpolate as a lookup table.  In a later post, I showed that interp1d is actually rather slow, and scipy.interpolate.UnivariateSpline is much faster.  Now, I will show some benchmark results, and explain a potential pitfall when using UnivariateSpline.

The two functions I will use for this demonstration are not very complicated.  Each function is a piecewise approximation of a more complex function.  Here is one of the functions.  For small z, the function uses one approximation, and for large z, it uses a far-field approximation.  For intermediate values, a polynomial is used to smoothly match the two solutions.

def M_wall_par(z):
    h = z - 1.0
    if h < 0.15:
        return 1./(0.9588+8./15.*log(1./h) + 64./375.*h*log(1./h))
    elif h < 0.6:
        return 0.33672 + 1.29936*h - 1.75005*h**2 + 0.89186*h**3
    else:
        return 1.-9./(16.*z)+0.125/z**3

I won’t go into the other function, which is similar.  Here is a code fragment showing how to create a lookup table for this function:

M_par_spline = UnivariateSpline(z, M_parallel, s=0)
M_par_fit = empty(len(z), dtype=float)
for i in range(len(z)):
    M_perp_fit[i] = M_perp_spline(z[i])
    M_par_fit[i] = M_par_spline(z[i])

The keyword parameter s=0 is critical, because it forces UnivariateSpline to act as a lookup table, with linear interpolation between points.  If s is set to anything higher than zero, UnivariateSpline fits a spline to the data, which can introduce fitting errors.  If you are going to use a spline fit, look at a plot of the fitted curve to make sure it’s valid across the entire domain of interest.  I benchmarked the speed of the 100,000 function calls and compared it to the same number of calls to the interpolating object. Here are the results:

  • Function calls: 1.8789 seconds average, standard deviation of 0.005
  • UnivariateSpline calls: 0.9662 seconds average, standard deviation of 0.005
  • interp1d calls: 24.8 seconds average, standard deviation of 0.1

The interpolation runs almost twice as fast as the function call, but interp1d is an order of magnitude slower!  This is because UnivariateSpline wraps a function from FITPACK (an optimized, compiled Fortran library), while interp1d is a pure Python function.  My benchmark excluded overhead such as creating the UnivariateSpline object, creating Numpy arrays, etc.  Here is a plot showing that the interpolated curve tracks perfectly with the function data (note there are two different functions plotted on this graph).

1D interpolation results (no smoothing)

1D interpolation results (no smoothing)

In conclusion, the UnivariateSpline class from Scipy is the way to go for creating fast lookup tables, if you set the keyword parameter s=0.  If you set s to a value greater than zero, the same class will fit a spline of the specified order, although you’d better check to make sure it is actually a valid approximation of your data.

3D Plotting Software for Python::Part 1::PyX

March 20th, 2009 Posted in Python, Scientific computing, Software development | 3 Comments »

There are lots of good open-source tools that you can use to make high-resolution, publication-quality 2D plots.  Personally, I like to use Python, numpy, and matplotlib.  Unfortunately, it is much harder to find a good tools to make 3D plots.  Older versions of matplotlib had rudimentary 3D support, but this was removed in version 0.98.  In this post, I will review a Python 3D plotting library called PyX.

Preparing the data

Figuring out how to store the data to be plotted was actually the hardest part of learning to use PyX.  The data format for 3D plots is not well documented.  PyX requires a list of (x,y,z) lists like this:

[
[x0  y0  z[0,0]]
[x0  y1  z[0,1]]
[x0  y2  z[0,2]]
...
[x1  y0  z[1,0]]
[x1  y1  z[1,1]]
[x1  y2  z[1,2]]
...
]

Unfortunately, PyX is designed to work with lists rather than Numpy arrays, which can be a real disadvantage for large datasets.  Here is a code snippet that shows how I transformed some Numpy data into a list of lists that can be plotted in PyX.  The 1D Numpy arrays z_values and theta_average contain x and y values, and the z values are stored in the 2D Numpy array called B_h_theta.


# Reshape data for plotting with pyx
values = []
for i in range(B_h_theta.shape[0]):
    for j in range(B_h_theta.shape[1]):
        values.append([z_values[i], theta_average[j], B_h_theta[i][j]])

Plotting

First create a graph.data.points object.  The one required argument is a list of values to plot, in the form described above.  There are other types of graph.data objects for reading data from text files, plotting functions, etc.  Then, create a graph.graphxyz object which defines the visual appearance of the graph.  Call the plot() method of graph.graphxyz and pass in the data object as an argument.  Call the dodata() to finish the plot, and then call writeEPSfile() or writePDFfile() to write the plot to a file on disk.  There is no way (that I found) to just show the plot in a graphical window.


from pyx import *
v = graph.data.points(values, title="B(h,theta)", addlinenumbers=0, x=0, y=1, z=2)
g = graph.graphxyz(size=4, projector=graph.graphxyz.central(10, 150, 30), \
        x=graph.axis.linear(title="z"), y=graph.axis.linear(title=r'$\theta$'),\
        z=graph.axis.linear(title="B(h,theta)"), x2=None, y2=None, z2=None)
g.plot(v, [graph.style.grid()])
g.dodata()
g.writeEPSfile("phi_plot")

Results

3D Plot from PyX

3D Plot from PyX

Advantages

  • Clean, object-oriented interface
  • Produces nice surface and wireframe plots in .pdf and .eps formats
  • Appears to be flexible and full-featured

Disadvantages

  • Does not seem to be actively developed or maintained (although it is very useful in its present state)
  • Does not integrate easily with Numpy–all data needs to be stored in text files or Python lists
  • No GUI output–writes images straight to .pdf or .eps files
  • TeX labels have poor resolution (at least on my system)

The Python configparser: a way to read simple data files

February 3rd, 2009 Posted in Linux, Software development | 1 Comment »

My simulation library, which is written mostly in Python, needs a lot of data and parameters in order to run. In some cases, I just hard-code the values in the script that calls the library, and in other cases I load a pickle file containing a Python objext. What if I want to read in data or configuration parameters from a human-editable text file? If the information is extensive or complex, XML might be a good choice, but XML is overkill for simple configuration or data files. Fortunately, a standard Python library called ConfigParser has already defined a configuration file format, and provides methods to interact with such files. Here is a sample of the format used in a config file:

[0.01]         ; this is a comment
1.0: 27.221
2.0: 4.214
5.0: 0.6973
10.0: 0.1523
30.0: -0.00120
[0.05]
1.0: 325.586
2.0: 30.858
5.0: 2.376
10.0: 0.458
30.0: -0.0159

Reading such a file is easy:


parser = ConfigParser.SafeConfigParser()
parser.read("filename")
section_names = parser.sections() # returns a list of strings of section names
items_in_section = parser.items(section_names[i]) # get a list of tuples from a section
pair = parser.items(section_names[i])[j]  # get one key/value pair
key = pair[0]
value = pair[1]     # values are always strings--remember to convert if needed

There are a couple of interesting “features” you should know about. First, if you open a file using the parser.read(”filename”) method, no exception will be raised if the file does not exist! If reading the file is not optional, first open the file with the fp=open(”filename”) and then call parser.readfp(fp). The second caveat is that the sections or key/value pairs returned from the parser are not necessarily returned in the order that they exist in the file. If you are trying to read data into arrays, be aware that the arrays may not be in order, so you may need to sort them.

Why reinvent the wheel when trying to do simple things? Unless you are doing it as an exercise, just look through the Python library and you’ll probably find that someone has already done the work for you.

Updated Python class for writing Paraview (VTK) (.vtu) files

January 23rd, 2009 Posted in Python, Scientific computing | No Comments »

I have released a new version of my Python class that generates VTK data files in the .vtu format, which is compatible with Paraview and other VTK applications.  If you have downloaded the old one, please get the latest version, which incorporates some bug fixes and has been more thoroughly tested.

How to put formatted, highlighted code in a Wordpress post

January 17th, 2009 Posted in Software development, Web design | 1 Comment »

I found two complementary plugins that enable me to put highlighted formatted code in a Wordpress page or post. Here’s an example of what they do:

# Plot flux at continuum boundary
pylab.figure()
pylab.hold(True)

pylab.plot(nd_times, Jl_BD, 'b-', label="Flux from BD simulation")
pylab.plot(nd_times, Jl_BD+Jl_BD_std, 'b.', label="Flux from BD simulation")
pylab.plot(nd_times, Jl_BD-Jl_BD_std, 'b.', label="Flux from BD simulation")

Test successful! Note that Javascript must be enabled to see the highlighting. Get the plugins here:

Deploying Python applications on Windows

January 15th, 2009 Posted in Linux, Python, Software development | No Comments »

Writing applications in Python on a Linux system is almost too easy.  Deploying Python apps on other Linux systems is not hard, because most Linux systems already have Python, with its core libraries and tools, installed.  Most Linux systems also have package managers that make it easy to find and install required components.  But, what happens when your co-workers who use Windows need to use your app?  When you tell them to “go to the command line and…” you’ve pretty much lost them at “command line.” How do you package Python in a way that’s easy for a Windows user to install?

Here is a process that worked for me:

  • Python distutils: You need this to deploy on all types of systems, so it’s best to go ahead and learn it now.  If you’re going to deploy on Windows, make a source distribution in a .zip archive.
  • py2exe is an amazing tool that will bundle your Python application, and all its dependencies, in one place.  Copy the .zip archive created by distutils to a Windows computer.  Install Python and all the required libraries on the Windows PC.  Test the app and make sure it works.  Then, use the py2exe tutorial and the application notes to see how to create a self-contained directory that contains a Windows executable.  You can just zip up this directory and distribute it, and all the user has to do is unzip the archive and double-click the right icon.
  • Optional–create an installer with a tool like NSIS.  I haven’t actually done this, because my app isn’t that complex.  It looks like NSIS will allow you to easily create a professional-looking, wizard-based installer for Windows.  I’d be interested to know how well it works.

Scipy.integrate ODEPACK import error solved!

January 14th, 2009 Posted in Linux, Python, Scientific computing | No Comments »

I recently found a solution to a problem that had been vexing me for about a year. In order to successfully import anything from scipy.integrate, I had edit the file scipy/integrate/__init__.py and comment out the line

from odepack import *

If not, I would get various import errors such as

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/scipy/integrate/__init__.py", line 10,
in <module>
    from odepack import *
  File "/usr/lib/python2.5/site-packages/scipy/integrate/odepack.py", line 7,
in <module>
    import _odepack
ImportError: /usr/lib/python2.5/site-packages/scipy/integrate/_odepack.so:
undefined symbol: daxpy_

I found the solution in Gentoo bug 251165. The problem only occurs when Scipy is built with non-reference versions of BLAS and CBLAS. The solution is to install the reference implementation of BLAS and CBLAS, rebuild Scipy, and then use whichever implementation of BLAS or CBLAS you want. I don’t know why this happens, and I don’t know if it affects distributions other than Gentoo.

By the way, Gentoo has a really useful system tool called eselect, which has various modules that are used to choose between different versions or implementations of tools on your system (BLAS, Java virtual machine, kernel sources, OpenGL, etc.)

Optimizing Python code for fast math

January 9th, 2009 Posted in Python, Scientific computing, Software development | 1 Comment »

I spent some time today profiling a Brownian dynamics simulation written in Python to see how I could make it faster before starting some long runs on a Linux cluster. In the sections below, I have attempted to quantify the speed improvements due to various changes. Keep in mind that the actual speed improvement in your code will vary, depending on where the actual bottlenecks occur. See my post about profiling Python code. Another caveat: I am running Python 2.4.4 because it’s installed on our cluster.

Use scipy.interpolate.UnivariateSpline instead of interp1d

In a previous post I described how to use interp1d from scipy to implement a fast lookup table. Well, it turns out that is actually a slow lookup table. The function scipy.interpolate.UnivariateSpline() is much faster because it wraps a compiled library (FITPACK), and it’s pretty much a drop-in replacement for interp1d. The one thing it can’t do is return a default value for out-of-range x values.

For scalar math, Python functions are faster than numpy functions

In one method, I realized that I had accidentally imported the sqrt() function from numpy, but I was only using it to compute the square root of scalar floating point numbers. Finding roots is computationally intensive, and from previous experience I suspected that numpy might be slower than the build-in Python math functions, so I knew this might be an easy place to find some speed. To quantify how much speed, I wrote this little test script:

from math import sqrt
from time import time
num_repeats = 10000000

start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "math.sqrt() Elapsed time: " + str(time()-start_time)

from numpy import sqrt
start_time = time()
for i in range(num_repeats):
    sqrt(1012.1234215)
print "numpy.sqrt() Elapsed time: " + str(time()-start_time)

It turns out that the sqrt() function from the standard Python math module is about seven times faster than the corresponding sqrt() function from numpy. As a side note, I learned that it is slightly faster (5-10%) to use the form “from math import sqrt” than it is to use “import math” and “math.sqrt()”. Because of this, you may not want to use “from numpy import *” if you are really concerned about speed.

Use Numpy for generating random numbers

Another significant improvement in speed came when I switched from the gauss() function of the built-in random library to the normal() function from numpy.random. Using a similar script to the one above, I found that the numpy version is about four times faster than random.gauss(). However, generating a random number from a uniform distribution (random.uniform) is very fast. In the source code of random, you will note that drawing one random number from a Gaussian distribution requires computing a square root, a log, a cosine, and a sine in Python. Numpy is faster because it does the math in C.

Function calls are expensive

A minor, but still useful, speed improvement came when I eliminated some unnecessary calls to the built-in functions max(), min(), and abs(). max(float1, float2) can be replaced with

if num1 > num2:
    a = num1
else:
    a = num2

which is about 35% faster. min() and abs() can be similarly replaced with if/then/else statements.

Eliminate unnecessary import statements

When you are wrapping up development of a module, go back and delete all the “from some_module import some_unnecessary_function” statements. This is marginally helpful with speed, but it’s a good programming practice.