Lookup tables and spline fitting in Python

Lookup tables and spline fitting are widely used by scientific programmers.  A particular function may not have an analytic solution–in other words, it can’t be expressed as an equation of elementary functions.  This might happen if the function were empirically determined from experimental data, or if the equation can’t be algebraically solved for one variable.  If the analytical function is available, but takes a long time to evaluate, a lookup table or spline approximation can be considerably faster.  In a previous post, I showed how to use the function interp1d from scipy.interpolate as a lookup table.  In a later post, I showed that interp1d is actually rather slow, and scipy.interpolate.UnivariateSpline is much faster.  Now, I will show some benchmark results, and explain a potential pitfall when using UnivariateSpline.

The two functions I will use for this demonstration are not very complicated.  Each function is a piecewise approximation of a more complex function.  Here is one of the functions.  For small z, the function uses one approximation, and for large z, it uses a far-field approximation.  For intermediate values, a polynomial is used to smoothly match the two solutions.

[sourcecode language="Python"]def M_wall_par(z):
    h = z - 1.0
    if h < 0.15:
        return 1./(0.9588+8./15.*log(1./h) + 64./375.*h*log(1./h))
    elif h < 0.6:
        return 0.33672 + 1.29936*h - 1.75005*h**2 + 0.89186*h**3
    else:
        return 1.-9./(16.*z)+0.125/z**3[/sourcecode]

I won't go into the other function, which is similar.  Here is a code fragment showing how to create a lookup table for this function:

[sourcecode language="Python"]M_par_spline = UnivariateSpline(z, M_parallel, s=0)
M_par_fit = empty(len(z), dtype=float)
for i in range(len(z)):
    M_perp_fit[i] = M_perp_spline(z[i])
    M_par_fit[i] = M_par_spline(z[i])[/sourcecode]

The keyword parameter s=0 is critical, because it forces UnivariateSpline to act as a lookup table, with linear interpolation between points.  If s is set to anything higher than zero, UnivariateSpline fits a spline to the data, which can introduce fitting errors.  If you are going to use a spline fit, look at a plot of the fitted curve to make sure it's valid across the entire domain of interest.  I benchmarked the speed of the 100,000 function calls and compared it to the same number of calls to the interpolating object. Here are the results:

  • Function calls: 1.8789 seconds average, standard deviation of 0.005
  • UnivariateSpline calls: 0.9662 seconds average, standard deviation of 0.005
  • interp1d calls: 24.8 seconds average, standard deviation of 0.1

The interpolation runs almost twice as fast as the function call, but interp1d is an order of magnitude slower!  This is because UnivariateSpline wraps a function from FITPACK (an optimized, compiled Fortran library), while interp1d is a pure Python function.  My benchmark excluded overhead such as creating the UnivariateSpline object, creating Numpy arrays, etc.  Here is a plot showing that the interpolated curve tracks perfectly with the function data (note there are two different functions plotted on this graph).

1D interpolation results (no smoothing)

1D interpolation results (no smoothing)

In conclusion, the UnivariateSpline class from Scipy is the way to go for creating fast lookup tables, if you set the keyword parameter s=0.  If you set s to a value greater than zero, the same class will fit a spline of the specified order, although you'd better check to make sure it is actually a valid approximation of your data.

Leave a Reply