ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


Numerically Python

Numerical Python Modules

08/09/2000

Introduction

This month we look under the hood of Numerical Python (NumPy) and explore a more detailed list of features and capabilities. The heart of Numerical Python is the multiarray datatype we have explored so far, but the NumPy distribution is really a package of modules that enable the programmer to make effective use of the multiarray concept. NumPy includes support for linear algebra functions, Fast Fourier Transforms, and random number generation, and provides help for users familiar with the commercial matrix manipulation package MATLAB. Let's look at the basics of NumPy's add-on modules in preparation for getting our hands really dirty next month. It's the accessories that make NumPy fly.

Under the hood

The term broadcasting in NumPy is used to refer to smaller dimensioned multiarrays being either replicated or extended in a natural way to work with higher dimensioned multiarrays. While we'll leave the details of broadcasting to next month, understanding this concept is crucial to pushing NumPy to its full potential. As a preface to next month's discussion of the complex rules of broadcasting, some basics of multiarray manipulation need exploration -- specifically indexing, slicing, and ellipses.

With a multiarray in hand, it turns out there are flexible rules for indexing and slicing.

Given a matrix of integers: Matrix 1

let's see what we can do with the array as entered.

>>> a = array(((0,1,2,3,4,5,6,7,8,9),
               (9,8,7,6,5,4,3,2,1,0)))
>>> print a
[[0 1 2 3 4 5 6 7 8 9]
 [9 8 7 6 5 4 3 2 1 0]]

Indexing
There are two styles to indexing, either [][] or [,]. (While we'll limit the discussion to only two dimensions, the concepts readily extend to multiarrays of higher orders.)

>>> print a[1][1]
8
>>> print a[1,1]
8

The difference between the two indexing styles can be characterized as cascaded extraction versus direct. Think of the method a[1][1] as (a[1])[1]. The first operation (a[1]) returns the entire second row of the a multiarray; the second [1] then indexes into the second element of the extracted row. The extraction "cascades." Think of the second method, a[1,1] as a[(1,1)]. It uses the tuple as an array address and retrieves the element directly. Of the two methods, the better one to use is the second ([,]). It extends to the more complex cases as expected, where the first method can lead to unexpected results.

Slicing
Following is an example of slicing. In this case the snippet a[:,2] extracts the second column, all rows. Remember, for a two-dimensional matrix the indexes are [rows,cols]. The: operator acts as a "slice" operator. It can be used in several forms: alone [:], meaning select everything; or with end points [start:stop], meaning select everything from start up to but not including stop; or [start:stop:step], meaning select everything from start up to but not including stop with a given increment (step) value.

>>> print a[:,2]
[2 7]

The next example shows that [][] is not the same as [,] when slicing is involved. This snippet attempts to retrieve the second row of the entire a multiarray.

>>> print a[:][2]
Traceback (innermost last):
  File "<stdin>", line 1, in ?
IndexError: index out of bounds

Why? Well, think of evaluating a[:][2] as (a[:])[2]. The a[:] part just returns a; now doing a[2] will actually attempt to return the third (counting from zero) row of a - which is nonexistent! Thus the exception.

Using more complex forms of the slice operator, the following example gives the (1,3,5) values of the 1st row

>>> print a[0,1:7:2]
[1 3 5]

Ellipses
When dealing with larger multi-dimensional arrays, ellipses (...) can be used to cover an indeterminate number of slices. Ellipses are essentially a shorthand, filling in for unspecified dimensions. Only the first, left-most ellipsis can be expanded. The ellipses will expand to the number of unspecified dimensions that remain.

As an example, if a is an array of four or more dimensions, then the following code snippet is true:

>>> a[5,:,:,1] == a[5,...,1]

This shows the use of ... as a substitute for :,:, (select all) slices of the intermediate dimensions.

Go ahead and create some test arrays and try your hand at indexing and slicing. More examples can be found in the NumPy documentation -- if you need further guidance. Getting the hang of indexing and slicing is a good start towards more powerful NumPy usage.

Modules

With an understanding of basic manipulation of the multiarray in hand, let's take a quick tour of four modules that are included with the default NumPy package.

Linear algebra
The LinearAlgebra module provides a workhorse package bolstered by the LAPACK library, a freely available library of fast matrix routines written in Fortran77. This gives the linear algebra package its speed. We used this package in Numerically Python I to perform both matrix multiplications as well as find the multiplicative inverse of a matrix.

This module also includes find functions for computing the eigen and singular value decompositions, functions that break matrices down into fundamental components and give insight into their structure. The structure of a matrix (the relationships between numbers stored within the matrix) is as important as the actual values the matrix contains. In this module you will also find functions for computing the determinant of a matrix as well as solving the least squares problem. The functions in this module are used to perform most, if not all, of the math associated with solving matrix problems found in science, business, and engineering.

Fast Fourier
The Fourier transform is a mathematical operation used by scientists and engineers to convert between frequency and time domain representations of a numerical sequence. If a sequence of samples represents measurements made in time, the Fourier transform can be used to compute the frequency (or cycles) of the measured events.

The Fast Fourier Transform (FFT) is an optimized way to compute the Fourier transform that was realized by Cooley and Tukey in the mid-1960s. Some will argue that the realization of this algorithm was the single most important event in making processing of signal waveforms by computers possible. Digital Signal Processing (DSP) is the heart of many industrial and consumer devices we rely on today -- most importantly, the modem inside your computer that allows you to read this article!

The FFT module provides access to an optimized library for efficiently computing such transforms. To gain access to the module, it needs to be imported:

>>>from FFT import *

Functions supported include the forward and inverse FFT for real and complex number sequences and forward FFTs for two-dimensional data sets (useful for image processing).

In general, the Fourier transform can be calculated on a data sequence of any length. However, to take advantage of the FFT algorithm, the data needs to be a power of 2 in length (2,4,8,16,32..).

The workhorse functions are
fft(data,n=None,axis=-1), which performs the forward FFT and
inverse_fft(data,n=None,axis=-1), which performs the inverse FFT.

Remember back to Numerically Python II where we computed the sum of two sinusoids to generate an interesting example for the plot() function using DISLIN? Let's use similar code to see an example of what the FFT routines can do for us.

>>> x=arange(128.0)
>>> y=sin(x/3)+cos(x/5)
>>> plot(x,y)

generates the plot as before. If we consider the x axis to represent time, then taking the FFT will let us measure the frequencies involved.

>Resulting plot.

Resulting plot.

The following snippet generated the next plot. Notice that there are two peaks shown. Each peak represents the amplitude of each of the two sinusoids added together. In this case the answer is known, but the same general technique can be used to see the amplitudes and frequencies present in measured (not created) data. This forms a particularly powerful tool for analysis.

>>> plot(x[0:64],abs(fft(y)[0:64]))
Resulting frequency plot.

Resulting frequency plot.

Take another look at the last line of code. Slicing was used to extract and plot only the first 64 data samples. Because the FFT is symmetric for data represented by real (as opposed to complex) samples, the last half of the FFT is a mirror to the first. We only need to plot the first half.

Random numbers
The RandomArray module provides support for generating multiarrays of random numbers. Random numbers find their way into many simulation algorithms representing the element of chance or noisy measurement data. Since a computer is generating the numbers, they are not truly "random" but rather have statistical properties simulating truly random numbers.

Like most random number packages, the RandomArray module allows the user to pick a seed value. The computer initializes the random number generator with the seed value. Picking a different seed each time an algorithm uses a random number increases the random nature of things. If you are looking for an identical operation between each run simply fix the seed value to the same value each run.

To use the RandomArray module, import it as follows:

>>> from RandomArray import *.

Then set the seed value like this:

>>>seed(x=0,y=0)

If no arguments are specified, values will be substituted based on the system clock (not a bad choice, by the way). Typically seed() is only called at the start of a session (or when you want to restart the generated sequence).

Random numbers are specified by their distribution. Different distributions are used to simulate different natural processes. The two most common distributions are the uniform distribution, which can be used to model equally random choices between two boundary conditions (pick any number between 1 and 10) and the normal or Gaussian distribution, which is used to model noise in natural processes (also the so-called "bell curve" from those olden days of high school).

The RandomArray module will compute sequences with either of these distributions as well as others (binomial, Poisson, chi-square, F, gamma, beta, and exponential).

All of the distribution functions accept a shape tuple specifying the desired output matrix size. In addition, some functions also require information about the desired distribution. The following code computes a 3-by-3 matrix of normally distributed random numbers with a mean of zero and a variance of 1.0.

>>> a = normal(0.0,1.0,(3,3))
>>> a
array([[ 2.27323961, -0.67850655,  0.39735967],
       [ 1.22253323,  1.09771109,  1.43763125],
       [ 0.53184462,  0.10846746,  1.87741685]])
>>>

Mlab
The Mlab (or MATLAB) module provides convenience routines for users familiar with the commercial matrix manipulation package from MathWorks -- MATLAB). The module includes routines for array initializations (eye, zeros, ones), shorthand for the eigen decomposition and singular value functions from the LinearAlgebra module (eig and svd) as well as many functions used in signal processing (sinc, cov, hamming and barlett).

End game

While the muscle of multiarray manipulation is provided by the fundamental NumPy routines, it is the add-ons that put the shiny chrome face on the distribution. The modules introduced here are only the standard package. In addition to these, users of NumPy have contributed even more modules that you can find at python.org and Travis Oliphant's Python Pages.

As you dig deeper into NumPy, its power and usefulness becomes more evident. Go ahead and try some of the examples! The NumPy documentation contains additional examples for these modules and full definitions of their functions.

Copyright © 2009 O'Reilly Media, Inc.