Analyzing Sound Files
Pages: 1, 2
Natural sounds are comprised of many frequencies with varying amplitudes. It is the complex interaction of both frequency and amplitude varied across time that makes sound pleasant to hear. But no matter how complex the sound, the FFT can be used to decompose it and to estimate the frequency content. Performing this analysis on samples of data through the length of a recording, we can get a measure of the frequency content of a signal over time. The resulting plot is known as a sonogram or a spectrogram. The plot shows the relationship between time (x-axis), frequency (y-axis), and power (color -- red is high power, blue low power).
The following plot shows the analysis of a file containing chirped whistles (I was doing my bird interpretation). The green and blue colors show the lower powered frequencies. The red color shows the variable frequency of the chirps.
Figure 2. The analysis of a file containing chirped whistles.
The code for the program follows. You can view the complete program from here. I'll explain it as we go along:
Basic imports from Numeric import * from MLab import * from FFT import * from dislin import * import wave import sys import struct
We use the
open method from the
to open a wave file and assign it to a file pointer,
Using information about the file, we then calculate some basic constants
to use in our analysis of the file. The length of the
passed in as a command line argument and should always be a power of
# open the wave file fp = wave.open(sys.argv,"rb") sample_rate = fp.getframerate() total_num_samps = fp.getnframes() fft_length = int(sys.argv) num_fft = (total_num_samps / fft_length ) - 2
We are going to need a matrix to store temporary results. Making it here allows for more efficient operation later:
# create temporary working array temp = zeros((num_fft,fft_length),Float)
The data from the wave file is stored as packed numbers in a binary
form. We will read the entire file into the temporary array we just
created. As each section is read in, we unpack it into a floating-point
value. Notice that we are storing it into a two-dimensional array where
each row contains
# read in the data from the file for i in range(num_fft): tempb = fp.readframes(fft_length); temp[i,:] = array(struct.unpack("%dB"%(fft_length), \ tempb),Float) - 128.0 fp.close()
Although not necessary, the next step multiplies each row of the data matrix by a "window" function. The window is used to mitigate some numerical side effects of the FFT. Since we are multiplying a vector and a matrix, we are taking advantage of broadcasting here!
# Window the data temp = temp * hamming(fft_length)
We again make use of the
real_fft() function, which is
optimized for real (as opposed to complex) data. Notice that we are
taking the FFT of the entire array, which defaults to taking the
individual FFT of each row. The
10*log10 is again used to
convert the output into the logarithmic scale. The addition of the
1E-20) solves the problem of
being undefined. Rather than check each argument for 0, we just add a
small inconsequential value to guarantee that the 0 case never arises.
There is a lot of work going on here. This is where the muscle is
applied. The result is an array that contains the energy values found in
each of the frequencies across each of the rows (sections of the time
# Transform with the FFT, Return Power freq_pwr = 10*log10(1e-20+abs(real_fft(temp,fft_length)))
Now we use DISLIN for plotting! Most of the work is to
get the scales correct. Using the contour Quickplot function, we are
going to take a look at all of the rows of the array at the same time.
conshade function will plot rows against columns in a
color plot where the color relates to the magnitude of the element (in
our case, the energy in a particular frequency during a particular part of
# Plot the result n_out_pts = (fft_length / 2) + 1 y_axis = 0.5*float(sample_rate) / n_out_pts * \ arange(n_out_pts) x_axis = (total_num_samps / float(sample_rate)) / \ num_fft * arange(num_fft) setvar('X',"Time (sec)") setvar('Y',"Frequency (Hertz)") conshade(freq_pwr,x_axis,y_axis) disfin()
That's it! The whole function is only 40 lines of code.
Running the tool
Either type in the commands or download the full version -- I called
pysono.py. Once in hand, the tool is simple to try
out. First use the sound recorder on your PC to generate a sound file
(the tool works best for files less than 60 seconds long). Then at the
command prompt type:
pysono.py <filename> <fft_size>
Use the name of the wave file you are analyzing for
filename and the length of the FFT to use for
fft_size. Remember it must be a
power of two. Give it a try!
Using the tool, I analyzed the first 60 seconds of a recording of The Very Thought of You sung by Natalie Cole. The loosely parallel lines show that her voice has several frequencies present -- these are harmonics and are what give the sound quality.
Figure 3. the first 60 seconds of a recording of The Very Thought of You sung by Natalie Cole.
Using techniques explored throughout this series of articles on Numeric Python, we have created a complete, useful application in only 40 lines of code. With the tool in hand, you can now analyze different wave files. Try saying the same words at different pitches (one in normal tone, the other in falsetto) -- do the sonograms look the same? Analyze the same file with varying sizes of FFTs. Can you see the difference in ability to measure frequencies? The mathematics behind the code comes from the field of DSP (digital signal processing). For more information search the web for "DSP". A good place to start is Digital Signal Processing Central.
Eric Hagemann specializes in fast algorithms for crunching numbers on all varieties of computers from embedded to mainframe.
Read more Numerically Python columns.
Discuss this article in the O'Reilly Network Python Forum.
Return to the Python DevCenter.