Published on ONJava.com (http://www.onjava.com/)

QTJ Audio

by Chris Adamson

QuickTime Java can be the heart and soul of cross-platform video players and editors. As you will see in this article, QTJ is also well-suited to be the engine of audio-only applications, such as MP3 players. This article will develop an audio player, QTBebop, that displays song metadata, band levels, and current time, all of which help introduce the useful audio-related tools provided by QuickTime to the Java developer. We'll also look at QuickTime's "callbacks," which are critical to all kinds of QT apps.

Too Good, Too Bad

We tend to think of audio and video applications as separate realms (iTunes and WinAmp are one kind of application, while iMovie and RealPlayer are another), but this separation exists at the application layer, not the media framework layer. QuickTime treats sound the same way it treats video or any other kind of dynamic media. In fact, there's nothing special about opening and playing a sound file in QuickTime: you create an OpenMovieFile to reference a file in a supported format (MP3, AAC, WAV, etc.), hand that to Movie.fromFile(), and call Movie.start(). The qtj61-player code from the last article in this series will play audio files with no code changes. As far as QuickTime is concerned, imported audio files are just movies with a single track of audio media.

Given that, it's quite easy to write a bare-bones, GUI-less audio player. In fact, it seems like it should consist of simply opening a file, making a Movie from that, and starting the Movie. We could express this as the following code with only a single caveat ... it doesn't work:

try {
    //this does not work
    QTFile qtf = new QTFile (args[0]);
    OpenMovieFile omf = OpenMovieFile.asRead(qtf);
    Movie movie = Movie.fromFile (omf);
    movie.start();
} catch (QTException qte) {
    qte.printStackTrace();
}

With Java 1.4 on Mac OS X, this returns immediately, without playing any music. On Windows 2000 and XP, it seems to play a few seconds and hang. Either way, chances are you're not happy with the result. What has happened?

The problem has to do with tasking, the arcane art of giving a QuickTime movie enough cycles to actually play itself. Typically, when we build a QTJ application with a GUI, we pick up the tasking calls automatically, and thus don't have to worry about, or even know about, the need to periodically call Movie.task(). In this case, we haven't picked up any automatic tasking calls, nor set up any of our own.

On Windows, the side effect is that after getting time to play a little bit of its buffer, the Movie is never given another chance to decode and play the audio. The Mac OS X case is a little stranger -- I believe what we're seeing is that our MP3 is handed off to a native library, not actually played by Java, so the main() method returns and the JVM, seeing no non-daemon threads running, decides to shut down.

In any case, we need to provide regular callbacks to the task() method to give our movie a chance to decode and play the data. Fortunately, QTJ provides a class called TaskAllMovies, which runs a thread that provides tasking callbacks to all active movies. So we can solve our problems on Mac and Windows by adding two lines after the Movie object is created: a call that starts the TaskAllMovies tasking thread, and a call that marks the movie as active.

try {
    //this version works
    QTFile qtf = new QTFile (args[0]);
    OpenMovieFile omf = OpenMovieFile.asRead(qtf);
    Movie movie = Movie.fromFile (omf);
    // start the shared thread that calls task() on active movies
    TaskAllMovies.addMovieAndStart();
    movie.setActive (true);  
    movie.start();
} catch (QTException qte) {
    qte.printStackTrace();
}

Call Me, Call Me

At this point, when the selected audio is finished, the application will just sit around forever. We'd like it to do something a little more sensible, like terminating the app at the end of the song. A kludgy approach would be to spawn a thread to periodically poll the movie and see if the current time has reached the end.

A better approach is to register to be notified when the movie is finished playing, using one of the callbacks that QuickTime provides. We can provide a small piece of code and tell QTJ to call this code at the end of the movie.

In the included sample CloseOnCallbackAudio.java, we simply extend the simple player to register a callback that will be called when the movie (the audio) finishes playing. This registration is done with the callMeWhen() method:

callback = new ShutdownCallBack (movie);
callback.callMeWhen();

The ShutdownCallBack is an inner class that extends QuickTime's ExtremesCallBack. In its constructor, we indicate what Movie we're interested in (specifically, the TimeBase of the movie), and provide flags to indicate on which events we want to be called:

public ShutdownCallBack (Movie m)
    throws QTException {
    // fire when the movie's TimeBase reaches its stop point
    super (m.getTimeBase(),
           StdQTConstants.triggerAtStop);
}

The callMeWhen() call does the actual registration of the callback. This may seem a lot like registering a listener in various Java APIs, but there's a big difference: callMeWhen() only registers code for one callback, as opposed to listeners that get called over and over until they're specifically removed. To get that kind of behavior in QTJ, we'd need to issue a new callMeWhen each time the callback is executed.
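
To make the one-shot behavior concrete, here is a toy model in plain Java. It is only a sketch of the idea: OneShotRegistry and countFirings are hypothetical names invented for this illustration, and none of this code is part of QTJ or shows how QuickTime implements its callbacks.

```java
import java.util.ArrayList;
import java.util.List;

final class OneShotDemo {
    // Hypothetical stand-in for a one-shot callback mechanism:
    // every registration is consumed the first time it fires.
    static final class OneShotRegistry {
        private List<Runnable> pending = new ArrayList<>();

        void callMeWhen(Runnable callback) {
            pending.add(callback);
        }

        void fire() {
            List<Runnable> toRun = pending;
            pending = new ArrayList<>(); // firing forgets all registrations
            for (Runnable r : toRun) {
                r.run();
            }
        }
    }

    // Fires the event 'events' times and reports how often the callback ran.
    static int countFirings(boolean reRegister, int events) {
        OneShotRegistry registry = new OneShotRegistry();
        int[] count = {0};
        Runnable[] self = new Runnable[1];
        self[0] = () -> {
            count[0]++;
            if (reRegister) {
                registry.callMeWhen(self[0]); // ask to be called again next time
            }
        };
        registry.callMeWhen(self[0]);
        for (int i = 0; i < events; i++) {
            registry.fire();
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(countFirings(false, 3)); // 1
        System.out.println(countFirings(true, 3));  // 3
    }
}
```

Without re-registration the callback runs once no matter how many events follow. Re-registering inside the callback is what gives listener-like behavior.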

When the callback is called, its execute() method is called. Here's our simple implementation:

public void execute() {
    System.out.println ("ShutdownCallBack.execute()");
    cancelAndCleanup();
    System.exit(0);
}

Note: The cancelAndCleanup() call is a required call to disassociate our callback from QuickTime when we're done using it. As the name suggests, there are two parts: a "cancel" that cancels any pending callbacks from occurring, and a "cleanup" that cleans up system resources. A separate cancel() method exists to just cancel pending callbacks. This would be useful if we wanted to reschedule or change the conditions under which the code is called back — we would then reschedule with a new call to callMeWhen().

As you might have expected from the fact that we subclassed ExtremesCallBack, there are different classes to extend in order to achieve different behaviors. All are subclasses of QTCallBack, but provide different constructors, since some take more detailed parameters. Each takes a TimeBase, typically fetched from a Movie, and some take a flags argument whose possible values are defined as trigger... constants in the StdQTConstants class.

Class | Description
ExtremesCallBack | Called when the given TimeBase reaches its start or stop point. You specify the behavior with the flags triggerAtStart or triggerAtStop.
RateCallBack | Called when the TimeBase's rate changes. Using the flag triggerRateChange provides a callback on any rate change. Otherwise, you can use constants such as triggerRateLT or triggerRateGT to get called when the rate becomes less than, or greater than (respectively), a supplied value. The full set of possible flags is listed in the documentation for the native CallMeWhen() function.
TimeCallBack | Called when a specific time value is reached. The flags determine whether the callback occurs only when the time is moving forward (triggerTimeFwd), backward (triggerTimeBwd), or either (triggerTimeEither).
TimeJumpCallBack | Called when the TimeBase's time value changes by an amount other than would be expected from continuing to play at its current rate. An obvious example would be when the user clicks on the scrubber to "jump" to a different part of the movie. Setting up this callback takes no behavior flags or parameters.

While this is primarily an article about audio, it should be clear that the callbacks have a wide range of uses in many QuickTime applications. For example, a movie-playing GUI may want to enable or disable some of its buttons and menu items, based on whether a movie is currently playing.

What Planet Is This?

Now that we understand the basics of playing audio with QuickTime, let's think about what else we'd need to provide a more complete player application to end users.

One of the most obvious needs for a modern player is the ability to present metadata about the current song: information such as the title, the artist's name, what album it's from, etc. Practically any player puts this information front and center in the GUI.

There are different schemes for different audio formats, since some were designed to contain metadata and others weren't. MP3s, for example, weren't designed with these needs in mind -- arguably the only "metadata" per se is a copyright bit in the MPEG frame header. However, the ID3 standard was cleverly developed as a means of attaching metadata to MP3 files by defining a format that could be placed inside of an MP3 file but outside of the individual media frames. Typically, this information is simply placed at the beginning of an MP3 file, before its first MPEG frame.

When we open an MP3 file in QuickTime, we're really importing it, changing it into a QuickTime movie in memory. In the course of doing this, the ID3 data is parsed and placed in the movie's structure. If you recall from an earlier article on the QuickTime file format, QuickTime movies are represented both in memory and on disk as a tree of "atoms." These atoms can either contain data or other atoms, but not both. Typically, the top level of a self-contained movie file will contain an mdat atom to hold the media samples and a moov atom, which defines the movie's structure. The moov contains multiple trak structures, and also a handy atom called udta, short for "user data."

When an MP3 is imported, the ID3 tags become part of this user data atom. An Apple Q&A describes how an application can get values out of the user data: we just look for atoms in the user data whose atom types match some constants reserved for metadata. For example, to get the name of a song, we look in the user data for an atom called ©nam, while the album name is in an atom called ©alb. A full set of these constants is defined in QuickTime's Movies.h file.

It's important to remember that those atom types are not Strings. They're QuickTime "four character codes," meaning they're 32-bit int representations of four 8-bit ASCII characters. So, if we represent things in hex (which is actually easiest in this case), ©alb is an int made from the characters A9, 61, 6C, and 62, and thus is 0xA9616C62.
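
As a rough illustration of that packing, here is a minimal sketch in plain Java with no QuickTime classes involved. FourCharCode and its two methods are hypothetical helpers invented for this example. The sketch assumes every character fits in a single byte, which holds for © (0xA9) in both MacRoman and ISO-8859-1.

```java
final class FourCharCode {
    // Packs four single-byte characters into one big-endian 32-bit int.
    static int toInt(String code) {
        if (code.length() != 4) {
            throw new IllegalArgumentException("need exactly 4 characters");
        }
        int result = 0;
        for (int i = 0; i < 4; i++) {
            char c = code.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("character does not fit in one byte");
            }
            result = (result << 8) | c;
        }
        return result;
    }

    // Unpacks the int back into its four characters, mostly for debugging.
    static String toText(int code) {
        char[] chars = new char[4];
        for (int i = 3; i >= 0; i--) {
            chars[i] = (char) (code & 0xFF);
            code >>>= 8;
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        int alb = toInt("\u00A9alb");
        System.out.println(Integer.toHexString(alb).toUpperCase()); // A9616C62
    }
}
```

Because the high bit of 0xA9 is set, the resulting int is negative when printed in decimal. That is harmless, since the value is only ever compared against other four-character codes.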

Once we know the atom type as a four-character code, getting the atom's contents from the Movie is pretty straightforward. We get a UserData object with Movie.getUserData(), and then find our atom and retrieve its contents with UserData.getTextAsString(). This method takes three arguments: an int for the requested atom type, an index that indicates our interest in the index-th instance of the given type (note that multiple atoms of the same type are legal, and also that this call is one-based, not zero-based), and finally an "international region tag" that takes one of the lang... constants from quicktime.io.IOConstants (langUnspecified is a useful wildcard value here).

This article's sample application, QTBebop, contains a MetadataJTable with a setMovie method that retrieves all of the defined metadata entries and turns them into the model of a Swing JTable. It defines all of the constants from Movies.h in an array called TAG_NAMES and looks for matches in a UserData object like this:

ArrayList foundTags =
    new ArrayList (TAG_NAMES.length);
ArrayList foundValues =
    new ArrayList (TAG_NAMES.length);
for (int i=0; i<TAG_NAMES.length; i++) {
    try {
        int type =
            ((Integer) TAG_NAMES[i][0]).intValue();
        // ask for the first atom of this type (the index is one-based)
        String value = 
            userData.getTextAsString (type,
                                      1,
                                      IOConstants.langUnspecified);
        if (value != null) {
           foundTags.add (TAG_NAMES[i][1]);
           foundValues.add (value);
        }
    } catch (QTException qte) {} // didn't have tag
} // for

After this section, the foundTags and foundValues are converted into a two-dimensional array and passed to a DefaultTableModel constructor.

Notice the squashed catch block. If a given type is not found, QuickTime throws a QTException. For our current purposes, we do nothing, because this exception simply means that one of the many possible metadata atom types wasn't found in the user data. Returning an error code may make sense in C, but in Java, using exceptions to control program flow is considered something of a worst practice because of the expense of building a stack trace that won't be used, since the exception isn't really signaling an error state. From a purely Java point of view, it would be nice if QTJ had something like a UserData.hasType(int) method, so we could check for an atom without the performance hit of building a throwaway stack-trace if it isn't there.

That said, the MetadataJTable does its job, and works fairly quickly. Figure 1 shows an example of the table, running against an MP3 I ripped from my CD collection:

Figure 1. Parsed ID3 tags

If you look closely at this figure, you might notice something missing: the artist! This points out a rather serious limitation of QuickTime's ID3 tag parsing. This file does have an artist tag, but it's in Unicode: "菅野よう子" (or, in Western characters, "Yoko Kanno," composer of the soundtracks for Cowboy Bebop, The Vision of Escaflowne, and other TV shows and movies). It seems, and has been confirmed on the quicktime-api mailing list, that QuickTime ignores any tag whose value isn't in plain old ASCII. Note that it doesn't help to supply a more appropriate language value to the getTextAsString() method -- there's no ©ART atom to call it on!

So how does iTunes support Unicode ID3 tags? Presumably, it has its own ID3 library, which makes sense, considering that it needs to both read and write ID3 data. So while QuickTime gives us easy ID3 tag parsing, the lack of support for international character sets might make you consider using another library for tag parsing, or rolling your own.

Bad Dog, No Biscuit

Since we know that QuickTime is used to play the AAC files supported by iTunes 4 and sold by the iTunes Music Store, we'd want and expect it to be able to handle metadata from those files, too.

In fact, since the M4A format for user-ripped AACs and the M4P for Apple-DRM'ed songs are both in the MPEG-4 file format, which itself was adapted from the QuickTime file format, we might reasonably expect that their metadata tags are already in the user-data atom, arranged in the same way that ID3 tags are parsed.

Yeah, we might expect that ... but we'd be wrong.

The metadata is still in the movie's user data, but in a much different and apparently undocumented format. So we have to examine it by hand. (Sigh ... This kind of thing is why I keep HexEdit on my dock.)

These iTunes-ripped files have an atom in the user data called meta. Its contents look like valid atoms, but aren't, since the first four bytes, which should be the size of the first child atom, are 0x00000000. Maybe that's meant to throw off QuickTime file parsers. Interestingly, a set of valid atoms begins after that, with four bytes of size and a four-byte type, just as we'd expect.
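
The size-then-type layout mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: AtomWalker, childTypes, and atom are hypothetical names, the sample bytes are assembled by hand rather than read from a real file, and the sketch ignores special size values such as 64-bit extended sizes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AtomWalker {
    // Lists the types of the sibling atoms found in buf[start, end).
    // Each atom is a 4-byte big-endian size (header included) plus a 4-byte type.
    static List<String> childTypes(byte[] buf, int start, int end) {
        List<String> types = new ArrayList<>();
        ByteBuffer bb = ByteBuffer.wrap(buf); // big-endian by default
        int pos = start;
        while (pos + 8 <= end) {
            int size = bb.getInt(pos);
            if (size < 8 || size > end - pos) {
                break; // not a plausible atom header, so stop walking
            }
            types.add(new String(buf, pos + 4, 4, StandardCharsets.ISO_8859_1));
            pos += size;
        }
        return types;
    }

    // Builds one atom from a type and a payload, for test data only.
    static byte[] atom(String type, byte[] payload) {
        ByteBuffer bb = ByteBuffer.allocate(8 + payload.length);
        bb.putInt(8 + payload.length);
        bb.put(type.getBytes(StandardCharsets.ISO_8859_1));
        bb.put(payload);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] ilst = atom("ilst", new byte[0]);
        byte[] metaContents = new byte[4 + ilst.length]; // first 4 bytes stay zero
        System.arraycopy(ilst, 0, metaContents, 4, ilst.length);

        System.out.println(childTypes(metaContents, 0, metaContents.length)); // []
        System.out.println(childTypes(metaContents, 4, metaContents.length)); // [ilst]
    }
}
```

Walking from offset 0 finds nothing, because the zero size is not a valid header. Skipping the first four bytes exposes the child atoms.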

meta has a child called ilst, which in turn has children that use the tag-name constants that we saw before. We can't use UserData.getTextAsString() to get values from these atoms because we're now two levels below the user data, and besides, we're not through with undocumented oddities yet. In this AAC world, these atoms seem not to contain data, but rather a child atom called data, which contains eight junk bytes (perhaps flags) and then, finally, the data for the tag.

MetadataJTable also handles this kind of metadata. Its strategy in setMovie(), which kicks off a parse, is to look in the user data for the meta atom. If absent, the movie is assumed to be an ID3-tagged MP3 and uses the previously-described code. If it finds meta, then it looks for an ilst atom. If that succeeds, it starts looking for atoms named by TAG_NAMES. When one is found, it jumps ahead 24 bytes (to skip the size, type, size, "data," and 8 junk bytes) and reads the value.
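
The 24-byte skip can be sketched in isolation as follows. This is not the MetadataJTable code. ItunesTagReader, readValue, and sampleTag are hypothetical names, and the sample atom is built by hand to match the layout described above. The sketch also assumes the value is UTF-8 text that runs to the end of the tag atom, which the observations above do not confirm.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ItunesTagReader {
    // size + type of the tag atom, size + "data" of its child, 8 unknown bytes
    static final int HEADER_BYTES = 4 + 4 + 4 + 4 + 8;

    // Reads the text value of the tag atom that starts at tagOffset.
    static String readValue(byte[] buf, int tagOffset) {
        int tagSize = ByteBuffer.wrap(buf).getInt(tagOffset);
        if (tagSize < HEADER_BYTES || tagSize > buf.length - tagOffset) {
            throw new IllegalArgumentException("malformed tag atom");
        }
        // Assumption: the value is UTF-8 and fills the rest of the atom.
        return new String(buf, tagOffset + HEADER_BYTES,
                          tagSize - HEADER_BYTES, StandardCharsets.UTF_8);
    }

    // Builds a made-up tag atom in the described layout, for demonstration only.
    static byte[] sampleTag(int type, String value) {
        byte[] text = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer bb = ByteBuffer.allocate(HEADER_BYTES + text.length);
        bb.putInt(HEADER_BYTES + text.length);  // tag atom size
        bb.putInt(type);                        // tag atom type
        bb.putInt(16 + text.length);            // child atom size
        bb.put("data".getBytes(StandardCharsets.US_ASCII));
        bb.putLong(0L);                         // the eight "junk" bytes
        bb.put(text);
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] tag = sampleTag(0xA96E616D, "Example Title"); // type is ©nam
        System.out.println(readValue(tag, 0)); // Example Title
    }
}
```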

An example of parsing a song purchased from the iTunes Music Store is shown in Figure 2.

Figure 2. Parsed M4P metadata

You Make Me Cool

Surprisingly, everything we've done so far is in the main QuickTime API and is not strictly limited to audio content. Again, this speaks to QT's worldview that anything it reads in is a movie. Still, there are cool features that are specific to audio that we get at by retrieving a "handler" for the low-level audio data.

One thing we might want to provide for an audio player is a visual representation of the sound. On a home stereo or professional recording or mixing equipment, this would be represented as level meters that show the intensity of various frequency bands at an instant in time. In iTunes, these values are used to distort the visualizations and express the sound data in a visually pleasing way.

We can get these levels from QuickTime by first getting an AudioMediaHandler, which provides methods for getting and setting balance and for metering audio levels. Interestingly, AudioMediaHandler is an interface, implemented by SoundMediaHandler, StreamMediaHandler, and MPEGMediaHandler. The first is used for audio files and sound tracks within normal QuickTime movies, and the second for streaming data. The third reflects the long-annoying fact that QuickTime sees multiplexed MPEG-1 files not as separate audio and video tracks but as a single opaque media type, which makes extracting sound and video from MPEG-1 quite difficult. Fortunately, MPEG-4 files read in as normal QuickTime movies, with separate video and audio tracks.

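But how do we get an AudioMediaHandler? Again, it's helpful to state things in terms of QuickTime's view of the world: a Movie contains one or more Tracks, each Track refers to a Media, and each Media has a MediaHandler.

So getting the AudioMediaHandler consists of walking that chain, with code like the following: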

AudioMediaHandler audioMediaHandler = null;
for (int i=1; i<=movie.getTrackCount(); i++) {
    Track track = movie.getTrack(i);
    Media media = track.getMedia();
    MediaHandler handler = media.getHandler();
    if (handler instanceof AudioMediaHandler) {
        audioMediaHandler =
            (AudioMediaHandler) handler;
    }
}

Notice that once again a QuickTime get-by-index call, Movie.getTrack() in this case, uses indices that start at 1, not 0.

Now that we have the AudioMediaHandler, we can set balance, bass, and treble, and monitor sound levels. Setting balance and bass/treble is trivial. To monitor levels, we need to pass in a structure representing which sets of frequencies, or "bands," we want to monitor. We do this with a MediaEQSpectrumBands object, which wraps the desired bands. For the QTBebop sample application, I've used the bands shown by iTunes' graphic equalizer, represented by the array EQ_LEVELS. So setting up for monitoring looks like this:

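// center frequencies, in Hz, of the bands in iTunes' 10-band equalizer
int[] EQ_LEVELS = {
    32, 64, 125, 250, 500,
    1000, 2000, 4000, 8000, 16000
};
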
MediaEQSpectrumBands bands =
    new MediaEQSpectrumBands (EQ_LEVELS.length);
for (int i=0; i<EQ_LEVELS.length; i++) {
    bands.setFrequency (i, EQ_LEVELS[i]);
}
audioMediaHandler.setSoundEqualizerBands (bands);
audioMediaHandler.setSoundLevelMeteringEnabled (true);

To get the levels, we call getSoundEqualizerBandLevels(), passing in the number of bands that we set up in the first place (e.g., EQ_LEVELS.length). This returns an int array, with values from 0 to 255. The QTBebop sample app uses a javax.swing.Timer to call this method every 100 milliseconds and redraw an offscreen java.awt.Graphics buffer with rectangles of a height proportional to the returned level values -- in other words, the rectangle gets 0 height if the level is zero, and is the height of the buffer when the level is 255.
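
The scaling step is simple enough to show on its own. The following is a minimal sketch, not the QTBebop drawing code. LevelMeter and barHeights are hypothetical names, and the levels array stands in for whatever getSoundEqualizerBandLevels() returned on a given timer tick.

```java
import java.util.Arrays;

final class LevelMeter {
    // Maps each level in 0..255 to a bar height in 0..bufferHeight pixels.
    static int[] barHeights(int[] levels, int bufferHeight) {
        int[] heights = new int[levels.length];
        for (int i = 0; i < levels.length; i++) {
            // clamp defensively in case a value falls outside the expected range
            int clamped = Math.max(0, Math.min(255, levels[i]));
            heights[i] = clamped * bufferHeight / 255;
        }
        return heights;
    }

    public static void main(String[] args) {
        int[] heights = barHeights(new int[] {0, 128, 255}, 100);
        System.out.println(Arrays.toString(heights)); // [0, 50, 100]
    }
}
```

When drawing, each bar's top edge would be at bufferHeight minus its height, since AWT's y axis grows downward.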

The resulting application is shown in Figure 3.

Figure 3. The QTBebop application, with level meter

Author's Note: When run on Mac OS X with Java 1.4.1, the scrubber bar has repaint problems when a file is opened but is not yet playing. It does not have problems on OS X's Java 1.3.1 or on Windows, so this may be a version-specific bug, and has been filed appropriately. You can look in the sample code for the many workarounds I tried to get the scrubber repainted correctly.

See You, Space Cowboy

Obviously, our sample application could benefit from a graphical upgrade to make the bars more attractive -- perhaps spacing between bars, LED-like blocks of color, use of red and yellow regions in the upper part of each level, or a "sticky" line that represents the peak of each band's frequency over the last second. Adding balance and bass/treble controls would also be an easy improvement.

A more significant feature to add would be support for audio streams. As covered much earlier in this series, you can create a Movie from a URL by creating a DataRef from the URL string, which you then pass to the static Movie.fromDataRef() method. In terms of playable URLs, QuickTime can play RTSP-streamed content, of course, and can handle Shoutcast-style HTTP-streamed audio by changing the URL's http: protocol to the pseudo-protocol icy:, as detailed in the QuickTime 6 documentation.
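
The URL rewrite for Shoutcast-style streams is a plain string operation. Here is a small sketch of it. ShoutcastUrls and toIcy are hypothetical names, and the example.com address is a placeholder rather than a real stream. The resulting string is what would then be used to build the DataRef.

```java
final class ShoutcastUrls {
    // Replaces a leading "http:" scheme with the "icy:" pseudo-protocol.
    // URLs with any other scheme are returned unchanged.
    static String toIcy(String url) {
        if (url.regionMatches(true, 0, "http:", 0, 5)) {
            return "icy:" + url.substring(5);
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(toIcy("http://example.com:8000/stream"));
        // icy://example.com:8000/stream
    }
}
```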

With its support for a huge number of formats and codecs, QuickTime Java offers a great engine for writing audio clients. Using the techniques in this article should get your application off to a strong start.

Example Code

Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.


Copyright © 2009 O'Reilly Media, Inc.