ONJava.com -- The Independent Source for Enterprise Java

Parsing and Writing QuickTime Files in Java

by Chris Adamson

Apple's QuickTime turns 12 this year. Its very extensible file format has contributed to this longevity, allowing QuickTime to migrate from a world of CD-ROMs, AppleTalk, and static content to today's massively-networked, streaming, interactive world. The format is so flexible that it was chosen as the basis of the MPEG-4 file format. More than one might expect, the philosophy and concepts of the file format are integral to working with QuickTime structures at runtime.

However, the QuickTime APIs do much to isolate developers from the nuts and bolts of the file format during the most common tasks. So we'll first examine the format with a simple pure-Java QuickTime file format parser, then use some QuickTime for Java code to generate different kinds of QuickTime files that illustrate the format's flexibility.

The details of the format are readily available in the 351-page Inside QuickTime: QuickTime File Format (PDF). They are also installed--for Mac OS X developers--in /Developer/Documentation/QuickTime/qtdevdocs/PDF/QTFileFormat.pdf by the Developer Tools installer.

Mighty Atom

The heart and soul of QuickTime is the concept of the "atom." The name should remind you of high-school chemistry, where an atom was the smallest unit of an element that retained the properties of the element. In QuickTime, an atom is the lowest level to which we can go and still be able to tell the difference between, say, an edit-list and a sprite. All atoms have a size and a type. Any other information they may contain depends on their type. This concept helps forwards-compatibility in the format--it's easy to skip over an unknown type because the size is right there.

Sample Code

Download the sample code for this article.

There's a difference between "classic" atoms and newer "QT" atoms, but the latter are backwards-compatible with the former, and both are commonly encountered in a single file. Let's focus on the commonalities. All atoms have a header of either 8 or 16 bytes, consisting of either two or three parts:

  1. atom size: a 4-byte, unsigned integer. If 0, the atom continues to the end of the file.
  2. atom type: a 4-byte value, usually interpreted as an ASCII string like moov, though any value is valid.
  3. Optionally, an extended size: if the atom size was 1, then this field is present and interpreted as an 8-byte unsigned integer. This allows an atom to contain more than 4 GB of data.
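The three header fields can be read in a few lines of Java. What follows is only a minimal sketch, not the parser from the sample code: the class name AtomHeaderSketch and its parse method are hypothetical, and the input is a hand-built 8-byte header for a moov atom of size 0x8c.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the article's sample code.
final class AtomHeaderSketch {
    final long size;        // total atom size in bytes, header included
    final String type;      // four-character type code such as "moov"
    final int headerLength; // 8, or 16 when an extended size is present

    private AtomHeaderSketch(long size, String type, int headerLength) {
        this.size = size;
        this.type = type;
        this.headerLength = headerLength;
    }

    // Parses the atom header that starts at the given offset in data.
    static AtomHeaderSketch parse(byte[] data, int offset) {
        // ByteBuffer defaults to big-endian, which is QuickTime's byte order.
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.position(offset);

        // 1. atom size: mask so the 32 bits are read as unsigned.
        long size = buf.getInt() & 0xFFFFFFFFL;

        // 2. atom type: four raw bytes, shown here as ISO-8859-1 text.
        byte[] typeBytes = new byte[4];
        buf.get(typeBytes);
        String type = new String(typeBytes, StandardCharsets.ISO_8859_1);

        // 3. extended size: present only when the 32-bit size is 1.
        int headerLength = 8;
        if (size == 1) {
            size = buf.getLong(); // comes back negative if the top bit is set
            headerLength = 16;
        }
        // A size of 0 is returned as-is; the caller must treat it as
        // "this atom runs to the end of the file".
        return new AtomHeaderSketch(size, type, headerLength);
    }

    public static void main(String[] args) {
        byte[] firstEight = {0, 0, 0, (byte) 0x8c, 'm', 'o', 'o', 'v'};
        AtomHeaderSketch header = parse(firstEight, 0);
        System.out.println(header.type + " (" + header.size + " bytes)");
        // prints: moov (140 bytes)
    }
}
```

Returning the header length alongside the size lets a caller find where the atom's payload begins without re-checking for the extended size.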

The sample code contains a simple example in the EmptyMovie.mov file, which is just an untitled movie created in QuickTime Player and saved without modification. Open it in hexdump, od, or your favorite hex editor (I'm fond of HexEdit for the Mac). If you dump the output as characters (e.g., hexdump -cv EmptyMovie.mov), the atom types practically jump out at you:

\0  \0  \0 214   m   o   o   v  \0  \0  \0   l   m   v   h   d
\0  \0  \0  \0 272   @   Q 352 272   @   Q 372  \0  \0 002   X
\0  \0  \0  \0  \0 001  \0  \0  \0 377  \0  \0  \0  \0  \0  \0
\0  \0  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
\0  \0  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
\0  \0  \0  \0   @  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
\0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
\0  \0  \0 001  \0  \0  \0 030   u   d   t   a  \0  \0  \0  \f
 W   L   O   C  \0   4  \0 030  \0  \0  \0  \0

If we look at the byte values instead, and carefully count the sizes of the atoms, we can see the structure of the movie. Figure 1 shows a graphic representation. In case you're not comfortable reading hex, the file starts with the size and type of the first atom, an 0x8c-long moov, which matches the file size. It contains a 0x6c-long mvhd, which has a few non-null bytes. The moov's other child is a udta of size 0x18, which itself contains a WLOC of size 0x0c.

Figure 1--graphic map of atoms in EmptyMovie.mov

Little things to notice:

  • The moov and udta atoms contain other atoms, and don't seem to do anything besides contain atoms. This is a key trait of QuickTime atoms--they either contain data or other atoms, never both. That's different from other tree-structured data formats like XML, where an element can have both attributes and child elements.
  • What's the 0x00000000 that's in the udta but follows the WLOC? Depending on your mood, it's a bug or a feature. Apple says that they write an extra 32 bits of zero after the last child of a udta atom to maintain compatibility with a bug from way back in QuickTime 1.0.
  • If your first eight bytes read as 0000 8c00 6f6d 766f, then you're running on Windows. QuickTime data structures are defined as "big-endian," meaning that the most-significant byte of a two-byte value comes first. PCs running Windows use little-endian ordering, so the bytes appear backwards when you look at 16-bit values.
  • Finally, there's no special sequence to identify the contents as QuickTime data, like the CAFEBABE "magic number" that begins Java class files or the ID3 sequence that typically begins an ID3-tagged MP3 file.
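To see the byte-order effect without switching machines, here is a small sketch that groups the first eight bytes of the file into 16-bit words under each ordering. The class EndianSketch and its dumpWords method are made up for this illustration; only the java.nio calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; not part of the article's sample code.
final class EndianSketch {
    // Formats data as space-separated 16-bit hex words, reading each
    // pair of bytes in the given byte order.
    static String dumpWords(byte[] data, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(order);
        StringBuilder out = new StringBuilder();
        while (buf.remaining() >= 2) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("%04x", buf.getShort() & 0xFFFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Size 0x0000008c followed by the type "moov".
        byte[] firstEight = {0, 0, 0, (byte) 0x8c, 'm', 'o', 'o', 'v'};
        System.out.println(dumpWords(firstEight, ByteOrder.BIG_ENDIAN));
        // prints: 0000 008c 6d6f 6f76
        System.out.println(dumpWords(firstEight, ByteOrder.LITTLE_ENDIAN));
        // prints: 0000 8c00 6f6d 766f
    }
}
```

The bytes on disk are identical in both cases; only the grouping into two-byte values changes what a dump tool displays.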

What does all this say anyway? The file-format docs define the contents of each of the "leaf" atoms, so we look there to interpret the mvhd and WLOC atoms. Since this is a minimal movie, there's not much to see. The mvhd is a "movie header," a structure that defines some metadata values like creation time, preferred volume, time-scale, et cetera. These defaults are saved into the file. The next atom is user data, udta, a container for an arbitrarily long list of metadata atoms. This is a good place to put your own data into the movie, in whatever format suits you, so long as you choose an unused atom type and don't use all-lower-case, which is reserved for Apple. Here, there is only one piece of user data, the window location, WLOC. It contains two 16-bit unsigned ints for x and y, in this case (0x34,0x18) or, in decimal, (52,24).
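Decoding a leaf atom like WLOC amounts to skipping the 8-byte header and reading the payload in the documented layout. The sketch below does that for the 12 bytes of the WLOC atom from the dump; the class WlocSketch and its decode method are hypothetical names, not the classes used in the sample code.

```java
import java.nio.ByteBuffer;

// Hypothetical decoder for illustration; not part of the article's sample code.
final class WlocSketch {
    // Takes a complete WLOC atom (header included) and returns {x, y}.
    static int[] decode(byte[] atom) {
        ByteBuffer buf = ByteBuffer.wrap(atom); // big-endian by default
        buf.position(8);                        // skip 4-byte size and 4-byte type
        int x = buf.getShort() & 0xFFFF;        // mask: 16-bit unsigned
        int y = buf.getShort() & 0xFFFF;
        return new int[] {x, y};
    }

    public static void main(String[] args) {
        // The WLOC atom from EmptyMovie.mov: size 0x0c, type, x, y.
        byte[] wloc = {0, 0, 0, 0x0c, 'W', 'L', 'O', 'C', 0, 0x34, 0, 0x18};
        int[] xy = decode(wloc);
        System.out.println("(x,y) == (" + xy[0] + "," + xy[1] + ")");
        // prints: (x,y) == (52,24)
    }
}
```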

Doing It the Hard Way

While QuickTime for Java generally isolates you from the grubby details of the format, I've included a simple all-Java QuickTime file parser so we can quickly see the structure of a movie file on any J2SE platform. Download the accompanying source tarball and open it up. The parser source and a pre-compiled .jar are in the atom-parse directory. An Ant build.xml file is included to help you build the code, if you're interested (do ant help to see the available targets), or you can just run it from the .jar with java -classpath atomparse.jar com.mac.invalidname.qtatomparse.AtomParser.

The code starts with a basic ParsedAtom class, which represents any atom found in the file. This is subclassed as ParsedContainerAtom, containing an array of its children, and ParsedLeafAtom, which is meant to be a parent for type-specific subclasses that interpret particular atom types. A factory provides the parser with the class for a given type--new classes can be added by editing its properties file. Finally, AtomParser puts it all together, recursively calling a parseAtoms method when it discovers a container atom, and returning an array of children.
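The shape of that design, a base atom with container and leaf variants, can be sketched roughly as follows. This is an assumption-laden illustration rather than the classes from the tarball: the names AtomTreeSketch, Node, Container, and Leaf, and the describe and render methods, are invented here, and the factory and type-specific leaf subclasses are left out.

```java
// Hypothetical class layout for illustration; the real sample code differs.
final class AtomTreeSketch {
    // Common state for every atom: its size and four-character type.
    abstract static class Node {
        final long size;
        final String type;

        Node(long size, String type) {
            this.size = size;
            this.type = type;
        }

        abstract void describe(StringBuilder out, int depth);

        // Writes the indented "type (N bytes)" prefix shared by all atoms.
        void startLine(StringBuilder out, int depth) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(type).append(" (").append(size).append(" bytes)");
        }
    }

    // An atom whose payload is nothing but child atoms.
    static final class Container extends Node {
        final Node[] children;

        Container(long size, String type, Node... children) {
            super(size, type);
            this.children = children;
        }

        @Override
        void describe(StringBuilder out, int depth) {
            startLine(out, depth);
            out.append(" - ").append(children.length)
               .append(children.length == 1 ? " child" : " children")
               .append('\n');
            for (Node child : children) {
                child.describe(out, depth + 1);
            }
        }
    }

    // An atom that holds data; type-specific subclasses would interpret it.
    static final class Leaf extends Node {
        Leaf(long size, String type) {
            super(size, type);
        }

        @Override
        void describe(StringBuilder out, int depth) {
            startLine(out, depth);
            out.append('\n');
        }
    }

    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        root.describe(out, 0);
        return out.toString();
    }

    public static void main(String[] args) {
        // The structure of EmptyMovie.mov, built by hand.
        Node root = new Container(140, "moov",
                new Leaf(108, "mvhd"),
                new Container(24, "udta", new Leaf(12, "WLOC")));
        System.out.print(render(root));
    }
}
```

Keeping the recursion in describe mirrors the recursive parse: a container only needs to know how to delegate to its children.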

Here's the critical section for reading an atom's size, type, extended size, and data, given raf (a RandomAccessFile), off (current offset that we're reading; i.e., start of an atom), and stopAt (where the parent atom or file ends).

while (off < stopAt) {
    raf.seek (off);

    // 1. first 32 bits are atom size
    // use BigInteger to convert bytes to long
    // (instead of signed int); the signum argument of 1
    // makes BigInteger read the bytes as unsigned
    int bytesRead = raf.read (atomSizeBuf, 0,
                              atomSizeBuf.length);
    if (bytesRead < atomSizeBuf.length)
        throw new IOException ("couldn't read atom length");
    BigInteger atomSizeBI = new BigInteger (1, atomSizeBuf);
    long atomSize = atomSizeBI.longValue();

    // this is kind of a hack to handle the udta problem
    // (see below) when the parent didn't have children,
    // meaning we've read 4 bytes of 0 and the parent atom
    // is already over
    if (raf.getFilePointer() == stopAt)
        break;

    // 2. next, the atom type
    bytesRead = raf.read (atomTypeBuf, 0,
                          atomTypeBuf.length);
    if (bytesRead != atomTypeBuf.length)
        throw new IOException ("Couldn't read atom type");
    String atomType = new String (atomTypeBuf);

    // 3. if atomSize was 1, then this is 64-bit ext size
    if (atomSize == 1) {
        bytesRead = raf.read (extendedAtomSizeBuf, 0,
                              extendedAtomSizeBuf.length);
        if (bytesRead != extendedAtomSizeBuf.length)
            throw new IOException (
                      "Couldn't read extended atom size");
        BigInteger extendedSizeBI =
            new BigInteger (extendedAtomSizeBuf);
        atomSize = extendedSizeBI.longValue();
    }

    // if this atom size is negative, or extends past end
    // of file, it's extremely suspicious (i.e., we're not
    // really in a quicktime file)
    if ((atomSize < 0)  ||
       ((off + atomSize) > raf.length()))
           throw new IOException (
               "atom has invalid size: " + atomSize);

    // 4. if a container atom, then parse the children
    ParsedAtom parsedAtom = null;
    if (ATOM_CONTAINER_TYPES.contains (atomType)) {
        // children run from current point to end of the atom
        ParsedAtom [] children =
            parseAtoms (raf, raf.getFilePointer(), off + atomSize);
        parsedAtom =
            new ParsedContainerAtom (atomSize, atomType, children);
    } else {
        parsedAtom =
            AtomFactory.getInstance().createAtomFor (
                atomSize, atomType, raf);
    }

    // add atom to the list
    parsedAtomList.add (parsedAtom);

    // now set offset to next atom (or end-of-file
    // in special case (atomSize = 0 means atom goes
    // to EOF)
    if (atomSize == 0)
        off = raf.length();
    else
        off += atomSize;

    // if a 'udta' container atom, then jump ahead 4 
    // to work around Apple's QT 1.0 workaround
    // (http://developer.apple.com/technotes/qt/qt_03.html )
    if (atomType.equals("udta"))
        off += 4;
} // while not at stopAt

A few caveats to this code. First, please excuse my abuse of the BigInteger class to get longs from four-byte arrays, but the alternative is a blinding amount of bit-shifting. Moreover, the reason I use longs for atom sizes is that it usually avoids signing problems (32-bit Java ints are signed, while the usual QuickTime atom size is a 32-bit unsigned value). However, it will be wrong if you happen to encounter an atom larger than 9,223,372,036,854,775,807 bytes (i.e., a 64-bit integer with the top bit set). Just thought I'd mention that, in case you just got back from the store with a 10 exabyte drive. Also, my scheme for knowing which atoms are containers is to list known containers in AtomParser. If I've missed one, the parser handles it fairly gracefully, because we have the size of the atom and simply advance the offset to the next atom (unfortunately, without parsing the children).
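For comparison, here is roughly what the bit-shifting alternative looks like next to BigInteger. This is a sketch with made-up names (UnsignedSizeSketch, viaShifts, viaBigInteger), not code from the tarball. One detail worth knowing: the single-argument BigInteger(byte[]) constructor reads the bytes as two's complement, so a four-byte size with the top bit set comes out negative unless you use the two-argument form with a signum of 1.

```java
import java.math.BigInteger;

// Hypothetical comparison for illustration; not part of the sample code.
final class UnsignedSizeSketch {
    // Combines four big-endian bytes into an unsigned 32-bit value.
    static long viaShifts(byte[] b) {
        return ((b[0] & 0xFFL) << 24)
             | ((b[1] & 0xFFL) << 16)
             | ((b[2] & 0xFFL) << 8)
             |  (b[3] & 0xFFL);
    }

    // Same result via BigInteger; signum 1 means "treat as magnitude".
    static long viaBigInteger(byte[] b) {
        return new BigInteger(1, b).longValue();
    }

    public static void main(String[] args) {
        byte[] topBitSet = {(byte) 0x80, 0, 0, 0};
        System.out.println(viaShifts(topBitSet));                    // prints: 2147483648
        System.out.println(viaBigInteger(topBitSet));                // prints: 2147483648
        System.out.println(new BigInteger(topBitSet).longValue());   // prints: -2147483648
    }
}
```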

Here's the output when we run the parser on EmptyMovie.mov:

moov (140 bytes) - 2 children
  mvhd (108 bytes) 
  udta (24 bytes) - 1 child
    WLOC (12 bytes)  (x,y) == (52,24)

So far, so boring. Let's try a more interesting bit of content. The movie tim-drm-ref.mov is a 45-second sound bite of Tim O'Reilly discussing digital rights management at the recent O'Reilly Mac OS X conference. The file is a reference to a 51 MB movie of the entire keynote panel, yet this file is a dainty 6 KB, since it consists entirely of metadata, including the references to the original movie on the O'Reilly web site.

This file's structure is a lot more involved:

moov (5957 bytes) - 4 children
  mvhd (108 bytes) 
  trak (3951 bytes) - 4 children
    tkhd (92 bytes) 
    edts (36 bytes) - 1 child
      elst (28 bytes) [1 edit]
    mdia (3803 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/vide - Apple Video Media Handler]
      minf (3705 bytes) - 4 children
        vmhd (20 bytes) 
        hdlr (55 bytes) [dhlr/url  - Apple URL Data Handler]
        dinf (76 bytes) - 1 child
          dref (68 bytes) 
        stbl (3546 bytes) - 6 children
          stsd (102 bytes) 
          stts (24 bytes) 
          stss (216 bytes) 
          stsc (172 bytes) 
          stsz (2248 bytes) 
          stco (776 bytes) 
    udta (12 bytes) - 0 children
  trak (1857 bytes) - 4 children
    tkhd (92 bytes) 
    edts (36 bytes) - 1 child
      elst (28 bytes) [1 edit]
    mdia (1709 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/soun - Apple Sound Media Handler]
      minf (1611 bytes) - 4 children
        smhd (16 bytes) 
        hdlr (55 bytes) [dhlr/url  - Apple URL Data Handler]
        dinf (76 bytes) - 1 child
          dref (68 bytes) 
        stbl (1456 bytes) - 5 children
          stsd (132 bytes) 
          stts (24 bytes) 
          stsc (880 bytes) 
          stsz (20 bytes) 
          stco (392 bytes) 
    udta (12 bytes) - 0 children
  udta (33 bytes) - 2 children
    WLOC (12 bytes)  (x,y) == (83,93)
    SelO (9 bytes)

This file is far more typical of what we expect to see in a movie, or more accurately, in a moov (go ahead, say it out loud: moo-vee). In addition to the metadata-bearing mvhd movie header and the udta user data, there are two trak atoms, both with a deep, yet similar, structure. This movie consists of two "tracks," one for video and one for audio. Tracks store metadata in the tkhd track header (analogous to the mvhd we saw earlier), an "edits" structure that indicates what parts of the underlying media are used by the track, and a detailed "media" structure.

The media structure has, again, a metadata header, a hdlr handler atom that indicates which component should handle the media data, a "data information" structure made up of dref data references to say where the media data is (in this file, elsewhere on disk, on the net, etc.), and finally, a tricky structure for locating and interpreting media samples.

It's too much to try to understand what all of these atoms represent right away if you're new to QuickTime, but it might be helpful to look at Apple's Introduction to QuickTime tutorial, specifically the section on tracks and media, and see how the contents map fairly directly onto the structure presented in the preceding two paragraphs. Another point of interest is Ridgeworks' QTatomizer, a shareware product that represents the atom structure of a QuickTime movie as a Swing JTree.
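As a cross-check on the parser output for tim-drm-ref.mov, the sizes add up the way the format says they should: each container is its 8-byte header plus the sizes of its children, and a udta carries 4 extra bytes of zero at the end. The helper below is a hypothetical sketch (SizeCheckSketch and containerSize are invented names) that assumes no extended-size headers are involved.

```java
// Hypothetical arithmetic check; not part of the article's sample code.
final class SizeCheckSketch {
    static final int HEADER_BYTES = 8;          // 4-byte size + 4-byte type
    static final int UDTA_TERMINATOR_BYTES = 4; // trailing 32 bits of zero

    // Expected size of a container given the sizes of its children.
    static long containerSize(boolean isUdta, long... childSizes) {
        long total = HEADER_BYTES;
        for (long childSize : childSizes) {
            total += childSize;
        }
        if (isUdta) {
            total += UDTA_TERMINATOR_BYTES;
        }
        return total;
    }

    public static void main(String[] args) {
        // moov = header + mvhd + trak + trak + udta
        System.out.println(containerSize(false, 108, 3951, 1857, 33)); // prints: 5957
        // first trak = header + tkhd + edts + mdia + udta
        System.out.println(containerSize(false, 92, 36, 3803, 12));    // prints: 3951
        // top-level udta = header + WLOC + SelO + terminator
        System.out.println(containerSize(true, 12, 9));                // prints: 33
        // an empty udta is just a header and the terminator
        System.out.println(containerSize(true));                       // prints: 12
    }
}
```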
