ONJava.com -- The Independent Source for Enterprise Java



Parsing and Writing QuickTime Files in Java

Relevance Break: Is This Really Necessary?

You might well wonder if all this stuff is really necessary. After all, MPEG-1 and MPEG-2 don't have a particular file format at all, and they seem pretty popular. What does all of this fanciness gain us?



Consider the power of storing media data by reference. Let's say you're writing an audio or video editor. Your user has selected a big segment of media from a file and wants to copy it from the source movie and paste it into a new one. Do you read all that data from disk? Media files are big, so that's going to take a while. Worse yet, if you can't hold it all in memory, are you going to turn around and write it to a scratch movie? Great: copy-and-paste now requires copying hundreds of megabytes. Even with fast hard drives, your user will be annoyed (and truly unhappy if you fill the drive). Consider what QuickTime provides instead: the ability to refer to that source media, plus an edit list that says which parts of the source we want. The copy and paste is practically instantaneous, because we just store pointers.
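To make the idea concrete, here is a minimal sketch of an edit list as a data structure. It is not QuickTime's actual representation (the real edit list lives in elst atoms, which also record a playback rate per edit). The class, its fields, and the "source.mov" name are hypothetical. Each paste appends a reference, so its cost doesn't depend on how much media the segment covers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not QuickTime API: an "edited movie" is just a list of
// references into source media, so pasting a segment copies no media bytes.
final class EditListSketch {

    /** One edit: play `duration` units of `source`, starting at `sourceStart`. */
    static final class Edit {
        final String source;     // stands in for a reference to the source media
        final long sourceStart;  // in source time-scale units
        final long duration;     // in the same units

        Edit(String source, long sourceStart, long duration) {
            this.source = source;
            this.sourceStart = sourceStart;
            this.duration = duration;
        }
    }

    private final List<Edit> edits = new ArrayList<>();

    /** "Paste": append a pointer to the segment; the media itself is never read. */
    void paste(String source, long sourceStart, long duration) {
        edits.add(new Edit(source, sourceStart, duration));
    }

    /** Length of the edited movie: the sum of its edits' durations. */
    long duration() {
        long total = 0;
        for (Edit e : edits) {
            total += e.duration;
        }
        return total;
    }

    List<Edit> edits() {
        return Collections.unmodifiableList(edits);
    }

    public static void main(String[] args) {
        EditListSketch movie = new EditListSketch();
        // three 5-second segments at an assumed time scale of 600 units per second
        movie.paste("source.mov", 0, 3000);
        movie.paste("source.mov", 30000, 3000);
        movie.paste("source.mov", 57000, 3000);
        System.out.println(movie.duration()); // prints 9000
    }
}
```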

That's part of the thinking that led MPEG-4 to adopt the QuickTime file format. As Carsten Herpel, Guido Franceschini, and David Singer write in The MPEG-4 Book:

The MPEG committee sought a life-cycle format--one in which the files could be used when capturing media, editing it, and combining it; when serving the media as a file download or as a stream; and when exchanging partial or complete presentations. This need for a life-cycle format is not met in many simple file format designs. For example ... the design approach of MPEG-2, in which a stream is simply recorded to a file, makes editing hard. (pp. 253-4)

Beyond the issues of handling audio and video, consider the scope of MPEG-4, which, in its various permutations, can incorporate 2D and 3D graphics, compositing of captured video with rendered graphics, a Java API ("MPEG-J") for writing interactive applications to be delivered inside a movie or stream, etc. To support all of that, the format needs to be extremely extensible. With the ability to define new structures as new atom types, QuickTime fits the bill.

To learn more about MPEG-4, start at the MPEG-4 Industry Forum. Let's cut to the chase and let our parser take a look at some MPEG-4 content. Envivo, which makes MPEG-4 software, has a handy page of MPEG-4 samples from various sources. A few that I find amusing are the Philips television commercials. Here's what the 800K "CD-R Dinner" commercial looks like when we let our parser have a look at it:

ftyp (16 bytes) 
skip (16 bytes) 
mdat (2918834 bytes) 
moov (46140 bytes) - 6 children
  mvhd (108 bytes) 
  trak (469 bytes) - 3 children
    tkhd (92 bytes) 
    mdia (337 bytes) - 3 children
      mdhd (32 bytes) 
      minf (264 bytes) - 3 children
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (208 bytes) - 6 children
          stts (24 bytes) 
          stsd (84 bytes) 
          stsz (20 bytes) 
          stsc (28 bytes) 
          stco (20 bytes) 
          ctts (24 bytes) 
        nmhd (12 bytes) 
      hdlr (33 bytes) [/odsm - ]
    tref (32 bytes) - 1 child
      mpod (24 bytes) 
  trak (449 bytes) - 2 children
    tkhd (92 bytes) 
    mdia (349 bytes) - 3 children
      mdhd (32 bytes) 
      minf (276 bytes) - 3 children
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (220 bytes) - 6 children
          stts (24 bytes) 
          stsd (96 bytes) 
          stsz (20 bytes) 
          stsc (28 bytes) 
          stco (20 bytes) 
          ctts (24 bytes) 
        nmhd (12 bytes) 
      hdlr (33 bytes) [/sdsm - ]
  trak (5855 bytes) - 2 children
    tkhd (92 bytes) 
    mdia (5755 bytes) - 3 children
      mdhd (32 bytes) 
      minf (5682 bytes) - 3 children
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (5622 bytes) - 6 children
          stts (32 bytes) 
          stsd (118 bytes) 
          stsz (5200 bytes) 
          stsc (172 bytes) 
          stco (68 bytes) 
          ctts (24 bytes) 
        smhd (16 bytes) 
      hdlr (33 bytes) [/soun - ]
  trak (39209 bytes) - 2 children
    tkhd (92 bytes) 
    mdia (39109 bytes) - 3 children
      mdhd (32 bytes) 
      minf (39036 bytes) - 3 children
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (38972 bytes) - 8 children
          stts (5312 bytes) 
          stsd (196 bytes) 
          stsz (3628 bytes) 
          stsc (544 bytes) 
          stco (192 bytes) 
          ctts (24 bytes) 
          stss (108 bytes) 
          uuid (28960 bytes) 
        vmhd (20 bytes) 
      hdlr (33 bytes) [/vide - ]
  iods (42 bytes) 
skip (37 bytes)

Similar structure, but some significantly different contents. Here are some key differences worth noting:

  • There are more top-level atoms than just moov. Some are trivial (skip is a placeholder for free space in the file), but mdat contains the raw media data for this movie. Our earlier examples referred to media outside the movie file; this is the first self-contained movie our parser has seen.
  • Actually, there's also a QuickTime atom called wide that's used like the first skip in this file, right before an mdat or other potentially huge atom. It's a placeholder in case the atom grows large enough to require an extended size, which means it would need another 8 bytes of header.
  • There are four tracks. Two are the familiar audio and video tracks (identified by the smhd sound media header and vmhd video media header atoms, and by handlers of subtypes soun and vide). The other two are MPEG-4-only tracks with nmhd (null media header) headers, whose handlers have subtypes odsm (object descriptor stream) and sdsm (scene description stream). There's also another MPEG-4-only atom, the "initial object descriptor," or iods. These MPEG-4 extensions are not defined in the QuickTime spec, but that's okay: our parser doesn't trip over them, because they're still normal atoms with a type and a size.
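To see why unfamiliar atoms like iods and uuid are harmless, here is a minimal, hypothetical atom walker (it is not the article's own parser). It reads each atom's 4-byte big-endian size and 4-byte type, follows the extended-size convention (a size field of 1 means a 64-bit size follows the type, making the header 16 bytes), and recurses only into a hard-coded set of container types. Everything else, known or not, is printed and skipped by its size. Treating a size of 0 as "runs to the end of the enclosing range" is a simplification.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Minimal atom-walker sketch (not the article's parser). Assumptions: the whole
// file fits in a byte[] (so offsets are ints, well under 2 GB), and only the
// types listed in CONTAINERS hold child atoms; everything else is an opaque leaf.
final class AtomWalker {

    static final Set<String> CONTAINERS = new HashSet<>(Arrays.asList(
        "moov", "trak", "mdia", "minf", "dinf", "stbl", "edts", "tref"));

    /** Returns one indented "type (size bytes)" line per atom. */
    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        walk(data, 0, data.length, "", out);
        return out.toString();
    }

    private static void walk(byte[] data, int start, int end, String indent, StringBuilder out) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default, like atom headers
        int pos = start;
        while (end - pos >= 8) {
            long size = buf.getInt(pos) & 0xFFFFFFFFL; // 32-bit size is unsigned
            String type = new String(data, pos + 4, 4, StandardCharsets.ISO_8859_1);
            int headerLen = 8;
            if (size == 1) {            // extended size: a 64-bit size follows the type
                if (end - pos < 16) break;
                size = buf.getLong(pos + 8);
                headerLen = 16;
            } else if (size == 0) {     // simplification: atom runs to the end of the range
                size = end - pos;
            }
            if (size < headerLen || size > end - pos) break; // malformed: stop, don't loop forever
            out.append(indent).append(type).append(" (").append(size).append(" bytes)\n");
            if (CONTAINERS.contains(type)) {
                walk(data, pos + headerLen, pos + (int) size, indent + "  ", out);
            }
            pos += (int) size; // unknown leaf types are skipped by size alone
        }
    }
}
```

Calling AtomWalker.dump() on a file's bytes yields a listing in the same spirit as the ones in this article, minus the handler details the real parser extracts from hdlr and elst.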

Writing Movie Files with QuickTime

Now that we've toured the format and exposed ourselves to the parsing from which QuickTime for Java isolates us (with calls like Movie.fromFile()), we'll turn our attention to writing files. We can write different kinds of QuickTime files, depending on the particular needs of an application.

The following code assumes that you have downloaded and installed the QuickTime for Java SDK on your Mac or Windows machine (apologies, as always, to developers using operating systems not supported by QuickTime). Because we'll want to use MPEG-4, please make sure you have QuickTime 6. Also, while the sample code includes an Ant build.xml file, you'll need to copy my.ant.properties.mac or my.ant.properties.win to my.ant.properties and possibly edit it so that its qtjavazip.file entry points to QTJava.zip on your system. Curiously, while the QTJ classes are found in your Java extensions directory when running an application, they need to be put in the CLASSPATH explicitly for a compile. Equivalent caveats apply if you're using make or your favorite IDE.

On the other hand, if you just want to run the code, running java -classpath makemovies.jar com.mac.invalidname.makemovies.MovieMaker should work fine, with one more caveat: on the Mac, you must use Java 1.3. Apple is eliminating the JDirect library used by QuickTime for Java from its upcoming Java 1.4 implementation, and it generally advises against calling Carbon code from Java 1.4. (This issue is a moving target and the 1.4 implementation is under NDA, but there's a java-dev mailing list post announcing the policy and a follow-up with more details.)

The sample MakeMovies class creates a Movie in memory composed of references to another movie, saving variants of this movie to disk. The movie is created with low-level edits, meaning functions that work with segments of a movie defined by starting time and duration. To keep things simple, our movie consists of three five-second segments grabbed from the beginning, middle, and end of another movie:

// figure out start points for 5-second segments at
// approximate beginning, middle, and end of movie
// (all times are in sourceMovie's time-scale units)
int scale       = sourceMovie.getTimeScale();
int end         = sourceMovie.getDuration();
int fiveSeconds = 5 * scale;
int[] startTimes = {0,                  // beginning
                    end / 2,            // middle
                    end - fiveSeconds}; // end

// insert 5-second segments from sourceMovie into
// refMovie, laid end to end
int fiveSecRefTime = 5 * refMovie.getTimeScale();
for (int i = 0; i < startTimes.length; i++) {
    sourceMovie.insertSegment (refMovie,            // destination movie
                               startTimes[i],       // source start
                               fiveSeconds,         // source duration
                               i * fiveSecRefTime); // destination start
}

With that, we have a 15-second movie, which the demo app plays in a QTCanvas. Now to save it to disk.
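Because QuickTime expresses time as integer counts of time-scale units rather than seconds, it can help to check the segment arithmetic in isolation. This hypothetical helper repeats the snippet's calculations on plain ints; the 600-units-per-second scale in the example is just a sample value (600 is QuickTime's usual movie time scale), and the guard against sources shorter than five seconds is an addition of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not QTJ API: the same time arithmetic as the snippet,
// done on plain ints. QuickTime times are integers in "time scale" units per second.
final class SegmentTimes {

    /** Source start times for 5-second clips at the beginning, middle, and end. */
    static int[] startTimes(int timeScale, int duration) {
        int fiveSeconds = 5 * timeScale;
        if (duration < fiveSeconds) {
            throw new IllegalArgumentException("source is shorter than 5 seconds");
        }
        return new int[] {0, duration / 2, duration - fiveSeconds};
    }

    /** Destination start of clip i when clips are laid end to end. */
    static int destinationStart(int i, int destTimeScale) {
        return i * 5 * destTimeScale;
    }

    public static void main(String[] args) {
        // a 100-second source at 600 units/second
        System.out.println(Arrays.toString(startTimes(600, 60000))); // [0, 30000, 57000]
        System.out.println(destinationStart(2, 600));               // 6000
    }
}
```

Note that the middle clip starts at the midpoint rather than being centered on it, and with a source shorter than 20 seconds the middle and final clips overlap.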

If you were just combing over the javadocs, you might be tempted to use the convertToFile method in the Movie class. It's fairly straightforward, needing just the file and some constants for the file type, the Mac file "creator," and a Mac Script Manager script. The downside is that the generated file has uncompressed audio, and video barely compressed with Apple's "Video" codec. Still, take a look at it with our atom parser and we've got a normal-looking self-contained movie:

moov (2732 bytes) - 3 children
  mvhd (108 bytes) 
  trak (631 bytes) - 3 children
    tkhd (92 bytes) 
    edts (36 bytes) - 1 child
      elst (28 bytes) [1 edit]
    mdia (495 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/soun - Apple Sound Media Handler]
      minf (397 bytes) - 4 children
        smhd (16 bytes) 
        hdlr (57 bytes) [dhlr/alis - Apple Alias Data Handler]
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (280 bytes) - 5 children
          stsd (52 bytes) 
          stts (24 bytes) 
          stsc (40 bytes) 
          stsz (20 bytes) 
          stco (136 bytes) 
  trak (1985 bytes) - 3 children
    tkhd (92 bytes) 
    edts (36 bytes) - 1 child
      elst (28 bytes) [1 edit]
    mdia (1849 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/vide - Apple Video Media Handler]
      minf (1751 bytes) - 4 children
        vmhd (20 bytes) 
        hdlr (57 bytes) [dhlr/alis - Apple Alias Data Handler]
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (1630 bytes) - 6 children
          stsd (102 bytes) 
          stts (40 bytes) 
          stss (56 bytes) 
          stsc (364 bytes) 
          stsz (824 bytes) 
          stco (236 bytes) 
free (16 bytes) 
wide (8 bytes) 
mdat (5059930 bytes)

While we're here, let's note another seemingly-useful-but-probably-not method: createShortcutMovie in the QTFile class. You'd be forgiven for thinking that this creates a movie that preserves our references to the media in the original movie. Not even close--take a look at it with the atom-parser:

moov (254 bytes) - 1 child
  mdra (246 bytes) 
    dref (238 bytes)

In other words, a "shortcut" movie is something of a QuickTime analogue to a file alias or symbolic link.

Exporting Movies

So far, none of these methods has let us specify (and possibly change) the encoding or format of the saved movie. That's the realm of the MovieExporter, which writes a movie in a particular format with our choice of audio and video codecs. The code isn't hard to understand: get an exporter for a particular format, bring up a dialog for the user to specify encoding and quality settings, and let the exporter get to work.

What can be tricky is getting a MovieExporter. The list of available exporters varies with the user's version of QuickTime and with the optional pieces of QuickTime they have installed. One technique is to pass an int constant to the MovieExporter constructor:

MovieExporter me =
    new MovieExporter (StdQTConstants.kQTFileTypeMovie);

This creates an exporter to create typical QuickTime .movs. You can also use the hex value 0x6d706734 to get an MPEG-4 exporter in QuickTime 6. In case you were wondering, that int is the string mpg4 in ASCII. Passing short strings as 32-bit ints is very common in the QuickTime API.
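The packing is simply four ASCII bytes read big-endian. A small helper like this hypothetical FourCC class makes the conversion explicit in both directions. It assumes plain ASCII codes; some real QuickTime codes use Mac Roman characters such as ©, which this sketch does not handle.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: pack/unpack QuickTime-style four-character codes as 32-bit ints.
final class FourCC {

    /** "mpg4" -> 0x6d706734: first character in the most significant byte. */
    static int toInt(String code) {
        byte[] b = code.getBytes(StandardCharsets.US_ASCII);
        if (b.length != 4) {
            throw new IllegalArgumentException("need exactly 4 characters: " + code);
        }
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    /** 0x73706974 -> "spit": the inverse of toInt for ASCII codes. */
    static String fromInt(int value) {
        byte[] b = {(byte) (value >>> 24), (byte) (value >>> 16),
                    (byte) (value >>> 8),  (byte) value};
        return new String(b, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.printf("0x%08x%n", toInt("mpg4")); // 0x6d706734
        System.out.println(fromInt(0x73706974));       // spit
    }
}
```

With such a helper, new MovieExporter(FourCC.toInt("mpg4")) should be equivalent to passing the hex literal, since the constructor receives the same int either way.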

What if you want to offer the user the ability to export to a format that might be a post-install add-on, or that might be included in a future version of QuickTime? For this, the MovieExporter has a second constructor, one that takes a ComponentIdentifier as its argument. To find a suitable ComponentIdentifier, we can iterate through the installed components, with ComponentIdentifier.find(), looking for those that have type "spit," which is provided as the constant StdQTConstants.movieExportType. The sample code produces a dialog of the discovered choices, modestly validating those that are actually appropriate for exporting our movie:

// build up a list of exporters and let user choose one
Vector compIdentifiers  = new Vector();
ComponentIdentifier ci  = null;
ComponentDescription cd =
    new ComponentDescription(StdQTConstants.movieExportType);

while ( (ci = ComponentIdentifier.find(ci, cd)) != null) {
    // check to see that the movie can be exported
    // with this component (this throws some obnoxious
    // exceptions, maybe a bit expensive?)
    try {
        MovieExporter exporter = new MovieExporter (ci);
        if (exporter.validate (movie, null))
            compIdentifiers.addElement (ci);
    } catch (StdQTException expE) {} // ow!
}

The sample code then takes the Vector of ComponentIdentifiers and populates a JComboBox, which goes into a user dialog, as seen in Figure 2. The sample code tries to export all tracks, audio and video, so choosing an audio-only format like "AIFF" will throw a QTException. Production code could be more careful about which tracks to export, or which choices it offers the user.

Figure 2--the choice of MovieExporter

Once the user has chosen a MovieExporter, we call a method named doUserDialog to let the user choose quality and other format-specific options. If the user chooses the normal "QuickTime Movie," the export dialog looks like Figure 3. You may notice that the MPEG-4 exporter dialog is exceptionally verbose and carefully explains whether or not your choices will create a standard MPEG-4 file readable by other machines. Another quirk of the MPEG-4 exporter is that Windows users won't be able to export audio. (I'm not sure if this is because of technical limitations or issues licensing the AAC audio codec from Dolby.)

Figure 3--the user dialog for QuickTime Movie export

Exporting can take a long time, particularly with large movies, slow computers, or certain codecs, so for a good user experience it's best to show progress. In QTJ, a MovieProgress implementation can get callbacks from time-consuming operations. One complication: the javadocs say that as the operation progresses, your implementation will receive the messages movieProgressOpen, movieProgressUpdatePercent, and movieProgressClose, but those constants from the native QuickTime API don't seem to be defined in QTJ. Fortunately, their values turn out to be simple: 0, 1, and 2, respectively. In the sample code, I've extended a Swing ProgressMonitor to update as the export continues, as seen in Figure 4. Unfortunately, this only works on the Mac. On Windows, the callbacks occur on the AWT-Windows thread (even though the export was called from the main thread), and QuickTime seems to block the AWT thread, so our attempts to update the ProgressMonitor never get a chance to repaint. I haven't found a clever thread-scheduling or SwingUtilities way around this. If you do, please put it in the talkback!
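Since QTJ doesn't define those three constants, a progress handler has to supply its own. The following sketch shows only the message dispatch: the class and constant names are hypothetical, it does not implement QTJ's MovieProgress interface itself, and it assumes percentDone runs from 0.0 to 1.0.

```java
// Sketch of dispatching on the three progress messages. The values 0, 1, 2 are
// the ones reported for movieProgressOpen, movieProgressUpdatePercent and
// movieProgressClose; names are hypothetical and this is not the QTJ interface.
final class ProgressMessages {
    static final int MOVIE_PROGRESS_OPEN = 0;
    static final int MOVIE_PROGRESS_UPDATE_PERCENT = 1;
    static final int MOVIE_PROGRESS_CLOSE = 2;

    private boolean open;
    private int percent = -1; // -1 until the operation opens

    /** Handles one callback; percentDone is assumed to run from 0.0 to 1.0. */
    void handle(int message, float percentDone) {
        switch (message) {
            case MOVIE_PROGRESS_OPEN:
                open = true;
                percent = 0;
                break;
            case MOVIE_PROGRESS_UPDATE_PERCENT:
                // clamp to [0, 1] and convert to a whole-number percentage,
                // e.g. for a ProgressMonitor created with min 0 and max 100
                float clamped = Math.max(0f, Math.min(1f, percentDone));
                percent = Math.round(clamped * 100);
                break;
            case MOVIE_PROGRESS_CLOSE:
                open = false;
                percent = 100;
                break;
            default:
                break; // ignore anything unrecognized
        }
    }

    boolean isOpen() { return open; }
    int percent() { return percent; }
}
```

A real MovieProgress implementation would forward its callback arguments to something like handle() and then push percent() into the ProgressMonitor, subject to the Windows threading problem described above.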

Figure 4--the progress bar for MovieExporter

Flat-land

Let's say that you're happy with saving as a QuickTime movie. In fact, you want to keep the original audio and video encoding, but you want to eliminate references to external files, copying all of the media data into one movie that can be sent to other machines without breaking. This process of eliminating references is called "flattening." It takes a straightforward call to Movie.flatten() with a list of usually-constant values:

movie.flatten (0,                                // movieFlattenFlags
    flatFile,                                    // fileOut
    StdQTConstants.kMoviePlayer,                 // creator
    IOConstants.smSystemScript,                  // scriptTag
    StdQTConstants.createMovieFileDeleteCurFile, // createQTFileFlags
    StdQTConstants.movieInDataForkResID,         // resId
    flatFile.getName());                         // resName

This produces a typical-looking QuickTime movie, with a big mdat atom indicating that the media is inside the movie file:

wide (8 bytes) 
mdat (2326820 bytes) 
moov (3100 bytes) - 4 children
  mvhd (108 bytes) 
  trak (2077 bytes) - 3 children
    tkhd (92 bytes) 
    edts (36 bytes) - 1 child
      elst (28 bytes) [1 edit]
    mdia (1941 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/vide - Apple Video Media Handler]
      minf (1843 bytes) - 4 children
        vmhd (20 bytes) 
        hdlr (57 bytes) [dhlr/alis - Apple Alias Data Handler]
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (1722 bytes) - 5 children
          stsd (102 bytes) 
          stts (24 bytes) 
          stsc (412 bytes) 
          stsz (920 bytes) 
          stco (256 bytes) 
  trak (895 bytes) - 3 children
    tkhd (92 bytes) 
    edts (60 bytes) - 1 child
      elst (52 bytes) [3 edits]
    mdia (735 bytes) - 3 children
      mdhd (32 bytes) 
      hdlr (58 bytes) [mhlr/soun - Apple Sound Media Handler]
      minf (637 bytes) - 4 children
        smhd (16 bytes) 
        hdlr (57 bytes) [dhlr/alis - Apple Alias Data Handler]
        dinf (36 bytes) - 1 child
          dref (28 bytes) 
        stbl (520 bytes) - 5 children
          stsd (68 bytes) 
          stts (24 bytes) 
          stsc (256 bytes) 
          stsz (20 bytes) 
          stco (144 bytes) 
  udta (12 bytes) - 0 children

Don't Try This at Home

In a moment of curiosity, I browsed the methods of the AtomContainer class, which is used (infrequently) to pass around QuickTime memory structures as particularly complex parameters or for other really low-level tasks. I noted that it has a getBytes() method (inherited from QTHandleRef), and that a Movie could be coaxed into an AtomContainer representation.

So I'm like, "Huh, I could get the raw bytes of the Movie ... wonder what that looks like."

Dumping the byte array to disk is simple, and the first few bytes look awfully familiar:

0000 0e3c 6d6f 6f76 0000 006d 6d76 6864
0000 0000 ba6b 3f16 ba6b 3f70 0000 0258
0000 2328 0001 0000 00ff 0000 0000 0000
... 

Yep, there's a moov and an mvhd right there on the first line. The memory structure is almost identical to the file format. Almost? Yes: it's apparently the same except for one byte, the size of the mvhd, which is wrong. On the Mac, it's 0x006d, when it should be 0x006c. On Windows, it's 0x016c. Accounting for endian differences between the platforms, it looks as if 1 was added to the size in an endian-specific way.

The sample code dumps the movie's AtomContainer two ways, in its raw form as atom.out and with this byte fixed as atom-fixed.mov. Surprisingly, in my testing, this fixed version consistently plays in QuickTime Player.
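The repair itself can be sketched in a few lines. This version assumes the big-endian layout from the Mac dump (moov header at offset 0, mvhd header at offset 8) and the 108-byte mvhd size seen in every file parsed in this article. The class and constant names are hypothetical, and the Windows form of the bad byte is deliberately not handled.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the one-byte repair, assuming the big-endian (Mac) layout seen in the
// dump: an 8-byte moov header at offset 0, then mvhd's header at offset 8.
// The Windows variant (0x016c) is not handled. Nothing here is QTJ API.
final class MvhdSizeFix {
    static final int MVHD_OFFSET = 8;          // right after the moov header
    static final int EXPECTED_MVHD_SIZE = 108; // 0x6c, as in the parsed files

    /** Returns a corrected copy; the input array is left untouched. */
    static byte[] fix(byte[] raw) {
        if (raw.length < MVHD_OFFSET + 8
                || !typeAt(raw, 4).equals("moov")
                || !typeAt(raw, MVHD_OFFSET + 4).equals("mvhd")) {
            throw new IllegalArgumentException("expected moov followed by mvhd");
        }
        byte[] fixed = raw.clone();
        ByteBuffer buf = ByteBuffer.wrap(fixed); // big-endian, matching the dump
        if (buf.getInt(MVHD_OFFSET) == EXPECTED_MVHD_SIZE + 1) {
            buf.putInt(MVHD_OFFSET, EXPECTED_MVHD_SIZE); // 0x6d -> 0x6c
        }
        return fixed;
    }

    private static String typeAt(byte[] data, int offset) {
        return new String(data, offset, 4, StandardCharsets.ISO_8859_1);
    }
}
```

The result could then be written out with something like java.nio.file.Files.write(java.nio.file.Paths.get("atom-fixed.mov"), MvhdSizeFix.fix(raw)); how the article's sample code actually writes its file isn't shown here.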

This may not be a recommended way to create a movie on disk that just keeps pointers to its source segments, but it helps tie things together. It illustrates that QuickTime's concepts of movies, tracks, and media, of atoms and their containment hierarchy, and of pointers to media data are not just a conceit of the file format but core concepts of how movies are managed in memory and manipulated by code.

Now that you know how hairy those structures are, be glad that the API largely isolates you from them!

Chris Adamson is an author, editor, and developer specializing in iPhone and Mac.

