ONJava.com -- The Independent Source for Enterprise Java

Extend JavaSound to Play MP3, Ogg Vorbis, and More

by The JavaZOOM Team
08/11/2004

The JavaSound API adds audio capabilities to the Java platform. It has been part of J2SE since version 1.3; it supports the WAV, AU, and AIFF audio formats and provides MIDI support. It doesn't support some other audio formats, such as MP3, but its flexible plugin architecture lets any third-party vendor add support for a custom audio format through the JavaSound Service Provider Interfaces (SPIs). This article covers this plugin architecture and API: how to write and use a custom SPI implementation, how metadata such as title, artist, and copyright is exposed, and how multiple SPI implementations can be integrated in an application such as a player or a game.

Plugin Architecture

The JavaSound API provides a plugin architecture that allows third parties to support new formats such as MP3, Ogg Vorbis, FLAC, Monkey's Audio, and more. This architecture allows the JVM to discover and load plugins at runtime. Each plugin must implement the service provider interfaces, and one implementation is needed for each new audio format. That is why you can find one SPI implementation for MP3, another for Monkey's Audio, and so on.

To be loaded, the SPI implementation must be available on the JVM runtime classpath. To play audio, JavaSound looks for two text files stored in the META-INF/services folder, named javax.sound.sampled.spi.AudioFileReader and javax.sound.sampled.spi.FormatConversionProvider. These files contain the concrete class names of the SPI implementations to instantiate, which are needed for loading and decoding audio data. Then, when an application needs to play an audio file, JavaSound tries each SPI implementation in turn and throws an UnsupportedAudioFileException if none matches. Thus, developers of a JavaSound-based application (such as an audio player, game, or educational program) don't have to pay attention to audio-format support. Instead, the application just needs to use the JavaSound API. SPI classes are needed at runtime only and not at build time, so, in addition to the technical advantages, their use could have business advantages when integrating GNU GPL-based solutions.
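
As a rough illustration of the application side, the sketch below probes a byte array through AudioSystem only and never names a concrete reader. The class and method names (FormatProbeSketch, probe, silentWav) are made up for this example, and it generates a short WAV in memory so that it runs without any third-party SPI. With an MP3 SPI JAR on the classpath, the same probe call should recognize MP3 data as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: asks JavaSound which installed reader accepts some bytes. */
final class FormatProbeSketch {

    /** Returns the detected file type name, or "unsupported" if no reader matches. */
    static String probe(byte[] data) {
        try {
            // AudioSystem consults every AudioFileReader registered via META-INF/services.
            AudioFileFormat format =
                    AudioSystem.getAudioFileFormat(new ByteArrayInputStream(data));
            return format.getType().toString();
        } catch (UnsupportedAudioFileException e) {
            return "unsupported"; // no installed SPI recognised the data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a short, silent 16-bit stereo WAV in memory (test data only). */
    static byte[] silentWav() {
        AudioFormat pcm = new AudioFormat(44100f, 16, 2, true, false);
        int frames = 441;                     // 10 ms at 44.1 kHz
        byte[] samples = new byte[frames * 4]; // 4 bytes per 16-bit stereo frame
        AudioInputStream in =
                new AudioInputStream(new ByteArrayInputStream(samples), pcm, frames);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Calling FormatProbeSketch.probe(FormatProbeSketch.silentWav()) exercises the built-in WAV reader; passing arbitrary non-audio bytes takes the UnsupportedAudioFileException path instead.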

From the SPI Provider Side: An MP3 Sample

The JavaZOOM team provides an open source MP3 SPI implementation. It focuses on MP3 playback only. It relies on JLayer, an open source Java library that decodes MP3 frames (MPEG 1, 2, and 2.5, Layers 1, 2, and 3) and converts them to PCM, the standard for uncompressed audio data. The JavaSound service provider interfaces allow callers to read, convert, and write audio data, but to play MP3, we only need the read and convert features. JavaZOOM's MP3 SPI does not support MP3 encoding.

The JavaSound API therefore requires us to implement the AudioFileReader and FormatConversionProvider abstract classes. First, let's focus on our MpegAudioFileReader, which extends AudioFileReader. Six methods must be implemented: three return an AudioFileFormat instance from an input (File, URL, or InputStream) and three return an AudioInputStream instance.

  • public abstract AudioFileFormat getAudioFileFormat(File file) throws UnsupportedAudioFileException, IOException
  • public abstract AudioFileFormat getAudioFileFormat(URL url) throws UnsupportedAudioFileException, IOException
  • public abstract AudioFileFormat getAudioFileFormat(InputStream stream) throws UnsupportedAudioFileException, IOException
  • public abstract AudioInputStream getAudioInputStream(File file) throws UnsupportedAudioFileException, IOException
  • public abstract AudioInputStream getAudioInputStream(URL url) throws UnsupportedAudioFileException, IOException
  • public abstract AudioInputStream getAudioInputStream(InputStream stream) throws UnsupportedAudioFileException, IOException

To avoid code duplication, we developed one more generic method for getAudioFileFormat:

  • public AudioFileFormat getAudioFileFormat(InputStream inputStream, long mediaLength) throws UnsupportedAudioFileException, IOException
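
One way to picture this delegation is the sketch below. It is not JavaZOOM's code: LengthAwareReaderSketch and RecordingReader are hypothetical names, and the sketch assumes that an unknown length is passed as AudioSystem.NOT_SPECIFIED. The six SPI methods forward to two length-aware methods, and a demo subclass that accepts nothing simply records the length it was handed. A real reader would also have to mark and reset a caller-supplied InputStream so that other readers can retry after a rejection.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import javax.sound.sampled.spi.AudioFileReader;

/** Hypothetical base class: six SPI methods funnel into two length-aware ones. */
abstract class LengthAwareReaderSketch extends AudioFileReader {

    public abstract AudioFileFormat getAudioFileFormat(InputStream in, long mediaLength)
            throws UnsupportedAudioFileException, IOException;

    public abstract AudioInputStream getAudioInputStream(InputStream in, long mediaLength)
            throws UnsupportedAudioFileException, IOException;

    @Override
    public AudioFileFormat getAudioFileFormat(File file)
            throws UnsupportedAudioFileException, IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return getAudioFileFormat(in, file.length());
        }
    }

    @Override
    public AudioFileFormat getAudioFileFormat(URL url)
            throws UnsupportedAudioFileException, IOException {
        URLConnection connection = url.openConnection();
        try (InputStream in = new BufferedInputStream(connection.getInputStream())) {
            // getContentLengthLong() returns -1 when the length is unknown.
            return getAudioFileFormat(in, connection.getContentLengthLong());
        }
    }

    @Override
    public AudioFileFormat getAudioFileFormat(InputStream stream)
            throws UnsupportedAudioFileException, IOException {
        return getAudioFileFormat(stream, AudioSystem.NOT_SPECIFIED);
    }

    @Override
    public AudioInputStream getAudioInputStream(File file)
            throws UnsupportedAudioFileException, IOException {
        return open(new FileInputStream(file), file.length());
    }

    @Override
    public AudioInputStream getAudioInputStream(URL url)
            throws UnsupportedAudioFileException, IOException {
        URLConnection connection = url.openConnection();
        return open(connection.getInputStream(), connection.getContentLengthLong());
    }

    @Override
    public AudioInputStream getAudioInputStream(InputStream stream)
            throws UnsupportedAudioFileException, IOException {
        return getAudioInputStream(stream, AudioSystem.NOT_SPECIFIED);
    }

    /** Keeps the stream open on success and closes it when the data is rejected. */
    private AudioInputStream open(InputStream raw, long mediaLength)
            throws UnsupportedAudioFileException, IOException {
        InputStream in = new BufferedInputStream(raw);
        try {
            return getAudioInputStream(in, mediaLength);
        } catch (UnsupportedAudioFileException | IOException e) {
            in.close();
            throw e;
        }
    }
}

/** Demo subclass: recognises nothing, only records the length it was given. */
final class RecordingReader extends LengthAwareReaderSketch {
    long lastMediaLength = Long.MIN_VALUE;

    @Override
    public AudioFileFormat getAudioFileFormat(InputStream in, long mediaLength)
            throws UnsupportedAudioFileException {
        lastMediaLength = mediaLength;
        throw new UnsupportedAudioFileException("demo reader accepts nothing");
    }

    @Override
    public AudioInputStream getAudioInputStream(InputStream in, long mediaLength)
            throws UnsupportedAudioFileException {
        lastMediaLength = mediaLength;
        throw new UnsupportedAudioFileException("demo reader accepts nothing");
    }
}
```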

Indeed, a File or a URL can be seen as an InputStream with a known length. We did the same for getAudioInputStream. The job of our getAudioFileFormat is to read and parse the first MP3 frame in order to:

  1. Check that the InputStream is a valid MP3 stream (if not, it throws an UnsupportedAudioFileException).
  2. Extract audio information such as MPEG version, layer version, VBR flag, bitrate (bps), frequency (Hz), framesize, framerate, etc.
  3. Extract metadata such as ID3 tags (artist, album, date, copyright, comments, etc.).
  4. Return an MpegAudioFileFormat instance with all of these audio properties.
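
Steps 1 and 2 can be illustrated with a small parser for the 32-bit MPEG audio frame header. This is a simplified sketch, not the JLayer or MP3 SPI code: Mp3HeaderSketch is a hypothetical class, it assumes the four header bytes have already been located (a real reader must first skip any ID3v2 tag and scan for the sync word), and it only carries the bitrate table for MPEG-1 Layer III.

```java
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical holder for a few fields of one MPEG audio frame header. */
final class Mp3HeaderSketch {
    final String version;      // "MPEG1", "MPEG2" or "MPEG2.5"
    final int layer;           // 1, 2 or 3
    final int bitrateKbps;     // 0 = free format, -1 = no table in this sketch
    final int sampleRateHz;
    final boolean jointStereo;

    // Indexed by the 2-bit version field; index 1 is reserved.
    private static final String[] VERSIONS = {"MPEG2.5", null, "MPEG2", "MPEG1"};
    private static final int[][] SAMPLE_RATES = {
        {11025, 12000, 8000}, null, {22050, 24000, 16000}, {44100, 48000, 32000}
    };
    // Indexed by the 4-bit bitrate field, MPEG-1 Layer III only.
    private static final int[] MPEG1_L3_KBPS =
        {0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320};

    private Mp3HeaderSketch(String version, int layer, int bitrateKbps,
                            int sampleRateHz, boolean jointStereo) {
        this.version = version;
        this.layer = layer;
        this.bitrateKbps = bitrateKbps;
        this.sampleRateHz = sampleRateHz;
        this.jointStereo = jointStereo;
    }

    /** Parses one big-endian header word, rejecting anything that is not a frame. */
    static Mp3HeaderSketch parse(int header) throws UnsupportedAudioFileException {
        // The top 11 bits of a frame header are all set (the frame sync).
        if ((header & 0xFFE00000) != 0xFFE00000) {
            throw new UnsupportedAudioFileException("no MPEG frame sync");
        }
        int versionBits = (header >>> 19) & 0x3;
        int layerBits = (header >>> 17) & 0x3;
        int bitrateIndex = (header >>> 12) & 0xF;
        int rateIndex = (header >>> 10) & 0x3;
        int channelMode = (header >>> 6) & 0x3;
        if (versionBits == 1 || layerBits == 0 || bitrateIndex == 15 || rateIndex == 3) {
            throw new UnsupportedAudioFileException("reserved value in MPEG header");
        }
        int layer = 4 - layerBits; // 11 = Layer I, 10 = Layer II, 01 = Layer III
        int kbps = (versionBits == 3 && layer == 3) ? MPEG1_L3_KBPS[bitrateIndex] : -1;
        return new Mp3HeaderSketch(VERSIONS[versionBits], layer, kbps,
                SAMPLE_RATES[versionBits][rateIndex], channelMode == 1);
    }
}
```

For example, the header word 0xFFFB9064 decodes as MPEG-1 Layer III, 128 kbps, 44100 Hz, joint stereo, whereas the first four bytes of a WAV file ("RIFF") fail the sync check and are rejected.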

MpegAudioFileFormat extends AudioFileFormat by adding MP3-specific, high-level audio properties such as metadata (ID3 tags). Its constructor takes a Type, the length of the file in bytes, an AudioFormat, and the length of the audio data in frames:

  • MpegAudioFileFormat(AudioFileFormat.Type type, int byteLength, AudioFormat format, int frameLength)
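
A subclass of this kind might look roughly like the following. It is only a sketch of the idea: TaggedFileFormatSketch, its tag map, getTag, and the placeholder values in example() are inventions for illustration and are not the actual MpegAudioFileFormat API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

/** Hypothetical AudioFileFormat subclass that carries string tags as metadata. */
class TaggedFileFormatSketch extends AudioFileFormat {
    private final Map<String, String> tags;

    TaggedFileFormatSketch(Type type, int byteLength, AudioFormat format,
                           int frameLength, Map<String, String> tags) {
        // Protected constructor that AudioFileFormat offers to file-reading providers.
        super(type, byteLength, format, frameLength);
        this.tags = Collections.unmodifiableMap(new HashMap<>(tags));
    }

    /** Returns the tag value, or null when the file carried no such tag. */
    String getTag(String key) {
        return tags.get(key);
    }

    /** Builds an instance from placeholder values, for demonstration only. */
    static TaggedFileFormatSketch example() {
        Map<String, String> tags = new HashMap<>();
        tags.put("title", "Placeholder Title");
        tags.put("artist", "Placeholder Artist");
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, false);
        return new TaggedFileFormatSketch(new Type("MP3", "mp3"), 4096, format,
                AudioSystem.NOT_SPECIFIED, tags);
    }
}
```

Because the subclass is still an AudioFileFormat, callers that only know the JavaSound API keep working, while callers aware of the subclass can cast and read the extra metadata.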

We also extended AudioFormat as MpegAudioFormat to add MP3-specific properties (VBR, CRC flag, padding, etc.). Unlike AudioFileFormat, AudioFormat describes low-level audio properties such as sampling rate, channels, frame size, and AudioFormat.Encoding. We defined multiple AudioFormat.Encoding constants, one for each combination of MPEG version and layer:

import javax.sound.sampled.AudioFormat;

// One encoding constant per combination of MPEG version (1, 2, 2.5) and layer (1, 2, 3).
public class MpegEncoding extends AudioFormat.Encoding
{
  public static final AudioFormat.Encoding MPEG1L1 =
      new MpegEncoding("MPEG1L1");
  public static final AudioFormat.Encoding MPEG1L2 =
      new MpegEncoding("MPEG1L2");
  public static final AudioFormat.Encoding MPEG1L3 =
      new MpegEncoding("MPEG1L3");
  public static final AudioFormat.Encoding MPEG2L1 =
      new MpegEncoding("MPEG2L1");
  public static final AudioFormat.Encoding MPEG2L2 =
      new MpegEncoding("MPEG2L2");
  public static final AudioFormat.Encoding MPEG2L3 =
      new MpegEncoding("MPEG2L3");
  public static final AudioFormat.Encoding MPEG2DOT5L1 =
      new MpegEncoding("MPEG2DOT5L1");
  public static final AudioFormat.Encoding MPEG2DOT5L2 =
      new MpegEncoding("MPEG2DOT5L2");
  public static final AudioFormat.Encoding MPEG2DOT5L3 =
      new MpegEncoding("MPEG2DOT5L3");

  // The name passed here is what AudioFormat.Encoding reports from toString().
  public MpegEncoding(String strName)
  {
    super(strName);
  }
}

Now, let's focus on our MpegFormatConversionProvider, which extends FormatConversionProvider and must implement the following abstract methods:

  • public abstract AudioInputStream getAudioInputStream(AudioFormat.Encoding targetEncoding, AudioInputStream sourceStream)
  • public abstract AudioInputStream getAudioInputStream(AudioFormat targetFormat, AudioInputStream sourceStream)
  • public abstract AudioFormat.Encoding[] getSourceEncodings()
  • public abstract AudioFormat.Encoding[] getTargetEncodings()
  • public abstract AudioFormat.Encoding[] getTargetEncodings(AudioFormat sourceFormat)
  • public abstract AudioFormat[] getTargetFormats(AudioFormat.Encoding targetEncoding, AudioFormat sourceFormat)

The getSourceEncodings and getTargetEncodings methods return the lists of source and target encodings supported by the conversion provider. For MP3, it's important that the returned AudioFormat.Encoding values indicate only the combinations of sampling rate, bitrate, and channels that are allowed by the MP3 header specification. The getAudioInputStream methods return an AudioInputStream with the specified format (or encoding) from the given source AudioInputStream. For instance, MP3 SPI could return a decoded 44.1 kHz, 16-bit, stereo PCM stream given a 44.1 kHz, 128 kbps, joint-stereo MP3 input stream. To save time, we used the low-level classes of Tritonus. They provide convenient implementations of the matrix format conversion and of the circular buffer (used to store decoded PCM data) that most SPI implementations need. This way, the main job of our MpegFormatConversionProvider is to call the JLayer API to synchronize and get decoded frames.
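
The shape of such a provider can be sketched without any decoder. The class below is an assumption-laden outline rather than the MP3 SPI source: Mp3ToPcmProviderSketch, its MPEG1L3 constant, pcmTargetFor, and the abstract decode hook are hypothetical, only one source encoding is advertised, and the actual frame decoding (done with JLayer in the article) is left to a subclass.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.spi.FormatConversionProvider;

/** Hypothetical provider outline: advertises MPEG1L3 to signed 16-bit PCM. */
abstract class Mp3ToPcmProviderSketch extends FormatConversionProvider {

    // Stand-in for the article's MpegEncoding.MPEG1L3 constant.
    static final AudioFormat.Encoding MPEG1L3 = new AudioFormat.Encoding("MPEG1L3");

    /** Decoder hook: a real provider would wrap its MP3 decoder here. */
    protected abstract AudioInputStream decode(AudioFormat target, AudioInputStream source);

    /** PCM format matching the source's sample rate and channel count. */
    static AudioFormat pcmTargetFor(AudioFormat source) {
        int channels = source.getChannels();
        float rate = source.getSampleRate();
        // 16-bit samples, so one frame is 2 bytes per channel; little-endian.
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, rate, 16,
                channels, channels * 2, rate, false);
    }

    @Override
    public AudioFormat.Encoding[] getSourceEncodings() {
        return new AudioFormat.Encoding[] {MPEG1L3};
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings() {
        return new AudioFormat.Encoding[] {AudioFormat.Encoding.PCM_SIGNED};
    }

    @Override
    public AudioFormat.Encoding[] getTargetEncodings(AudioFormat sourceFormat) {
        return MPEG1L3.equals(sourceFormat.getEncoding())
                ? getTargetEncodings()
                : new AudioFormat.Encoding[0];
    }

    @Override
    public AudioFormat[] getTargetFormats(AudioFormat.Encoding targetEncoding,
                                          AudioFormat sourceFormat) {
        boolean supported = AudioFormat.Encoding.PCM_SIGNED.equals(targetEncoding)
                && MPEG1L3.equals(sourceFormat.getEncoding());
        return supported
                ? new AudioFormat[] {pcmTargetFor(sourceFormat)}
                : new AudioFormat[0];
    }

    @Override
    public AudioInputStream getAudioInputStream(AudioFormat.Encoding targetEncoding,
                                                AudioInputStream sourceStream) {
        AudioFormat[] targets = getTargetFormats(targetEncoding, sourceStream.getFormat());
        if (targets.length == 0) {
            throw new IllegalArgumentException("conversion not supported");
        }
        return decode(targets[0], sourceStream);
    }

    @Override
    public AudioInputStream getAudioInputStream(AudioFormat targetFormat,
                                                AudioInputStream sourceStream) {
        // isConversionSupported is inherited and is built on getTargetFormats.
        if (!isConversionSupported(targetFormat, sourceStream.getFormat())) {
            throw new IllegalArgumentException("conversion not supported");
        }
        return decode(targetFormat, sourceStream);
    }
}
```

Keeping the format negotiation separate from the decode hook mirrors the split described above, where helper classes handle the format matrix and buffering and the provider itself mainly drives the decoder.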
