ONJava.com -- The Independent Source for Enterprise Java

O'Reilly Book Excerpts: Learning Java, 2nd Edition

XML Basics for Java Developers, Part 2

by Patrick Niemeyer and Jonathan Knudsen

In this second part of a multipart series on XML for Java developers, excerpted from Learning Java, 2nd Edition, learn about SAX and the SAX API.

SAX

SAX is a low-level, event-style mechanism for parsing XML documents. SAX originated in Java but has been implemented in many languages.

The SAX API

To use SAX, we'll be using classes from the org.xml.sax package, which is developed by the SAX project. To perform the actual parsing, we'll need the javax.xml.parsers package, which is the standard Java package for accessing XML parsers. The javax.xml.parsers package is part of the Java API for XML Processing (JAXP), which allows different parser implementations to be used with Java.

To read an XML document with SAX, we first register an object that implements the org.xml.sax.ContentHandler interface with the parser. The ContentHandler has methods that are called in response to parts of the document. For example, the ContentHandler's startElement() method is called when an opening tag is encountered, and the endElement() method is called when the tag is closed. Attributes are provided with the startElement() call. Text content of elements is passed through a separate method called characters(). The characters() method can be invoked repeatedly to supply more text as it is read, but it often gets the whole string in one bite. The following are the signatures of these three methods of the ContentHandler interface.

public void startElement(
    String namespace, String localname, String qname, Attributes atts )
    throws SAXException;
public void characters(
    char[] ch, int start, int len ) throws SAXException;
public void endElement(
    String namespace, String localname, String qname )
    throws SAXException;

The qname parameter is the qualified name of the element. This is the element name, prefixed with its namespace prefix if it has one. When working with namespaces, the namespace and localname parameters are also supplied, providing the namespace URI and the unqualified name.

The ContentHandler interface also contains methods called in response to the start and end of the document, startDocument() and endDocument(), as well as those for handling namespace mapping, special XML instructions, and whitespace that can be ignored. We'll confine ourselves to the three methods above for our examples. As with many other Java interfaces, a simple implementation, org.xml.sax.helpers.DefaultHandler, is provided for us that allows us to override just the methods we're interested in.
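To make the callback sequence concrete, here is a minimal sketch of a DefaultHandler subclass that overrides only the three methods discussed above and records each event. The class name EventLogger, its log() helper, and the sample XML string are assumptions made for illustration, and the handler simplifies matters by reporting text only for elements that contain no child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records one line per SAX event it receives.
class EventLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String namespace, String localname,
                             String qname, Attributes atts) {
        text.setLength(0);  // discard any text seen before this tag
        StringBuilder sb = new StringBuilder("start " + qname);
        for (int i = 0; i < atts.getLength(); i++)
            sb.append(" ").append(atts.getQName(i))
              .append("=").append(atts.getValue(i));
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        // May be called more than once per element, so accumulate.
        text.append(ch, start, len);
    }

    @Override
    public void endElement(String namespace, String localname, String qname) {
        String s = text.toString().trim();
        if (!s.isEmpty()) events.add("text " + s);
        text.setLength(0);
        events.add("end " + qname);
    }

    // Parses an XML string and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EventLogger handler = new EventLogger();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Animal class=\"mammal\"><Name>Cocoa</Name></Animal>";
        for (String event : log(xml))
            System.out.println(event);
    }
}
```

Accumulating text in a buffer and reading it in endElement() is one common way to cope with characters() being called several times for a single run of text.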

JAXP

To perform the parsing, we'll need to get a parser from the javax.xml.parsers package. The process of getting a parser is abstracted through a factory pattern, allowing different parser implementations to be plugged into the Java platform. The following snippet constructs a SAXParser object and an XMLReader used to parse a file:

import javax.xml.parsers.*;
import org.xml.sax.XMLReader;

SAXParserFactory factory = SAXParserFactory.newInstance(  );
SAXParser saxParser = factory.newSAXParser(  );
XMLReader parser = saxParser.getXMLReader(  );

parser.setContentHandler( myContentHandler );
parser.parse( "myfile.xml" );

You might expect to call parse() directly on the SAXParser. It does offer convenience parse() methods that accept a DefaultHandler, but the XMLReader is the object that accepts a ContentHandler; this intermediary was added to support changes in the SAX API between 1.0 and 2.0. Later we'll discuss some options that can be set to govern how XML parsers operate. These options are normally set through methods on the parser factory (e.g., SAXParserFactory) and not the parser itself. This is because the factory may wish to use different implementations to support different required features.
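As one illustration of setting an option on the factory rather than on the parser, the following sketch turns on namespace awareness with SAXParserFactory.setNamespaceAware() and then reports the namespace URI and local name that startElement() receives. The class name NamespaceDemo, the zoo prefix, and the http://example.com/zoo URI are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original example code.
class NamespaceDemo {
    // Returns "{namespaceURI}localname" for each element, in document order.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // The option is set on the factory, before the parser is created.
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String namespace, String localname,
                                     String qname, Attributes atts) {
                names.add("{" + namespace + "}" + localname);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<zoo:Inventory xmlns:zoo=\"http://example.com/zoo\">"
                   + "<zoo:Animal/><Note/></zoo:Inventory>";
        // Elements without a namespace report an empty URI.
        System.out.println(elementNames(xml));
    }
}
```

A factory created with newInstance() is not namespace aware by default, which is why the flag has to be set explicitly here.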

SAX's strengths and weaknesses

The primary motivation for using SAX instead of the higher-level APIs that we'll discuss later is that it is lightweight and event-driven. SAX doesn't require maintaining the entire document in memory. If, for example, you need to grab the text of just a few elements from a document, or if you need to extract elements from a large stream of XML, you can do so efficiently with SAX. The event-driven nature of SAX also allows you to take actions as the beginning and end tags are parsed. This can be useful for directly manipulating your own models without first going through another representation. The primary weakness of SAX is that you are operating on a tag-by-tag level with no help from the parser to maintain context.
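Because the parser keeps no context for us, a handler that cares where an element sits in the document has to track that itself. One common approach, sketched below, is to keep a stack of the currently open element names. The class name PathTracker and the small sample document are assumptions for illustration; the handler collects the text of Name elements only when their parent is an Animal element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks open elements so it knows each tag's parent.
class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> animalNames = new ArrayList<>();

    @Override
    public void startElement(String namespace, String localname,
                             String qname, Attributes atts) {
        open.push(qname);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        text.append(ch, start, len);
    }

    @Override
    public void endElement(String namespace, String localname, String qname) {
        open.pop();                   // remove the element being closed
        String parent = open.peek();  // null once the root is closed
        if (qname.equals("Name") && "Animal".equals(parent))
            animalNames.add(text.toString().trim());
        text.setLength(0);
    }

    // Parses an XML string and returns the names found directly under Animal.
    static List<String> animalNames(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        PathTracker handler = new PathTracker();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.animalNames;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Inventory><Animal><Name>Cocoa</Name>"
                   + "<FoodRecipe><Name>Gorilla Chow</Name></FoodRecipe>"
                   + "</Animal></Inventory>";
        System.out.println(animalNames(xml));
    }
}
```

The Name element nested inside FoodRecipe is skipped because its parent on the stack is FoodRecipe, not Animal.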

Building a Model Using SAX

The ContentHandler mechanism for receiving SAX events is very simple. It should be easy to see how one could use it to capture the value or attributes of a single element in a document. What may be harder to see is how one could use SAX to build a real Java object model from an XML document. The following example, SAXModelBuilder, does just that. This example is a bit unusual in that we resort to using reflection to do a job that would otherwise be a burden on the developer. Later, we'll discuss more powerful tools for automatically generating and building models for use with XML documents.

In this section, we'll start by creating some XML along with corresponding Java classes that serve as the model for this XML. We'll see later that it's possible to work with XML more dynamically, without first constructing Java classes that hold all the content, but we want to start out in the most concrete and general way possible. The final step in this example is to create the generic model builder that reads the XML and populates the model classes with their data. The idea here is that the developer is creating only XML and model classes--no custom code--to do the basic parsing.

Building the XML file

The first thing we'll need is a nice XML document to parse. Luckily, it's inventory time at the zoo! The following document, zooinventory.xml, describes two of the zoo's residents, including some vital information about their diets:

<?xml version="1.0" encoding="UTF-8"?>
<!-- file zooinventory.xml -->
<Inventory>
    <Animal class="mammal">
        <Name>Song Fang</Name>
        <Species>Giant Panda</Species>
        <Habitat>China</Habitat>
        <Food>Bamboo</Food>
        <Temperament>Friendly</Temperament>
    </Animal>
    <Animal class="mammal">
        <Name>Cocoa</Name>
        <Species>Gorilla</Species>
        <Habitat>Central Africa</Habitat>
        <FoodRecipe>
            <Name>Gorilla Chow</Name>
            <Ingredient>Fruit</Ingredient>
            <Ingredient>Shoots</Ingredient>
            <Ingredient>Leaves</Ingredient>
        </FoodRecipe>
        <Temperament>Know-it-all</Temperament>
    </Animal>
</Inventory>

The document is fairly simple. The root element, <Inventory>, contains two <Animal> elements as children. <Animal> contains several simple text elements for things like name, species, and habitat. It also contains either a simple <Food> element or a compound <FoodRecipe> element. Finally, note that the <Animal> element has one attribute (class) that describes the zoological classification of the creature.
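As a quick check of that structure, the sketch below reads the class attribute of every <Animal> element by name, using Attributes.getValue(). The class name AnimalClassLister is hypothetical, and main() uses an abbreviated copy of the inventory rather than the full file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: lists the class attribute of each Animal element.
class AnimalClassLister {
    static List<String> animalClasses(String xml) throws Exception {
        final List<String> classes = new ArrayList<>();
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String namespace, String localname,
                                     String qname, Attributes atts) {
                if (qname.equals("Animal"))
                    // getValue() returns null if the attribute is absent.
                    classes.add(atts.getValue("class"));
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return classes;
    }

    public static void main(String[] args) throws Exception {
        // Abbreviated version of zooinventory.xml, inlined for the example.
        String xml = "<Inventory>"
                   + "<Animal class=\"mammal\"><Name>Song Fang</Name></Animal>"
                   + "<Animal class=\"mammal\"><Name>Cocoa</Name></Animal>"
                   + "</Inventory>";
        System.out.println(animalClasses(xml));
    }
}
```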

The model

Now let's make a Java object model for our zoo inventory. This part is very mechanical--easy, but tedious to do by hand. We simply create objects for each of the complex element types in our XML, using the standard JavaBeans property design patterns ("setters" and "getters") so that our builder can automatically use them later. (We'll prove the usefulness of these patterns later when we see that these same model objects can be understood by the Java XMLEncoder tool.) For convenience, we'll have our model objects extend a base SimpleElement class that handles text content for any element.

import java.util.ArrayList;
import java.util.List;

// Base class for all model objects: accumulates an element's text content.
public class SimpleElement {
    StringBuffer text = new StringBuffer();
    public void addText( String s ) { text.append( s ); }
    public String getText() { return text.toString(); }
    // Subclasses that accept attributes override this method.
    public void setAttributeValue( String name, String value ) {
        throw new Error( getClass()+": No attributes allowed");
    }
}

// Root element: holds the list of Animal objects.
public class Inventory extends SimpleElement {
   List animals = new ArrayList(  );
   public void addAnimal( Animal animal ) { animals.add( animal ); }
   public List getAnimals(  ) { return animals; }
   public void setAnimals( List animals ) { this.animals = animals; }
}

public class Animal extends SimpleElement { 
   public final static int MAMMAL = 1;
   int animalClass;
   String name, species, habitat, food, temperament;
   FoodRecipe foodRecipe;

   public void setName( String name ) { this.name = name ; }
   public String getName(  ) { return name; }
   public void setSpecies( String species ) { this.species = species ; }
   public String getSpecies(  ) { return species; }
   public void setHabitat( String habitat ) { this.habitat = habitat ; }
   public String getHabitat(  ) { return habitat; }
   public void setFood( String food ) { this.food = food ; }
   public String getFood(  ) { return food; }
   public void setFoodRecipe( FoodRecipe recipe ) { 
      this.foodRecipe = recipe; }
   public FoodRecipe getFoodRecipe(  ) { return foodRecipe; }
   public void setTemperament( String temperament ) { 
      this.temperament = temperament ; }
   public String getTemperament(  ) { return temperament; }

   public void setAnimalClass( int animalClass ) { 
      this.animalClass = animalClass; }
   public int getAnimalClass(  ) { return animalClass; }
   public void setAttributeValue( String name, String value ) { 
      if ( name.equals("class") && value.equals("mammal") )
         setAnimalClass( MAMMAL );
      else
         throw new Error("Invalid attribute: "+name);
   }
   public String toString(  ) { return name +"("+species+")"; }
}

public class FoodRecipe extends SimpleElement {
   String name;
   List ingredients = new ArrayList(  );
   public void setName( String name ) { this.name = name ; }
   public String getName(  ) { return name; }
   public void addIngredient( String ingredient ) { 
      ingredients.add( ingredient ); }
   public void setIngredients( List ingredients ) { 
      this.ingredients = ingredients; }
   public List getIngredients(  ) { return ingredients; }
   public String toString() { return name + ": "+ ingredients.toString(  ); }
}

If you are working in the NetBeans IDE, you can use the Bean Patterns wizard for your class to help you create all those get and set methods (see the "Bean patterns in NetBeans" section in Chapter 21 for details).
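The reason the setter naming pattern matters is that a generic tool can derive the method name from an element name and call it through reflection. The following is only a rough sketch of that idea, not the builder itself: the SetterDemo class, its setProperty() helper, and the Pet stand-in bean are all hypothetical, and only String-valued properties are handled.

```java
import java.lang.reflect.Method;

// Hypothetical demo of calling a JavaBeans-style setter by name.
class SetterDemo {
    // Stand-in for a model class with a single String property.
    public static class Pet {
        private String name;
        public void setName(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Calls target.set<property>(value); e.g. "Name" maps to setName(String).
    static void setProperty(Object target, String property, String value)
            throws Exception {
        Method setter =
            target.getClass().getMethod("set" + property, String.class);
        setter.invoke(target, value);
    }

    public static void main(String[] args) throws Exception {
        Pet pet = new Pet();
        setProperty(pet, "Name", "Cocoa");
        System.out.println(pet.getName());
    }
}
```

If no matching setter exists, getMethod() throws NoSuchMethodException, so a real builder would need to decide how to handle unknown elements.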
