PHP DevCenter

Using PHP 5's SimpleXML

by Adam Trachtenberg, coauthor of PHP Cookbook
01/15/2004

XML is great, but I've constantly wondered why it's so difficult to parse. Most languages provide you with three options: SAX, DOM, and XSLT. Each has its own problems:

  • SAX's event-based design forces you to track elements manually, by pushing and popping them on and off of a stack.
  • DOM is bulky and cumbersome. While comprehensive, it takes seven lines just to read <hello>.
  • XSLT? If I wanted to program in a functional language, I'd use Lisp instead of PHP.

SimpleXML is a new and unique feature of PHP 5 that solves these problems by turning an XML document into a data structure you can iterate through like a collection of arrays and objects. It excels when you're only interested in an element's attributes and text and you know the document's layout ahead of time. SimpleXML is easy to use because it handles only the most common XML tasks, leaving the rest for other extensions.

This article shows how to use SimpleXML to read an XML file, parse the results into a useful form, and query the document with XPath. I use RSS for the examples, since some versions of RSS are nice and easy. Then there's RSS 1.0. It uses RDF, multiple namespaces, and defines a default namespace for its elements. (Not so nice and easy.)

Along the way, there's a brief discussion of XML namespaces and XPath, since they're necessary to process XML documents that go beyond the basics. In particular, to handle RSS 1.0, you need to work with these XML specifications.

To try SimpleXML, you need a copy of PHP 5 Beta 3, as not everything described here works in earlier versions. SimpleXML also requires libxml2, an open source XML parsing library that all of PHP 5's XML extensions now use. SimpleXML support is enabled by default, so it's automatically installed when you build PHP 5.

Like PHP 5, SimpleXML is beta quality. There are still a few bugs, memory leaks, and unimplemented features, but overall it's coming together nicely.

Reading XML

The first set of examples uses the following chunk of RSS, which is stored in rss-0.91.xml:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="0.91">
<channel>
    <title>PHP: Hypertext Preprocessor</title>
    <link>http://www.php.net/</link>
    <description>The PHP scripting language web site</description>
</channel>

<item>
    <title>PHP 5.0.0 Beta 3 Released</title>
    <link>http://www.php.net/downloads.php</link>
    <description>PHP 5.0 Beta 3 has been released. The third beta 
    of PHP is also scheduled to be the last one (barring unexpected 
    surprises).</description>
</item>
<item>
    <title>PHP Community Site Project Announced</title>
    <link>http://shiflett.org/archive/19</link>
    <description>
    Members of the PHP community are seeking volunteers to help 
    develop the first web site that is created both by the community and for 
    the community.</description>
</item>
</rss>

To begin, create a new SimpleXML object. For XML on disk, use simplexml_load_file('/path/to/file.xml'). If it's stored in a PHP variable, use simplexml_load_string($xml). So, to load the RSS, do:

$s = simplexml_load_file('rss-0.91.xml');
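
For the in-memory route, here is a minimal sketch that passes a shortened copy of the feed to simplexml_load_string() and checks the result before using it. The trimmed XML and the variable names are illustrative only. The check relies on the loader returning false when a document can't be parsed, and libxml_use_internal_errors() assumes PHP 5.1 or later, so it goes beyond the beta described here.

```php
<?php
// A shortened copy of the sample feed, held in a string (illustrative data).
$xml = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<rss version="0.91">
<channel>
    <title>PHP: Hypertext Preprocessor</title>
    <link>http://www.php.net/</link>
</channel>
</rss>
XML;

// Collect parse errors instead of emitting warnings (assumes PHP 5.1+).
libxml_use_internal_errors(true);

$s = simplexml_load_string($xml);
if ($s === false) {
    // Parsing failed: report each libxml error and stop.
    foreach (libxml_get_errors() as $error) {
        print trim($error->message) . "\n";
    }
    exit(1);
}

print $s->channel->title . "\n";
```

The same false check applies to simplexml_load_file(), which can also fail when the file is missing or unreadable.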

Element text is accessed like object properties:

print $s->channel->title . "\n";

PHP: Hypertext Preprocessor

If more than one element with the same name appears at the same level of the document, they're placed inside an array. In this example, there's only one <channel>, but two <item>s. To access an <item>, use its position in the array, counting from zero:

print $s->item[0]->title . "\n";

PHP 5.0.0 Beta 3 Released

To print all titles, use a foreach loop:

foreach ($s->item as $item) {
    print $item->title . "\n";
}

PHP 5.0.0 Beta 3 Released
PHP Community Site Project Announced
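
One detail the loop above hides: each property is a SimpleXML object, not a plain string. print converts it implicitly, but a cast is needed when the value is used as an array key or in a strict comparison. The sketch below, which uses a shortened copy of the feed and a hypothetical $links array, builds a title-to-link map this way.

```php
<?php
// Shortened copy of the sample feed (illustrative data).
$xml = <<<XML
<rss version="0.91">
<item>
    <title>PHP 5.0.0 Beta 3 Released</title>
    <link>http://www.php.net/downloads.php</link>
</item>
<item>
    <title>PHP Community Site Project Announced</title>
    <link>http://shiflett.org/archive/19</link>
</item>
</rss>
XML;

$s = simplexml_load_string($xml);

// Map each item's title to its link, casting the objects to plain strings.
$links = array();
foreach ($s->item as $item) {
    $links[(string) $item->title] = (string) $item->link;
}

// count() reports how many <item> elements sit at this level.
print count($s->item) . " items\n";
foreach ($links as $title => $link) {
    print "$title: $link\n";
}
```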

Use array notation to read element attributes:

print $s['version'] . "\n";

0.91
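
When the attribute names aren't known in advance, the attributes() method offers a way to walk all of them. The following is a sketch against the SimpleXML API as it exists in released PHP versions. The id attribute is made up for illustration and is not part of RSS 0.91.

```php
<?php
// A root element with two attributes; "id" is a hypothetical extra.
$s = simplexml_load_string('<rss version="0.91" id="feed-1"><channel/></rss>');

// Array notation reads one attribute; cast it to get a plain string.
$version = (string) $s['version'];

// attributes() yields name => value pairs in document order.
$attrs = array();
foreach ($s->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

// isset() tells a missing attribute apart from an empty one.
$hasEncoding = isset($s['encoding']);

print "version: $version\n";
foreach ($attrs as $name => $value) {
    print "$name = $value\n";
}
```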

Other XML features, like comments and processing instructions, are unsupported. You can't (yet) access these nodes. However, since most XML documents don't place vital information in comments or use processing instructions, this isn't a big drawback.

Querying with XPath

SimpleXML uses XPath to allow you to gather information from a document. Find and print all the text inside title elements with:

foreach ($s->xsearch('//title') as $title) { 
    print "$title\n";
}

PHP: Hypertext Preprocessor
PHP 5.0.0 Beta 3 Released
PHP Community Site Project Announced

The xsearch() method searches a SimpleXML object and returns an array of matching nodes. Pass your XPath query as the argument. In this case, //title finds all title elements regardless of location in the tree. Or, restrict the search to only <title>s inside of <item>s with //item/title.
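
In released versions of PHP 5 and later, this query method is named xpath() rather than xsearch(). The sketch below assumes that released API and runs both queries mentioned above against a shortened copy of the feed. The $all and $itemTitles arrays are illustrative names.

```php
<?php
// Shortened copy of the sample feed (illustrative data).
$xml = <<<XML
<rss version="0.91">
<channel>
    <title>PHP: Hypertext Preprocessor</title>
</channel>
<item>
    <title>PHP 5.0.0 Beta 3 Released</title>
</item>
<item>
    <title>PHP Community Site Project Announced</title>
</item>
</rss>
XML;

$s = simplexml_load_string($xml);

// //title matches every <title>, wherever it sits in the tree.
$all = array();
foreach ($s->xpath('//title') as $title) {
    $all[] = (string) $title;
}

// //item/title matches only <title> elements directly inside an <item>.
$itemTitles = array();
foreach ($s->xpath('//item/title') as $title) {
    $itemTitles[] = (string) $title;
}

print count($all) . " titles in total\n";
print implode("\n", $itemTitles) . "\n";
```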

If you've used XSLT, you're familiar with XPath. XSLT templates use XPath expressions to determine when to process a node. For more on XPath, read John E. Simpson's XPath and XPointer (O'Reilly) or John's XML.com article, Top Ten Tips to Using XPath and XPointer. Additionally, Chapter 9 of XML in a Nutshell, by Elliotte Rusty Harold and W. Scott Means (O'Reilly), covers XPath and is available free online.

While these examples are somewhat trivial, XPath is quite useful with complex documents, as you can create sophisticated queries to return finely tuned results.
