The Python community has too many deceptive XML benchmarks
The Python/XML community has an unfortunately long tradition of dodgy benchmarks. I had a lot to say about probably the most egregious example in my article on PyRXP. PyRXP is called an XML parser, and its developers benchmark it as such against other Python/XML parsers. The problem is that PyRXP turns out not to be an XML parser at all: it fails the most fundamental conformance requirement of XML, Unicode support. As a result, a benchmark of PyRXP against a real XML parser is ludicrously unfair. In that article I went into some detail about how poisonous such unfair benchmarks are.
On the less egregious end are benchmarks of libxml2's default Python binding, which is in many ways so gnomic (no pun intended) and treacherous that it's also an unfair comparison against most Pythonic XML tools. It sounds as if Martijn Faassen's lxml is making decent progress towards rectifying this.
But I must say that the benchmarks that were the last straw for me came from an old friend. Fredrik Lundh ("/F") is IMO one of the few XML package developers in the Python community who really understand both Python and XML. This has been generally borne out in his ElementTree library, about which I've always had a lot of good things to say. cElementTree came along and suddenly raised the stakes in the Python/XML benchmark sweepstakes once again. As part of promoting cElementTree, /F posted a benchmark on its home page. The benchmarks are very flattering to cElementTree, and it probably deserves some such flattery, but as I examined the performance issue a bit more, I came to conclude that his benchmarks are pretty much useless.
The problem is that, aside from a performance bug in my own Amara 0.9.2 (which /F brought to my notice, and which was fixed in the subsequent release), I was unable to reproduce under real-world conditions anything like the proportions implied in /F's benchmarks. Well, /F pretty much admits that all he's doing in his benchmark is reading in a file using each library. Hmm. This is not the stuff of which useful benchmarks are made. Nobody reads in a 3MB XML document just to throw all the data away, least of all Python developers, who have long been vocal about their desire to do as little with XML as possible. Of course I can't be 100% sure in this complaint because I haven't seen the benchmark code, but then again that's just another complaint.
I set out to run at least one real-world benchmark, in order to determine whether there is anything to the no-op benchmarks /F uses. The basics come from this article, where I introduce the Old Testament test. The idea is simply to print all verses containing the word 'begat' in Jon Bosak's Old Testament in XML, a 3.3MB document. A quick note on the characteristics of the file: it contains 23,145 v elements, one per Bible verse, each containing only text: no child elements.
I'm working on a Dell Inspiron 8600 notebook with 2GB RAM. It's a Centrino 1.7GHz, which is about equivalent to a P4-3GHz (modulo the equally wacky world of CPU benchmarks). The OS is Fedora Core 3 Linux, and I've tuned DMA and the like. I'm running Python 2.3.2. The following are my pystone results:
I ran each case 5 times and recorded the high and low run times, according to the UNIX time command.
First up: plain old PySAX. Forget the performance characteristics for a moment: this code was just a pain in the arse to write.
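The benchmark code wasn't published with the post, but the shape of a plain SAX solution to the Old Testament test is easy to reconstruct. The following is my own sketch in modern Python 3 spelling (in 2005 this would have been Python 2 with print statements); the v element name comes from the Bosak file, while the handler name and the tiny stand-in sample are mine:

```python
import xml.sax

class BegatHandler(xml.sax.ContentHandler):
    """Accumulate text inside each v element; keep verses mentioning 'begat'."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.in_verse = False
        self.chunks = []
        self.matches = []

    def startElement(self, name, attrs):
        if name == 'v':
            self.in_verse = True
            self.chunks = []

    def characters(self, content):
        # SAX may deliver one verse's text as several events, hence the buffering
        if self.in_verse:
            self.chunks.append(content)

    def endElement(self, name):
        if name == 'v':
            self.in_verse = False
            verse = ''.join(self.chunks)
            if 'begat' in verse:
                self.matches.append(verse)

# Tiny stand-in for Bosak's 3.3MB ot.xml
SAMPLE = b"""<tstmt>
<v>And Seth lived an hundred and five years, and begat Enos:</v>
<v>In the beginning God created the heaven and the earth.</v>
</tstmt>"""

handler = BegatHandler()
xml.sax.parseString(SAMPLE, handler)
for verse in handler.matches:
    print(verse)
```

The need to manage the in_verse flag and buffer text fragments by hand is exactly the sort of tedium complained about above.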
I get numbers ranging from 2.32 - 3.97 seconds.
Next up is PySAX using a filter to normalize text events, which simplifies the SAX code a great deal.
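The filter used in the benchmark wasn't published; one way to build such a thing from the standard library is xml.sax.saxutils.XMLFilterBase, buffering characters() events and flushing them as a single event at each structural boundary. Everything below (class names, sample data) is my own reconstruction of the technique, not the original code:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase

class TextNormalizer(XMLFilterBase):
    """Coalesce adjacent characters() events into one event downstream."""

    def __init__(self, parent=None):
        XMLFilterBase.__init__(self, parent)
        self._buffer = []

    def _flush(self):
        if self._buffer:
            # Forward the accumulated text as a single event
            XMLFilterBase.characters(self, ''.join(self._buffer))
            self._buffer = []

    def characters(self, content):
        self._buffer.append(content)

    def startElement(self, name, attrs):
        self._flush()
        XMLFilterBase.startElement(self, name, attrs)

    def endElement(self, name):
        self._flush()
        XMLFilterBase.endElement(self, name)

class BegatHandler(xml.sax.ContentHandler):
    """Downstream handler: each verse now arrives as exactly one text event."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self._text = ''
        self.matches = []

    def characters(self, content):
        self._text = content  # no accumulation needed behind the filter

    def endElement(self, name):
        if name == 'v' and 'begat' in self._text:
            self.matches.append(self._text)

# The &amp; entity typically forces the parser to split the text into
# multiple events, which the filter then re-joins for us
SAMPLE = b"""<tstmt>
<v>And Enos lived ninety years, and begat Cainan &amp; others</v>
<v>In the beginning God created the heaven and the earth.</v>
</tstmt>"""

handler = BegatHandler()
reader = TextNormalizer(xml.sax.make_parser())
reader.setContentHandler(handler)
reader.parse(io.BytesIO(SAMPLE))
for verse in handler.matches:
    print(verse)
```

Note how much simpler the downstream handler becomes once it can trust that each verse arrives as one event.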
I get numbers ranging from 2.66 - 4.88 seconds.
Next up is Amara pushdom, which tries to combine some of the performance advantages of SAX with the (relative) ease of DOM.
I get numbers ranging from 5.83 - 7.11 seconds.
Next up is Amara pushbind, which tries to combine some of the performance advantages of SAX with the most Pythonic (and thus easy) API I can imagine.
I get numbers ranging from 10.46 - 11.40 seconds.
Next up is Amara bindery chunker, which is the basis of pushbind.
I get numbers ranging from 9.44 - 10.27 seconds.
Finally, I look at /F's cElementTree.
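Again the exact benchmark code is a reconstruction on my part, but the idiomatic streaming approach with this library is iterparse, which fires an event as each element closes and lets you discard the element immediately. In 2005 you would import cElementTree directly; its API survives today in xml.etree.ElementTree, which is what this sketch uses (the sample data is my own stand-in for the Bosak file):

```python
import io
from xml.etree import ElementTree  # cElementTree's API in modern Python

# Tiny stand-in for Bosak's 3.3MB ot.xml
SAMPLE = io.BytesIO(b"""<tstmt>
<v>And Cainan lived seventy years, and begat Mahalaleel:</v>
<v>In the beginning God created the heaven and the earth.</v>
</tstmt>""")

matches = []
for event, elem in ElementTree.iterparse(SAMPLE):  # default: 'end' events only
    if elem.tag == 'v':
        if 'begat' in (elem.text or ''):
            matches.append(elem.text)
        elem.clear()  # drop processed verses so memory use stays flat

for verse in matches:
    print(verse)
```

Because the file's v elements contain only text, elem.text holds the whole verse and there is no handler plumbing at all.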
I get numbers ranging from 1.53 - 3.18 seconds.
So what do I conclude from these numbers? As I've said before, the speed of cElementTree amazes me, but its advantage in the real world is nowhere near as dramatic as /F's benchmarks claim. More relevant to my own vanity, Amara 0.9.3's disadvantage in the real world is nowhere near as dramatic as /F's benchmarks claim either. IMHO, it's close enough in performance to all the other options, and offers so many advantages in areas besides performance, that it's a very respectable alternative to any Python/XML library out there.
But the point of this exercise goes far beyond all that. We really need to clean up our act in what is a very strange political battleground in the Python/XML space. If we've decided that MIPS wars are what we're going to be all about in development, then let's benchmark properly. Let's gather some real-world use-cases and normalized test conditions. Let's make sure all our benchmarks are transparent (at least release all the code used), and let's put some statistical rigor behind them (not an easy thing to do, and not something I claim to have done in this article). Let's do all this as a community.
While we're at it, I'd like to repeat my call for test case diversity from my PyRXP article: [R]un the tests on a variety of hardware and operating systems, and [don't] focus on a single XML file, but rather examine a variety of XML files. Numerous characteristics of XML files can affect parsing and processing speed, including:
And if we're not willing to do things right, let's stop deceiving users with meaningless benchmarks.