Published on ONJava.com (http://www.onjava.com/)

Putting XML in LDAP with LDAPHttp

by Jon Roberts

A software project is like a journey, and in this article I want to bring you along as a passenger. Above all, I intend to describe the process of writing an application using my own LDAPHttp framework and gateway, a set of Java classes based on the Netscape/Mozilla LDAP SDK that provide a simple MVC abstraction for using directory database back ends through Java servlets. The suggested app involves reading news and weblog feeds to create new data, so I will also get to touch on parsing RSS. Although the actual functionality of this little example may seem limited (and the overall approach unorthodox), I hope that by the time I'm done, the question of why will seem as unimportant as the general idea of combining XML with LDAP-driven models seems natural. This is not a how-to for LDAPHttp. A developer's guide will be forthcoming, I promise, but here I plan to skip the details in favor of the flavor.

The Briefest Introduction to Directories

The Lightweight Directory Access Protocol (LDAP) is an Internet standard for obtaining and manipulating data in directory databases over TCP/IP. An LDAP directory service, which descends from the X.500 standard, is very different from a traditional RDBMS but can perform many of the same tasks at comparable speeds, with lower complexity and overhead. Directories are predominantly used to centralize user contact and account information, but can be distributed, replicated, and extended to satisfy a wide variety of needs. LDAP is defined by several public RFCs and is implemented in server products by many major vendors, including Sun, IBM, Oracle, Microsoft, and Novell. OpenLDAP is a free, open source directory client and server offering based historically on the code developed at the University of Michigan in the original LDAP project. Because of the open nature and maturity of the standard, hooks into LDAP appear in a plethora of operating systems, tools, and development languages. There are also lots of books on LDAP, including ORA's own LDAP System Administration.

Related Reading: LDAP System Administration, by Gerald Carter

A central schema defines object classes and attributes for use in the associated directory database. The atomic unit of a directory, an entry, is essentially a uniquely identified collection of attributes, each of which may be assigned a value or values. Every entry participates in one or more object classes, determining for which attributes it must or may hold values. Directory databases are composed of entries organized in a hierarchical structure called a directory information tree (DIT). Figure 1 shows the portion of my DIT that pertains to this example:

Figure 1. An example DIT

The boxes represent both entries in the database and nodes in the tree. The labels are significant attribute/value pairs. Each entry is identified with a handle, called a distinguished name (or dn), which is built from its location in the DIT. For instance, the account for charlie in this tree is referenced by the dn uid=charlie,ou=Generic,ou=People,o=mentata.com. Entries can be described, imported, and exported in the LDAP Data Interchange Format (LDIF), which identifies the entry and then lists the pairs. Meet Charlie:

dn: uid=charlie, ou=Generic, ou=People, o=mentata.com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
uid: charlie
givenName: Charlie
sn: Foster Kanie
cn: Charlie Foster Kanie
userPassword: nonnie

As you might guess, Charlie's game is journalism, not network security. This will be his project, so I'm giving Charlie (and thee) the keys to his own branch of entries in my DIT.

access to dn.subtree="ou=Reps,ou=Comments,ou=Expressions,o=mentata.com"
    by dn="uid=charlie,ou=Generic,ou=People,o=mentata.com" write
    by * read
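Since a dn is built from an entry's position in the tree, the question of whether an entry falls inside a branch (which is what the access rule above turns on) can be answered from the names alone. Here is a minimal sketch using the JDK's own javax.naming.ldap.LdapName class rather than anything from LDAPHttp; the DnSketch class and its method names are made up for illustration.

```java
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

class DnSketch {

    // The branch that the access rule opens up to Charlie.
    static final String REPS_BRANCH =
        "ou=Reps,ou=Comments,ou=Expressions,o=mentata.com";

    // Parses a dn, turning the checked exception into an unchecked one.
    static LdapName parse(String dn) {
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("not a valid dn: " + dn, e);
        }
    }

    // True when the entry sits at or below the given branch. LdapName
    // counts components from the right, so the branch is a "prefix" of
    // every dn beneath it.
    static boolean isUnder(String entryDn, String branchDn) {
        return parse(entryDn).startsWith(parse(branchDn));
    }

    // Value of the leftmost (most specific) component of the dn.
    static String leafValue(String dn) {
        LdapName name = parse(dn);
        Rdn leaf = name.getRdn(name.size() - 1);
        return String.valueOf(leaf.getValue());
    }

    public static void main(String[] args) {
        String charlie = "uid=charlie, ou=Generic, ou=People, o=mentata.com";
        String rep = "uid=CRA3,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com";

        System.out.println(leafValue(charlie));
        System.out.println(isUnder(rep, REPS_BRANCH));
        System.out.println(isUnder(charlie, REPS_BRANCH));
    }
}
```

The directory server enforces the access rule itself, of course; the sketch only shows how much of the tree structure can be read straight out of a dn.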

A Pundit's Plan

Charlie yearns to express himself, but wants to start small by writing editorial commentary on weblogs and news stories he finds on the Internet. Of course, he could post comments on all of these disparate sites, but Charlie wants his own empire, so he's decided to consolidate his work on his own presence with references back to the sources. His idea is to grab titles, links, and descriptions for selected items from news or weblog feeds, add his own two cents, and store the results as an entry in my directory. Somewhere between a reply and a rip, we'll call these entries Reps (RSS extracted postings?) and put them in the DIT under ou=Reps,ou=Comments,ou=Expressions,o=mentata.com.

Default schemas delivered with a directory server typically come from standards bodies, but this type of entry requires something unique. I happen to already have a custom OpenLDAP schema file that will do the trick, so we'll just reuse it and make a Rep just another expression. Once added, a Rep entry may look something like this:

dn: uid=CRA3,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com
uid: CRA3
cn: India putting the boots to Microsoft.
businesscategory: ora
dnqualifier: 20030530161712Z
description: The Times of India is reporting: "President A P J
	Abdul Kalam on Wednesday urged Indian IT
	professionals to develop and specialise in open source code software
	rather than use proprietary solutions based on systems such as 
	Microsoft Windows." Steve Mallett
link: http://www.oreillynet.com/pub/wlg/3248
content: India is smart about software quality, so this is
	quite an endorsement!

Supporting the Gateway Theory

Charlie has no intention of hand-coding LDIF files any more than he wants to read RSS feeds himself, so some software is clearly required. Let's use mine :) The mentata.ldaphttp package defines a framework but doesn't deliver any functionality by itself. It includes a web gateway, which is composed of a second package, mentata.gateway, and a half-dozen servlets. Self-contained and simple, I like to consider the gateway an example LDAPHttp application that can solve many primitive directory needs, such as creating or searching entries. Rather than write a new application from scratch, we'll keep things easy and meet these requirements with the gateway.

The building blocks for LDAPHttp applications are contexts, packages of Java classes under mentata.ldaphttp in the class hierarchy visible to the JVM of the servlet container. I happened to be working on a context called forum for needs similar to Charlie's. Figure 2 contains a view of the relevant portion, where the necessary classes will reside:

Figure 2. My LDAPHttp forum package

A localContext class is required for all LDAPHttp context packages, and it's used to configure the controller with basic information about the target LDAP server and its DIT. Every other class above represents an LDAPHttp object — an abstraction for a type of entry. It's all about inheritance: a comment is a simple textual expression, while a solicitation would be a comment with a form for adding responses (more comments). Reps will therefore be solicitations that are created, in part, by reading an RSS feed. We will further subclass rep to define very small objects to represent the peculiarities of each source, in this case, news from The Register (registerrep), Lawrence Lessig's weblog (lessigrep), and the weblogs from O'Reilly Meerkat (orarep).

The LDAPHttp gateway servlets all use a straightforward grammar for naming the context and object, as well as perhaps the identifier or attribute, germane to a request. Hence, the URL for the action in the HTML form to add a new entry with the create servlet is as simple as http://server/path/create/forum/orarep. The only other things we'll want to submit with the creation request are the ID for the particular reference article and Charlie's own remarks, of course. The rest will come right off of the Internet.
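To make the URL grammar concrete, here is a rough sketch of how the part of the path after the servlet name might be split into a context, an object, and an optional third segment. It is not the gateway's actual code: the GatewayPathSketch and Target names are hypothetical, and treating the third segment as "identifier or attribute" is an assumption about the grammar.

```java
import java.util.Optional;

class GatewayPathSketch {

    // Hypothetical holder for the parts of a gateway request path.
    static final class Target {
        final String context;             // e.g. "forum"
        final String object;              // e.g. "orarep"
        final Optional<String> qualifier; // identifier or attribute, if any

        Target(String context, String object, Optional<String> qualifier) {
            this.context = context;
            this.object = object;
            this.qualifier = qualifier;
        }
    }

    // Splits the path that follows the servlet name, such as "/forum/orarep".
    static Target parse(String pathInfo) {
        if (pathInfo == null) {
            throw new IllegalArgumentException("missing path");
        }
        // Drop leading and trailing slashes before splitting.
        String[] parts = pathInfo.replaceAll("^/+|/+$", "").split("/");
        if (parts.length < 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                "expected /context/object, got: " + pathInfo);
        }
        Optional<String> qualifier =
            (parts.length > 2 && !parts[2].isEmpty())
                ? Optional.of(parts[2])
                : Optional.empty();
        return new Target(parts[0], parts[1], qualifier);
    }

    public static void main(String[] args) {
        Target target = parse("/forum/orarep");
        System.out.println(target.context + " / " + target.object);
    }
}
```

In a servlet, the input would typically come from HttpServletRequest.getPathInfo(); the sketch takes a plain string so that it runs without a servlet container.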

Using SAX to Get What You Want

Parsing XML can be an expensive task. In scanning a file like an RSS feed over the network to obtain only a small subset of the information therein, we would prefer to read once as a stream and retain only what we need. Hence, SAX is the right tool for the job. To do the work, we will nest a subclass of a SAX DefaultHandler inside of our rep class. This handler will tap the RDF of RSS v1.0, the rdf:about attribute for item elements in particular, to identify the precise post to which Charlie wants to respond. In other words, Charlie submits an article ID, a rep class method converts this to an appropriate substring, the substring is passed to the handler, and the handler looks for the matching item as it parses.

public void startElement(String namespaceURI,
	String localName, String qName, Attributes atts)
	throws SAXException {

	// store the current element name
	this.current_element = localName;

	// indicate if within a relevant parent tag
	if ( localName.equalsIgnoreCase("item") ) {

		// skip items that lack an rdf:about attribute, and accept a
		// match at the very start of the value as well as later in it
		String about = atts.getValue("rdf:about");
		if (about != null && about.indexOf(this.item_substring) >= 0) {
			this.item_found = true;
			this.in_item    = true;
		}
	}
}

Once inside the appropriate item, the handler will fill buffers with the bits we want.

public void characters(char[] ch, int start, int length)
	throws SAXException {

	// store information from the relevant item
	if (in_item) {
		if ( this.current_element.equalsIgnoreCase("title") ) {
			this.title_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("link") ) {
			this.link_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("description") ) {
			this.description_buffer.append( new String(ch, start, length) );

		} else if ( this.current_element.equalsIgnoreCase("creator") ) {
			this.creator_buffer.append( new String(ch, start, length) );
		}
	}
}

The values are then available with standard get methods that convert the buffers to trimmed strings and return them individually or as a composite. Another method indicates whether the specific item was in fact found in the feed by returning item_found. The class is rounded out with the usual error methods. A lot of work for a little data? Perhaps, but SAX doesn't get much simpler and we will only need one handler.
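For readers who want to see the whole round trip, the following is a self-contained sketch of the same idea: a DefaultHandler that watches for one item by a substring of its rdf:about value and keeps only that item's title and link. It is not the RSS1Handler from the forum package. The class name, the tiny inline feed, and the example.org URLs are all invented, and it looks the attribute up by namespace URI on a namespace-aware parser, which is one way (not the only one) to read rdf:about.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RssItemSketch extends DefaultHandler {

    static final String RDF_NS =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // A made-up two-item RSS 1.0 feed for demonstration.
    static final String SAMPLE_FEED =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\""
        + " xmlns=\"http://purl.org/rss/1.0/\">"
        + "<item rdf:about=\"http://example.org/posts/1\">"
        + "<title>First</title><link>http://example.org/posts/1</link></item>"
        + "<item rdf:about=\"http://example.org/posts/2\">"
        + "<title> Second </title><link>http://example.org/posts/2</link></item>"
        + "</rdf:RDF>";

    private final String itemSubstring;
    private final StringBuilder title = new StringBuilder();
    private final StringBuilder link = new StringBuilder();
    private String current = "";
    private boolean inItem = false;
    private boolean itemFound = false;

    RssItemSketch(String itemSubstring) {
        this.itemSubstring = itemSubstring;
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        current = localName;
        if (localName.equals("item")) {
            // Match the item by a substring of its rdf:about value.
            String about = atts.getValue(RDF_NS, "about");
            if (about != null && about.contains(itemSubstring)) {
                inItem = true;
                itemFound = true;
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!inItem) {
            return; // ignore everything outside the matching item
        }
        if (current.equals("title")) {
            title.append(ch, start, length);
        } else if (current.equals("link")) {
            link.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        current = "";
        if (localName.equals("item")) {
            inItem = false; // stop collecting once the item closes
        }
    }

    boolean itemFound() { return itemFound; }
    String getTitle()   { return title.toString().trim(); }
    String getLink()    { return link.toString().trim(); }

    // Parses the XML once as a stream and returns the filled handler.
    static RssItemSketch parse(String xml, String itemSubstring) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            RssItemSketch handler = new RssItemSketch(itemSubstring);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("unable to parse the feed", e);
        }
    }

    public static void main(String[] args) {
        RssItemSketch found = parse(SAMPLE_FEED, "posts/2");
        System.out.println(found.itemFound() + ": " + found.getTitle());
    }
}
```

Resetting the current element name in endElement keeps whitespace between tags out of the buffers; trimming in the getters covers whatever padding sits inside an element.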

The Urge to preCreate

The code for the gateway create servlet primarily constructs a new entry in memory from attribute values submitted with the request, then attempts to add it to the directory. Between these two steps is a call to the object's preCreate() method, which does nothing by default. We override this method in our rep class to perform the RSS parsing and to populate additional attributes with derived values.

public void preCreate() throws LDAPHttpException {

	// perform standard comment precreation
	super.preCreate();

	// retrieve and parse the RSS feed
	String uri = getFeed();
	RSS1Handler rss_handler = new RSS1Handler();
	rss_handler.setItemSubstring( getItemSubstring() );

	try {
		XMLReader reader = XMLReaderFactory.createXMLReader(PARSER_CLASS);
		InputSource input_source = new InputSource(uri);

		// hand the stream to our handler and parse it in one pass
		reader.setContentHandler(rss_handler);
		reader.parse(input_source);

	// handle feed retrieval exceptions
	} catch(IOException e) {
		throw new LDAPHttpException(
			"Unable to fetch the RSS feed: " + e.getMessage() );

	// handle feed parsing exceptions
	} catch(SAXException e) {
		throw new LDAPHttpException(
			"Unable to parse the RSS feed: " + e.getMessage() );
	}

	// update attribute values with those found in the feed
	if ( rss_handler.itemFound() ) {
		resetAttribute("cn", new String[] {rss_handler.getTitle()} );
		resetAttribute("description", new String[] {
			rss_handler.getDescription()} );
		resetAttribute("link", new String[] {
			rss_handler.getLink()} );

	// handle the case where the item wasn't found in the feed
	} else {
		throw new LDAPHttpException("An item matching <i>"
			+ getItemSubstring()
			+ "</i> was not found in the current feed: " + getFeed() );
	}
}

The feed URI is defined in the constructor for the particular subclass, along with some information used in retrieval. For instance:

public class registerrep extends rep {
	public registerrep() throws LDAPHttpException {

		setLabel("The Register news reply");
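Stripped of the LDAP and RSS specifics, the create sequence described in this section is a plain hook pattern: build the entry from submitted values, give the object a chance to intervene, then add it. The sketch below imitates that flow with an in-memory map standing in for the directory. Every name in it (PreCreateSketch, Entry, FeedEntry, create) is hypothetical and none of it is LDAPHttp code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PreCreateSketch {

    // Hypothetical stand-in for an LDAPHttp object: a bag of attribute
    // values plus a hook that runs just before the entry is added.
    static class Entry {
        final Map<String, String> attributes = new LinkedHashMap<>();

        // Does nothing by default; subclasses override it.
        void preCreate() { }
    }

    // Hypothetical subclass that derives one more attribute in the hook.
    static class FeedEntry extends Entry {
        private final String feed;

        FeedEntry(String feed) {
            this.feed = feed;
        }

        @Override
        void preCreate() {
            super.preCreate();
            String article = attributes.get("article");
            if (article == null) {
                // Failing here keeps the entry out of the directory.
                throw new IllegalStateException("no article id submitted");
            }
            // A real object would fetch and parse the feed at this point.
            attributes.put("link", feed + "/" + article);
        }
    }

    // Mirrors the create sequence: copy the submitted values, run the
    // hook, then add the entry to an in-memory "directory" keyed by uid.
    static void create(Entry entry, Map<String, String> submitted,
                       Map<String, Map<String, String>> directory) {
        entry.attributes.putAll(submitted);
        entry.preCreate();
        directory.put(entry.attributes.get("uid"), entry.attributes);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> directory = new LinkedHashMap<>();
        Map<String, String> submitted = new LinkedHashMap<>();
        submitted.put("uid", "CRA3");
        submitted.put("article", "3248");
        create(new FeedEntry("http://example.org/feed"), submitted, directory);
        System.out.println(directory);
    }
}
```

Because the hook runs before the add, an exception thrown from it (the item-not-found case, for example) means nothing is written at all.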

An Entry You Can Count On

Looking back at the DIT diagram and the Rep LDIF, you are probably still wondering about c=CRAcounter and uid=CRA42. While most directory servers manage a few attributes at the system level for things like timestamps, there is nothing in LDAP that will auto-increment or define a primary key for your entries when you add them; the server expects this information from you at creation time. As I've learned, the best practice for solving this problem is to store and manage an incrementing identifier value in the directory itself. Because it's small, standard, and doesn't appear in this picture, we'll hijack the country object class for our counter entry.

dn: c=CRAcounter, ou=Reps, ou=Comments, ou=Expressions, o=mentata.com
objectclass: top
objectclass: country
c: CRAcounter
description: 0

With this loaded, LDAPHttp and the create servlet will automatically grab and set a unique uid value for each new post per this line in the rep constructor:

setIncremental("CRA", "c=CRAcounter, ou=Reps, ou=Comments, " 
	+ "ou=Expressions, o=mentata.com", "description");

And "Bob's your uncle," as they say in Australia.
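What happens behind that one line is not shown here, but the general technique can be sketched: read the counter value, add one, write it back, and glue the prefix on the front. The version below runs against an in-memory map instead of a directory. The CounterSketch class and its methods are hypothetical, and handing out the incremented value (so that a counter loaded with 0 produces CRA1 first) is an assumption rather than documented LDAPHttp behavior.

```java
import java.util.HashMap;
import java.util.Map;

class CounterSketch {

    // In-memory stand-in for the directory: dn -> (attribute -> value).
    private final Map<String, Map<String, String>> entries = new HashMap<>();

    // Loads a single attribute value, as importing the LDIF above would.
    void load(String dn, String attribute, String value) {
        entries.computeIfAbsent(dn, k -> new HashMap<>()).put(attribute, value);
    }

    // Reads the counter, increments it, stores it, and returns the new id.
    // Synchronized so that two threads in this JVM never get the same value.
    synchronized String nextUid(String prefix, String counterDn,
                                String attribute) {
        Map<String, String> counter = entries.get(counterDn);
        if (counter == null || counter.get(attribute) == null) {
            throw new IllegalStateException("no counter at " + counterDn);
        }
        long next = Long.parseLong(counter.get(attribute).trim()) + 1;
        counter.put(attribute, Long.toString(next));
        return prefix + next;
    }

    public static void main(String[] args) {
        String counterDn =
            "c=CRAcounter,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com";
        CounterSketch directory = new CounterSketch();
        directory.load(counterDn, "description", "0");

        System.out.println(directory.nextUid("CRA", counterDn, "description"));
        System.out.println(directory.nextUid("CRA", counterDn, "description"));
    }
}
```

Against a real server, the read and the write are separate operations, so two clients could collide. One common guard is to send a single modify that deletes the old value and adds the new one; if another client got there first, the delete fails and the caller can retry.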

But Why, Daddy?

I'm sure to some this may look like a profound waste of time. I've provided motivation for LDAP and LDAPHttp elsewhere, but this example raises the question: why would you take data from one available format and store it in another for use in closed little web applications?

XML is an excellent way to express simple or complex textual information openly, and it has become every bit the de facto standard for general data representation that I predicted last year it would. On the other hand, what do we always say about silver bullets? Although there are mechanisms for indexing XML content, if you want to search your data by fields or dynamically re-express it, odds are you want it in a database of some sort. Relational database systems are adequately powerful, but can be overkill for some needs, as they require lots of administration and a potentially stilted process of partitioning data into two-dimensional views. Directory databases are simpler, plus they excel at searchability and are well suited to host information that is available live but doesn't change frequently once created. You can make use of identities and sophisticated schemes for access control without new software, and the results are accessible to any client or API that speaks LDAP.

The question of whether to use LDAPHttp and my gateway here may be more to the point. LDAP is a mature standard, so you can bet there are and will be plenty of ways to communicate with directory servers; promising new open source apps are being released with increasing frequency. With LDAPHttp, my own goal has been to deliver a platform that plays to the specific strengths of LDAP, servlets, and HTTP to do useful and interesting things without regard to what those things are. The framework may be non-standard, but it's as extensible as Java itself. LDAPHttp is clearly not a panacea, but it will provide elegant solutions for appropriate problems. Good ideas can organically bubble up from contexts to app libraries or the core packages. Someday, this software may serve as a competitive advantage in some vertical market of my choosing, but for now, I've deferred the question of what in the hope that others can use my work to prototype, demonstrate, and deliver unique services of their own. Think about Charlie.

The Larger Conversation

If I had to pick a space today, I'd say I am particularly interested in supporting transactions that involve the exchange of text (e.g., news and weblogs). Since a Rep is a solicitation, the HTML page returned by the gateway retrieve servlet will include a form for creating a new comment under the ou=Anonymous,ou=Comments,ou=Expressions,o=mentata.com branch of the DIT. Hence, Charlie's earlier Rep could provoke

dn: uid=CMA217,ou=Anonymous,ou=Comments,ou=Expressions,o=mentata.com
uid: CMA217
cn: India's endorsement?
businesscategory: rep
dnqualifier: 20030622123345Z
description: anonymously
content: I don't know what Bombay developers would think of all this,
	but you're going to have to do better if you want them
	to read your posts.
parent: uid=CRA3,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com
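The parent value in the entry above is itself the dn of the Rep being answered, so gathering every reply to a Rep comes down to a search on that attribute. A rough sketch of building such a search filter follows, with the special characters escaped as RFC 4515 requires. The ParentFilterSketch class and its method names are illustrative only; this is not how LDAPHttp is known to do it.

```java
class ParentFilterSketch {

    // Escapes the characters that RFC 4515 reserves inside a filter value.
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': out.append("\\5c"); break;
                case '*':  out.append("\\2a"); break;
                case '(':  out.append("\\28"); break;
                case ')':  out.append("\\29"); break;
                case '\0': out.append("\\00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Filter matching every entry whose parent attribute holds the given dn.
    static String childrenOf(String parentDn) {
        return "(parent=" + escape(parentDn) + ")";
    }

    public static void main(String[] args) {
        System.out.println(childrenOf(
            "uid=CRA3,ou=Reps,ou=Comments,ou=Expressions,o=mentata.com"));
    }
}
```

Going the other way needs no filter at all: the parent value is already a dn, so the Rep can be read directly by name.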

That final parent attribute is a special type used by LDAP to relate entries by dn value. Think of it as a pointer or foreign key. One of the features of LDAPHttp is to allow you to trace dn attributes in either direction, providing (among other things) links to comments made on a retrieved Rep. This all works well because a request for an entry by its dn or a request for entries with values for an indexed dn attribute matching a given distinguished name will both run like streaks of greased lightning through LDAP. To me, a good candidate application for LDAPHttp should involve lots of dn attributes. So is Charlie's app a good one? It depends on how much dialogue his posts generate! Even so ...

<comment>Much like open source developers, reporters and columnists frequently exchange and co-opt the ideas of their fellows without much concern for abstract notions of property. In fact, James Joyce goes so far as to make incest the central metaphor for journalism in the Aeolus episode of Ulysses. On the Internet today, with all of that fast and easy communication facilitated between people worldwide in real time, commentary as a profession may soon be overwhelmed by commentary as a diversion. RSS in its many flavors makes it a snap to generate your own syndicated feed, blasting your observations and perspective far and wide. Along for the ride are the expressions of others as they have influenced you. This is all a boon for an open, democratic society, but the real value is not in the posting, but in the responding. The best place for that is in the original conversation. Sorry Charlie, but I don't think anybody should be an island.</comment>

That doesn't mean it was such a bad example for this article; we covered a lot. And with this journey at an end, I will employ yet another class from my forum package to start new conversations, asking my perennial favorite question: suggestions?


Jon Roberts is an independent software developer and sole proprietor of Mentata Systems.


Copyright © 2009 O'Reilly Media, Inc.