ONJava.com -- The Independent Source for Enterprise Java


What is Java Content Repository

Content Repository Model

JSR-170 says that a content repository is composed of one or more workspaces, which should normally contain similar content. Each workspace contains a single rooted tree of items. An item is either a node or a property. Each node may have zero or more child nodes and zero or more child properties. Every workspace has exactly one root node; the root node has no parent, and every other node has exactly one parent. A property has exactly one node as its parent and cannot have children, so properties are the leaves of the tree. All of the actual content in the repository is stored in the values of the properties.



Figure 2 describes a content repository model for a sample blogging application. Every child node of the root node represents one blog entry, and all data related to a blog entry is stored as properties of its blogEntry node. The properties blogTitle, blogAuthor, and creationTime should be self-explanatory. The blogContent property contains the text of the entry, and the blogAttachment property holds the binary data of an image file attached to the entry:

Figure 2. Content repository model
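
To make the tree model concrete, here is a minimal plain-Java sketch of a workspace holding one blog entry. It is an illustration only and does not use the JCR API: the ItemModelSketch class, its Node type, and the sample property values are all hypothetical, chosen to mirror the rules described above (one root, one parent per node, properties as leaves).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative stand-in for the JSR-170 item model; NOT the javax.jcr API.
class ItemModelSketch {

    /** A node has a name, at most one parent, child nodes, and properties. */
    static final class Node {
        final String name;
        final Node parent; // null only for the root node
        final Map<String, Node> children = new LinkedHashMap<>();
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Creates a child node; every non-root node has exactly one parent. */
        Node addNode(String childName) {
            Node child = new Node(childName, this);
            children.put(childName, child);
            return child;
        }

        /** Properties are leaves: they hold a value and cannot have children. */
        void setProperty(String propertyName, Object value) {
            properties.put(propertyName, value);
        }

        /** Absolute path of this node within its workspace tree. */
        String path() {
            if (parent == null) {
                return "/";
            }
            String parentPath = parent.path();
            return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
        }
    }

    /** Builds a one-entry workspace shaped like the blogging example. */
    static Node buildSampleWorkspace() {
        Node root = new Node("", null);
        Node entry = root.addNode("blogEntry");
        // All values below are made-up sample data.
        entry.setProperty("blogTitle", "Hello JCR");
        entry.setProperty("blogAuthor", "alice");
        entry.setProperty("creationTime", 0L); // placeholder timestamp
        entry.setProperty("blogContent", "First post.");
        entry.setProperty("blogAttachment", new byte[] {1, 2, 3}); // stands in for image bytes
        return root;
    }

    public static void main(String[] args) {
        Node root = buildSampleWorkspace();
        Node entry = root.children.get("blogEntry");
        System.out.println(entry.path());
        System.out.println(entry.properties.keySet());
    }
}
```

Note that the node itself carries no content; everything a reader would call "the blog entry" lives in the property values, which is the point the model makes.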

In addition to this repository model, JSR-170 also defines the features and operations that a compliant repository should support. To make it easy for existing CMS vendors to adapt to the new standard, JSR-170 introduces the concept of compliance levels, which define the set of features that must be supported for a given level of compliance. JSR-170 defines two compliance levels plus a set of optional features:

  • Level 1 defines a read-only repository: This includes functionality for reading repository content, exporting content to XML, and searching. This functionality should meet the needs of presentation templates and basic portal applications, which make up a large portion of the existing codebase of content-related applications. Level 1 is also designed to be easy to implement on top of an existing content repository.
  • Level 2 defines a writable repository: A Level 2 repository is a superset of Level 1. In addition to Level 1's functionality, it defines methods for writing content and importing content from XML. Applications written against Level 2 features include any application that generates data, information, or content, both structured and unstructured.
  • Advanced options: In addition to Level 1 or Level 2 features, the specification defines five optional functional blocks: Versioning, (JTA) Transactions, Query using SQL, Explicit Locking, and Content Observation. A repository that is Level 1 or Level 2 compliant can choose to implement one or more of these blocks. A repository that implements all of them in addition to being Level 2 compliant can be used as general-purpose, off-the-shelf infrastructure for content management, document management, code management, or just about any other application that persists content.

So, if you are a CMS vendor, the first step is to make your repository Level 1 compliant. As time progresses, you can decide to move to Level 2 compliance and implement advanced features based on your needs or client base.
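
A client can find out what a given repository supports before relying on a feature. The sketch below is plain Java, with a Map standing in for a repository's descriptors; the ComplianceSketch class and its methods are hypothetical. The key names follow the descriptor constants that, to my knowledge, JCR 1.0 exposes on javax.jcr.Repository (read with getDescriptor), so check them against the API documentation before depending on them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; the Map stands in for a repository's descriptors.
class ComplianceSketch {

    // Assumed to match the JCR 1.0 descriptor keys; verify against the Javadoc.
    static final String LEVEL_1 = "level.1.supported";
    static final String LEVEL_2 = "level.2.supported";
    static final String[] OPTIONS = {
        "option.versioning.supported",
        "option.transactions.supported",
        "option.query.sql.supported",
        "option.locking.supported",
        "option.observation.supported"
    };

    static boolean isTrue(Map<String, String> descriptors, String key) {
        return "true".equals(descriptors.get(key));
    }

    /** Returns the highest compliance level the descriptors claim. */
    static String level(Map<String, String> descriptors) {
        if (isTrue(descriptors, LEVEL_2)) {
            return "Level 2";
        }
        if (isTrue(descriptors, LEVEL_1)) {
            return "Level 1";
        }
        return "none";
    }

    /** Returns the optional blocks that are reported as supported. */
    static List<String> supportedOptions(Map<String, String> descriptors) {
        List<String> supported = new ArrayList<>();
        for (String key : OPTIONS) {
            if (isTrue(descriptors, key)) {
                supported.add(key);
            }
        }
        return supported;
    }

    public static void main(String[] args) {
        // Hypothetical descriptors for a read-write repository with versioning only.
        Map<String, String> descriptors = new LinkedHashMap<>();
        descriptors.put(LEVEL_1, "true");
        descriptors.put(LEVEL_2, "true");
        descriptors.put("option.versioning.supported", "true");
        System.out.println(level(descriptors) + " " + supportedOptions(descriptors));
    }
}
```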

What Is Apache Jackrabbit?

Apache Jackrabbit is a fully JSR-170-compliant content repository: it is Level 2 compliant and implements all of the optional feature blocks. Beyond the JSR-170 API, Jackrabbit offers numerous extensions and administrative features that are needed to run a repository but are not specified by JSR-170.

We have decided to use Apache Jackrabbit as the content repository in our sample application. One problem with Apache Jackrabbit is that it doesn't offer a binary release, so developers need to build it from source code before installing it. See Building Jackrabbit for information on how to build Apache Jackrabbit from source code.

How to Configure Apache Jackrabbit

Once you have downloaded and built the Jackrabbit source code successfully, the next step is to configure it. Jackrabbit needs two parameters at runtime to configure a content repository instance.

  1. Repository home directory: The filesystem path of the directory that usually contains all the repository content, search indexes, internal configuration, and other persistent information managed within the content repository. The directory structure of the content repository will look something like this:

       c:/temp
            |
            |--Blogging
                    |
                    |-repository
                    |       |
                    |       |-index
                    |       |-meta
                    |       |-namespaces
                    |       |-nodetypes             
                    |
                    |-version
                    |
                    |-workspaces
                            |
                            |--default

    In this case, the value of the repository home directory parameter should be c:/temp/Blogging.

  2. Repository configuration file: The filesystem path of the repository configuration XML file. This file contains configuration information for the repository, including the class names of the Jackrabbit components (which decide the implementation used for each one) and the parameters each component requires. The following listing shows what a typical configuration file looks like:

    <Repository>
     <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/repository"/>
     </FileSystem>
     <Security appName="Jackrabbit">
      <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
      <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
        <param name="anonymousId" value="anonymous"/>
      </LoginModule>
     </Security>
     <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
     <Workspace name="${wsp.name}">
      <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
       <param name="path" value="${wsp.home}"/>
      </FileSystem>
      <PersistenceManager 
            class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
       <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
       <param name="schemaObjectPrefix" value="${wsp.name}_"/>
      </PersistenceManager>
      <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
       <param name="path" value="${wsp.home}/index"/>
      </SearchIndex>
     </Workspace>
     <Versioning rootPath="${rep.home}/version">
      <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
       <param name="path" value="${rep.home}/version" />
      </FileSystem>
      <PersistenceManager 
            class="org.apache.jackrabbit.core.state.db.DerbyPersistenceManager">
       <param name="url" value="jdbc:derby:${rep.home}/version/db;create=true"/>
       <param name="schemaObjectPrefix" value="version_"/>
      </PersistenceManager>
     </Versioning>
     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
      <param name="path" value="${rep.home}/repository/index"/>
     </SearchIndex>
    </Repository>

    In the repository configuration file, <Repository> is the root element. One <Repository> element holds the configuration for one repository, and it contains the following elements:

    • <FileSystem>: This element configures the virtual filesystem implementation used for storing global data, that is, data that applies to the repository as a whole, such as registered namespaces and custom node types. Apache Jackrabbit provides a few options for storing this data. One option is to store it on the underlying filesystem, which is what our sample application does by using LocalFileSystem. If you want this data to be stored in a database instead, use DbFileSystem.
    • <Security>: This element contains the security configuration for the repository. It has two child elements: <AccessManager> and <LoginModule>. The class attribute of <AccessManager> names the class that is queried to determine whether a user has the right to perform a particular action on a particular item. The <LoginModule> element lets you configure a class of the LoginModule type, which is used for authentication.
    • <Workspaces>: This element holds configuration that is common to all workspaces in the repository. Its rootPath attribute points to the root directory containing all workspace folders; with the configuration listed above and our sample home directory, that is c:/temp/Blogging/workspaces. The defaultWorkspace attribute contains the name of the default workspace.
    • <Workspace>: This element is the default template for all workspaces in the repository. When you create a new workspace in this repository, its workspace.xml file will look like this element. The <Workspace> element has three child elements. The first is <FileSystem>, which configures the virtual filesystem used for storing data related to the workspace. The second, <PersistenceManager>, indicates how the content of the workspace is persisted. Apache Jackrabbit gives you a choice of storing it on the filesystem, in a database, in memory as a hashtable, or as an XML file. In our sample we persist the content in a Derby database. The last element, <SearchIndex>, is optional. Its class attribute names the class used for indexing as well as for actual query execution.
    • <Versioning>: This element configures version storage. You may have noticed that it contains the same child elements, <FileSystem> and <PersistenceManager>, as <Workspace>. That is because JSR-170 treats versions as nodes, so the same structure can be reused.
    • <SearchIndex>: This element configures the index that is used for searching repository-wide content.
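
Since each element mostly serves to name an implementation class, it can help to list which class is plugged in where. The following sketch uses only the JDK's DOM parser on an abbreviated copy of the listing above; it is not how Jackrabbit itself reads the file, and the RepositoryConfigSketch class and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Lists the implementation class configured for each component element.
class RepositoryConfigSketch {

    // Abbreviated copy of the configuration listing, kept short for the example.
    static final String SAMPLE =
          "<Repository>"
        + "<FileSystem class=\"org.apache.jackrabbit.core.fs.local.LocalFileSystem\"/>"
        + "<Workspace name=\"${wsp.name}\">"
        + "<PersistenceManager class=\"org.apache.jackrabbit.core.state.db.DerbyPersistenceManager\"/>"
        + "<SearchIndex class=\"org.apache.jackrabbit.core.query.lucene.SearchIndex\"/>"
        + "</Workspace>"
        + "</Repository>";

    /** Builds a slash-separated element path such as Repository/Workspace/SearchIndex. */
    static String pathOf(Element element) {
        StringBuilder path = new StringBuilder(element.getTagName());
        Node parent = element.getParentNode();
        while (parent instanceof Element) {
            path.insert(0, ((Element) parent).getTagName() + "/");
            parent = parent.getParentNode();
        }
        return path.toString();
    }

    /** Maps each element that has a class attribute to the class it names. */
    static Map<String, String> componentClasses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList all = doc.getElementsByTagName("*"); // every element, in document order
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute("class")) {
                    result.put(pathOf(element), element.getAttribute("class"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        componentClasses(SAMPLE).forEach((path, cls) -> System.out.println(path + " -> " + cls));
    }
}
```

Swapping a storage backend then amounts to changing one class attribute and its parameters, for example replacing LocalFileSystem with DbFileSystem in the <FileSystem> element.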

The repository home directory and repository configuration file parameters are passed to Jackrabbit either directly, when a repository instance is created, or indirectly, through settings for the JNDI object factory. You can set the org.apache.jackrabbit.repository.home system property to point to the repository home directory. In our example, we will set it to c:/temp/Blogging. Similarly, if you already have a repository.xml file and want to use it to set up the repository, you can set the org.apache.jackrabbit.repository.conf system property to point to that file. In our case, we don't want to use an existing repository.xml; instead, we want Jackrabbit to generate a default repository.xml file for us. If you don't set either of these properties, Jackrabbit will treat the current folder as the home directory and create the repository directory structure and a repository.xml file in it. Refer to the Apache Jackrabbit online documentation for how to configure Apache Tomcat to create a repository configuration object and bind it in the JNDI tree.
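
The lookup described above can be sketched in a few lines of plain Java. This is an illustration of the described behavior under stated assumptions, not Jackrabbit's own code: the RepositoryLocationSketch class and its methods are hypothetical, the fallback to the current folder follows the paragraph above, and the expand method only imitates how a ${rep.home} placeholder in repository.xml would resolve against the home directory.

```java
// Sketch of resolving the repository home directory from a system property.
class RepositoryLocationSketch {

    static final String HOME_PROPERTY = "org.apache.jackrabbit.repository.home";

    /** Returns the configured home directory, or the current folder if unset. */
    static String resolveHome() {
        return System.getProperty(HOME_PROPERTY, ".");
    }

    /** Replaces the ${rep.home} placeholder in a configuration value. */
    static String expand(String value, String home) {
        // String.replace treats the placeholder literally, not as a regex.
        return value.replace("${rep.home}", home);
    }

    public static void main(String[] args) {
        // Same effect as passing -Dorg.apache.jackrabbit.repository.home=c:/temp/Blogging
        System.setProperty(HOME_PROPERTY, "c:/temp/Blogging");
        String home = resolveHome();
        System.out.println(home);
        System.out.println(expand("${rep.home}/workspaces", home));
    }
}
```

In practice the property is usually supplied on the command line with a -D option rather than set in code, so the same build can be pointed at different repositories.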
