Lucene has four different types of fields, which can be specified for optimal index creation: Keyword, UnIndexed, UnStored, and Text.
- Keyword fields are not parsed by the analyzer, but are indexed and stored in the index. JavaSourceCodeIndexer uses this field type to store import declarations.
- UnIndexed fields are neither analyzed nor indexed, but their values are stored in the index, word for word. The Java file name is stored in a field of this type, since we want to keep the location of the file but would rarely search for keywords in the file name.
- UnStored fields are the opposite of UnIndexed fields. Fields of this type are analyzed and indexed, but are not stored in the index. The source code of the method is indexed as an UnStored code field, as storing every line of code would require a large amount of space. The source code of a method can be retrieved directly from the original Java file, which keeps the index small.
- Text fields are analyzed, indexed, and stored in the index. The class name is stored as a Text field.

The fields used by JavaSourceCodeIndexer are summarized in the following table:
|Field|Field type|
|---|---|
|Method Block (Code)|UnStored|
|Method Parameter Type|Text|
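
The difference between the four field types comes down to three yes/no properties: whether a value is analyzed, whether it is indexed, and whether it is stored. The following is a minimal sketch of that model in plain Java. It deliberately does not use the Lucene API; the `FieldKind` enum, the `FieldKindDemo` class, and the field names (`import`, `filename`, `code`, `class`, `parameter`) are illustrative assumptions, not names taken from JavaSourceCodeIndexer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the four field types as three flags (not Lucene API). */
enum FieldKind {
    KEYWORD(false, true, true),    // not analyzed, indexed, stored
    UNINDEXED(false, false, true), // not analyzed, not indexed, stored
    UNSTORED(true, true, false),   // analyzed, indexed, not stored
    TEXT(true, true, true);        // analyzed, indexed, stored

    final boolean analyzed;
    final boolean indexed;
    final boolean stored;

    FieldKind(boolean analyzed, boolean indexed, boolean stored) {
        this.analyzed = analyzed;
        this.indexed = indexed;
        this.stored = stored;
    }

    /** A value can be matched by a query only if it is indexed. */
    boolean searchable() {
        return indexed;
    }

    /** A value can be read back from a search hit only if it is stored. */
    boolean retrievable() {
        return stored;
    }
}

final class FieldKindDemo {
    /** Field-to-type mapping as described in the text; field names are assumed. */
    static Map<String, FieldKind> javaSourceFields() {
        Map<String, FieldKind> fields = new LinkedHashMap<>();
        fields.put("import", FieldKind.KEYWORD);
        fields.put("filename", FieldKind.UNINDEXED);
        fields.put("code", FieldKind.UNSTORED);
        fields.put("class", FieldKind.TEXT);
        fields.put("parameter", FieldKind.TEXT);
        return fields;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, FieldKind> entry : javaSourceFields().entrySet()) {
            FieldKind kind = entry.getValue();
            System.out.printf("%-10s %-10s searchable=%b retrievable=%b%n",
                    entry.getKey(), kind, kind.searchable(), kind.retrievable());
        }
    }
}
```

Read this way, the choice for each field is a trade-off: a field that must be searched needs to be indexed, and a field whose value must be shown with a search hit needs to be stored.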
The indexes created by Lucene can be viewed and modified using Luke, an open source tool that is useful for understanding indexes. A Luke snapshot of the index created by JavaSourceCodeIndexer is shown in Figure 1.
Figure 1. Snapshot of indexes in Luke
As you can see, the import declarations are stored as is, without tokenizing or analyzing. The class names and method names are converted to lower case and stored.
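
To make the contrast concrete, the sketch below imitates the two behaviors in plain Java. It is only an approximation: the `IndexTermSketch` class and both methods are hypothetical, and the stand-in "analyzer" simply splits on non-alphanumeric characters and lowercases each token, which is an assumption rather than the analyzer JavaSourceCodeIndexer actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of keyword-style versus analyzed terms. */
final class IndexTermSketch {
    /** Keyword-style handling: the whole value becomes one term, unchanged. */
    static List<String> keywordTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    /** Analyzed handling (assumed): split on non-alphanumerics, then lowercase. */
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // An import declaration kept as a single, case-preserved term.
        System.out.println(keywordTerms("java.util.List"));
        // A class name passed through the stand-in analyzer.
        System.out.println(analyzedTerms("JavaSourceCodeIndexer"));
    }
}
```

Under these assumptions, a query for an import has to match the full declaration exactly, while a query for a class name matches regardless of the case used in the source file.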