Optimizing the Index

You now know that the indexing process can leave any number of segments in the index. Although indexing performance is unaffected by the number of segments, search performance is not: the fewer the segments, the faster the search. When you open an index for reading, each segment is opened with its own SegmentReader. Each SegmentReader has its own term dictionary, term enumerators, document enumerators, and so on, so together they can consume considerable resources, and each reader must be searched separately. This matters even more for short-lived command-line programs, because it takes much longer to read in an unoptimized index than an optimized one.

IndexWriter has an optimize method that minimizes the number of segments in an index, making the index optimal for searching. The best time to optimize is at the end of a batch indexing session. If, however, you are indexing incrementally, as you might when indexing a model in a Rails application, you need to decide more carefully when to optimize. Optimizing can be quite resource-intensive, and it prevents other documents from being added to the index while it runs. It is therefore not a good idea to optimize after every document you add.
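One way to avoid optimizing after every document is to count additions and optimize only once a threshold is reached. The following is a minimal sketch of that idea, not part of Ferret itself: BatchedOptimizer, its interval parameter, and the FakeWriter stand-in are all hypothetical names introduced for illustration. The sketch assumes only that the wrapped writer responds to << for adding a document and to optimize, so that an IndexWriter could take the place of FakeWriter.

```ruby
# Hypothetical helper (not part of Ferret): wraps any writer that responds
# to `<<` and `optimize`, and optimizes only after every `interval` additions.
class BatchedOptimizer
  attr_reader :pending

  def initialize(writer, interval: 1000)
    raise ArgumentError, "interval must be positive" unless interval.positive?
    @writer = writer
    @interval = interval
    @pending = 0 # documents added since the last optimize
  end

  # Add a document, optimizing once the threshold is reached.
  def <<(doc)
    @writer << doc
    @pending += 1
    optimize if @pending >= @interval
    self
  end

  # Optimize the underlying index and reset the counter.
  def optimize
    @writer.optimize
    @pending = 0
  end

  # Call at the end of a session: optimize only if something changed.
  def finish
    optimize if @pending > 0
  end
end

# Stand-in for a real index writer so the sketch runs without an index.
class FakeWriter
  attr_reader :docs, :optimize_count

  def initialize
    @docs = []
    @optimize_count = 0
  end

  def <<(doc)
    @docs << doc
    self
  end

  def optimize
    @optimize_count += 1
  end
end

writer = FakeWriter.new
indexer = BatchedOptimizer.new(writer, interval: 3)
7.times { |i| indexer << { id: i } }
indexer.finish
puts writer.optimize_count # prints 3 (after documents 3 and 6, then once at finish)
```

The right interval depends on how large the index is and how often it is searched, so treat the default of 1000 as a placeholder rather than a recommendation.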

It should be noted that for large indexes, ...
