Optimizing the Index
You now know that the indexing process can leave any number of
segments in the index. Although indexing performance is unaffected by the
number of segments in the index, search performance does depend on the
number of segments. The fewer segments, the better the search performance.
When you open an index for reading, each segment is opened with its own
SegmentReader. Each of those SegmentReaders has its own term dictionary,
term enumerators, document enumerators, and so on, so together they can
consume a lot of resources, not to mention that each reader has to be
searched separately. So again, the fewer segments there are, the better
when searching the index. This matters even more for short-lived
command-line programs, which have to open the index every time they run:
it takes much longer to read in an unoptimized index than an optimized one.
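The effect of segment count on search can be illustrated with a toy model in plain Ruby. This is a sketch under simplifying assumptions, not Ferret's implementation: the class ToySegmentedIndex and its methods are hypothetical, a document is just an array of terms, and each segment is a Hash acting as its own term dictionary. A single-term search has to consult every segment's dictionary, so the number of lookups grows with the number of segments even though the hits are identical.

```ruby
# Toy model only (hypothetical, NOT Ferret's API): an index is an array of
# segments, and each segment is its own term dictionary mapping a term to
# the global document ids that contain it.
class ToySegmentedIndex
  attr_reader :segments

  def initialize
    @segments = []
    @next_doc_id = 0
  end

  # Write a group of documents (arrays of terms) out as one new segment.
  def add_segment(docs)
    dict = Hash.new { |h, term| h[term] = [] }
    docs.each do |terms|
      terms.uniq.each { |t| dict[t] << @next_doc_id }
      @next_doc_id += 1
    end
    @segments << dict
    self
  end

  # Every segment's dictionary must be consulted separately.
  # Returns [matching doc ids, number of dictionary lookups performed].
  def search(term)
    lookups = 0
    hits = @segments.flat_map do |dict|
      lookups += 1
      dict.fetch(term, [])
    end
    [hits, lookups]
  end
end

docs = [%w[ruby ferret], %w[lucene java], %w[ferret search], %w[rails ruby]]

fragmented = ToySegmentedIndex.new
docs.each { |d| fragmented.add_segment([d]) } # one segment per document
single = ToySegmentedIndex.new.add_segment(docs) # all documents in one segment

p fragmented.search("ferret") # => [[0, 2], 4]
p single.search("ferret")     # => [[0, 2], 1]
```

Both indexes return the same hits; only the amount of per-segment work differs, which is the cost an optimized index avoids.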
IndexWriter has an optimize method that minimizes the number of segments
in an index, making
the index optimal for searching. The best time to optimize the index is at
the end of a batch indexing session. If, however, you are incrementally
indexing your data—as you might do when indexing a model in a Rails
application—you need to be more careful deciding when to optimize the
index. The optimizing process itself can be quite resource-intensive, and
it blocks other documents from being added to the index while it runs.
Thus, it is certainly not a good idea to optimize the index after each
document is added.
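A rough cost model shows why optimizing after every added document is a bad idea. It is a sketch, not how Ferret schedules merges: the ToyWriter class and the docs_copied helper are hypothetical, each added document is assumed to become its own segment, and an optimize is assumed to copy every document in the index into a single new segment. Under those assumptions, optimizing after each of n additions copies 2 + 3 + … + n documents, which grows quadratically, while a single optimize at the end of the batch copies each document once.

```ruby
# Toy cost model (hypothetical, NOT Ferret's IndexWriter). Segments are
# represented only by their document counts; optimize merges them all into
# one segment and records how many documents had to be copied.
class ToyWriter
  attr_reader :segments, :docs_copied

  def initialize
    @segments = []   # each entry is the number of docs in that segment
    @docs_copied = 0 # running total of merge work
  end

  # Simplifying assumption: each added document becomes its own segment.
  def add_document
    @segments << 1
  end

  # Merging a single segment is a no-op; otherwise every document is copied.
  def optimize
    return if @segments.size <= 1
    total = @segments.sum
    @docs_copied += total
    @segments = [total]
  end
end

# Index n documents, optionally optimizing after every `optimize_every`
# additions, and always optimizing once at the end of the session.
def docs_copied(n, optimize_every: nil)
  w = ToyWriter.new
  n.times do |i|
    w.add_document
    w.optimize if optimize_every && ((i + 1) % optimize_every).zero?
  end
  w.optimize
  w.docs_copied
end

p docs_copied(100, optimize_every: 1)  # => 5049
p docs_copied(100, optimize_every: 10) # => 550
p docs_copied(100)                     # => 100
```

Real writers buffer many documents per segment and merge segments incrementally, so the absolute numbers differ, but the gap between the policies is why the end of a batch session is the natural time to optimize.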
It should be noted that for large indexes, ...