Indexing Non-String Datatypes
So far, we’ve only really talked about adding strings to the index. As far as Ferret is concerned, every field is a string. But sometimes we want to index other datatypes, such as numbers and dates. We’re going to take a moment to talk about best practices when indexing non-string datatypes, specifically storing special datatypes in their own field. We won’t mention how to handle numbers or dates within a larger string field (like in the string The 39 Steps). You’ll learn more about text-field analysis in Chapter 5.
Number Fields
Indexing number fields is relatively straightforward. You don’t even need to convert them to strings when you add them to the document. However, you do need to think about how you set up the field. Make sure it is untokenized, as some Analyzers will strip all numbers and you’ll end up with an empty field:
    index << {:product => "widget", :price => 24.95, :weight => 2400}
The one exception to skipping the string conversion is when you want to run range queries on a number field. For example, you may want to submit a query for all products between $5.00 and $25.00 or for all products that weigh less than 500 grams. In Ferret, a RangeQuery compares field values lexicographically, as strings, so while 200 comes before 500, 70 comes after 500. To fix this, pad the numbers to a fixed width by prepending zeros. So instead of adding 5, 70, and 200, you would add 0005, 0070, and 0200, and instead of adding 3.45 and 101.95, you would add 0003.45 and 0101.95. This is pretty easy using Ruby’s ...
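As a rough sketch of the padding idea, the snippet below uses Ruby’s built-in format method, which may not be the approach the text goes on to describe. The helper name pad_number and its width and decimals parameters are hypothetical, chosen only for illustration. It assumes non-negative numbers that fit within the chosen width.

```ruby
# Hypothetical helper (not part of Ferret): zero-pad a number to a fixed
# width so that string comparison agrees with numeric order.
# width    - number of digits before the decimal point
# decimals - number of digits after the decimal point
def pad_number(value, width: 4, decimals: 0)
  # The total field width must also count the decimal point and the
  # fractional digits when there are any.
  total = decimals.zero? ? width : width + 1 + decimals
  format("%0#{total}.#{decimals}f", value)
end

weights = [5, 70, 200, 500]

# Sorted as plain strings, the unpadded values come out in the wrong order.
unpadded = weights.map(&:to_s).sort

# Padded to a fixed width, string order matches numeric order.
padded = weights.map { |w| pad_number(w) }.sort

puts unpadded.inspect
puts padded.inspect
puts pad_number(3.45, decimals: 2)
puts pad_number(101.95, decimals: 2)
```

The same padding has to be applied to the bounds of the range query as well as to the indexed values, otherwise the comparison is again between strings of different widths.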