Chapter 6. Ferret in Practice

In Chapter 1, you saw how to index all the text files under a directory. Unfortunately, most of the files in a filesystem aren’t text files, so let’s extend the indexer to handle some different file types. We also want to filter and sort files by different fields, such as their modification date. In this chapter, we’ll implement these extensions and more. We’ll call our application ferretfind.

Indexing Multiple Document Types

    module FerretFind
      class Reader
        @@subclasses = []   # every class that inherits from Reader
        @@readers = {}      # lowercased file extension => reader instance

        # Ruby calls this hook each time a class inherits from Reader.
        def Reader.inherited(subclass)
          @@subclasses << subclass
        end

        # Instantiate each subclass once and register it under every
        # extension listed in its EXTENSIONS constant.
        def Reader.load_readers(field_infos)
          @@subclasses.each do |subclass|
            reader = subclass.new(field_infos)
            subclass::EXTENSIONS.each do |ext|
              @@readers[ext.downcase] = reader
            end
          end
        end

        # Look up the reader for a path by its extension, ignoring case.
        # Returns nil when no reader is registered for that extension.
        def Reader.get_reader(path)
          @@readers[(File.extname(path)[/[^.]+/]||"").downcase]
        end

        # Build the document for a path: fields common to every file,
        # plus whatever the matching reader extracts.
        def Reader.read(path)
          document = {
            :path     => path,
            :accessed => File.atime(path).strftime("%Y%m%d"),
            :modified => File.mtime(path).strftime("%Y%m%d")
          }
          if File.readable?(path) and reader = Reader.get_reader(path)
            document.merge!(reader.read(path))
          end
          return document
        end

        protected
          # Defaults for subclasses to override.
          def initialize(field_infos); end
          def read(path); {} end

          # Add a field definition unless one with that name already exists.
          def add_field(field_infos, field, options)
            field_infos.add_field(field, options) unless field_infos[field]
          end
      end
    end
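Reader.read in the listing above stores the access and modification times as `%Y%m%d` strings. One consequence, sketched below with made-up dates, is that plain string comparison orders the values chronologically, which is presumably what makes the format convenient for sorting and range filtering. The `stamp` helper is a hypothetical name used only for this illustration; it is not part of ferretfind.

```ruby
# Hypothetical helper mirroring the strftime call in Reader.read.
def stamp(time)
  time.strftime("%Y%m%d")
end

# Made-up modification times, deliberately out of order.
times = [Time.utc(2007, 3, 9), Time.utc(2006, 12, 25), Time.utc(2007, 1, 15)]
stamps = times.map { |t| stamp(t) }

# Sorting the strings gives the same order as sorting the Time objects,
# because the most significant part (the year) comes first and every
# part is zero-padded to a fixed width.
sorted_stamps = stamps.sort
chronological = times.sort.map { |t| stamp(t) }
puts sorted_stamps == chronological

# A plain string range check then behaves like a date filter.
in_2007 = stamps.select { |s| s >= "20070101" && s <= "20071231" }
puts in_2007.inspect
```

A format such as `%d%m%Y` would not have this property, since the day would dominate the comparison.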

The FerretFind::Reader class is ...
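To make the registration mechanism concrete, here is a minimal, self-contained sketch of the same pattern. It is not the ferretfind code: `MiniReader` is a cut-down stand-in that drops the `field_infos` argument so it runs without Ferret, and `PlainTextReader`, its `EXTENSIONS` values, and the `:content` field are assumptions made up for illustration.

```ruby
# Cut-down stand-in for the Reader registry: the inherited hook records
# each subclass, and load_readers maps extensions to reader instances.
class MiniReader
  @@subclasses = []
  @@readers = {}

  def self.inherited(subclass)
    super
    @@subclasses << subclass
  end

  def self.load_readers
    @@subclasses.each do |subclass|
      reader = subclass.new
      subclass::EXTENSIONS.each { |ext| @@readers[ext.downcase] = reader }
    end
  end

  # Same lookup expression as the listing: strip the dot, ignore case,
  # and fall back to "" when the path has no extension.
  def self.get_reader(path)
    @@readers[(File.extname(path)[/[^.]+/] || "").downcase]
  end
end

# Hypothetical reader for plain-text files. Defining the class is enough
# to register it; no central list of readers has to be edited.
class PlainTextReader < MiniReader
  EXTENSIONS = %w[txt text]

  def read(path)
    { :content => File.read(path) }
  end
end

MiniReader.load_readers

puts MiniReader.get_reader("notes.TXT").class    # PlainTextReader
puts MiniReader.get_reader("README").inspect     # nil
puts MiniReader.get_reader("photo.jpg").inspect  # nil
```

Supporting another file type in this sketch means adding one more subclass with its own `EXTENSIONS` and `read`. Paths with an unregistered extension get no reader, which corresponds to the case where `Reader.read` returns only the common fields.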
