iglu.examples
Class DirectorySearchEngine

java.lang.Object
  |
  +--iglu.examples.DirectorySearchEngine

public class DirectorySearchEngine
extends java.lang.Object

An example of using the iglu.ir package. This class builds a simple search engine from the files in a directory tree, assuming that every file is a text document. It is inefficient, since it rebuilds the index on every run, but that simplicity makes it an easy example to follow.

Author:
Ryan Scherle

Field Summary
private  RAMSearchEngine engine
           
private  ValueSortedMap searchResults
           
private  StopList stopList
           
private  TFIDFVectorCreator vecMaker
           
 
Constructor Summary
DirectorySearchEngine()
          Initializes the vector creator (which collects keywords from the files and assigns weights to them) and the search engine (which allows the keywords to be stored and retrieved).
 
Method Summary
 void doSearch(java.io.File directory, TermVector query)
          Indexes all files in the directory, and sends the query to the engine.
private  void indexFile(java.io.File file, boolean indexing)
          Adds a single file to the search engine.
static void main(java.lang.String[] args)
          Runs the search engine.
private  void printResults(TermVector query, ValueSortedMap results)
          Prints out the list of files that match the query.
private  void processFiles(java.io.File file, boolean indexing)
          Goes through each file in the directory, passing the files to indexFile.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

engine

private RAMSearchEngine engine

vecMaker

private TFIDFVectorCreator vecMaker

searchResults

private ValueSortedMap searchResults

stopList

private StopList stopList
Constructor Detail

DirectorySearchEngine

public DirectorySearchEngine()
Initializes the vector creator (which collects keywords from the files and assigns weights to them) and the search engine (which allows the keywords to be stored and retrieved).

Method Detail

doSearch

public void doSearch(java.io.File directory,
                     TermVector query)
Indexes all files in the directory, and sends the query to the engine. Results are printed to stdout.


processFiles

private void processFiles(java.io.File file,
                          boolean indexing)
Goes through each file in the directory, passing the files to indexFile.
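
The traversal described here can be pictured with a minimal sketch that uses only java.io.File. It is an illustration, not the actual source of this class: the class name DirectoryWalkSketch and the method collectFiles are hypothetical, and the sketch collects the files into a list where the real method passes each one to indexFile.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the iglu library.
public class DirectoryWalkSketch {

    /** Recursively visits file; every regular file found is appended to out. */
    static void collectFiles(File file, List<File> out) {
        if (file.isDirectory()) {
            // listFiles() returns null on an I/O error or if access is denied.
            File[] children = file.listFiles();
            if (children == null) {
                return;
            }
            for (File child : children) {
                collectFiles(child, out);
            }
        } else if (file.isFile()) {
            // The real class would hand the file to indexFile at this point.
            out.add(file);
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> files = new ArrayList<>();
        collectFiles(root, files);
        for (File f : files) {
            System.out.println(f.getPath());
        }
    }
}
```

Because indexFile must see every file twice, a caller following this pattern would presumably run the whole traversal once per pass, changing only the indexing flag.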


indexFile

private void indexFile(java.io.File file,
                       boolean indexing)
Adds a single file to the search engine. This method must be called twice for every file. The first time, indexing is true, indicating that the file's terms should be counted to provide the collection statistics for the TF-IDF algorithm. The second time, indexing is false, indicating that the file's term vector should actually be added to the search engine.
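
Two passes are needed because a TF-IDF weight depends on how many documents in the whole collection contain a term, which is unknown until every file has been counted. The following self-contained sketch illustrates that idea with plain JDK collections. It does not use or reproduce the TFIDFVectorCreator API; the class TwoPassTfIdfSketch, its methods, the tokenizer, and the weighting formula tf * ln(N / df) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not part of the iglu library.
public class TwoPassTfIdfSketch {
    private final Map<String, Integer> docFreq = new HashMap<>();
    private int docCount = 0;

    /** Pass 1: record which terms occur in this document. */
    void count(String text) {
        docCount++;
        // A set ensures each term is counted at most once per document.
        for (String term : new HashSet<>(tokenize(text))) {
            docFreq.merge(term, 1, Integer::sum);
        }
    }

    /** Pass 2: weight each term by tf * ln(N / df). */
    Map<String, Double> vectorize(String text) {
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : tokenize(text)) {
            termFreq.merge(term, 1, Integer::sum);
        }
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (df == 0) {
                continue; // term was never seen in pass 1
            }
            vector.put(e.getKey(), e.getValue() * Math.log((double) docCount / df));
        }
        return vector;
    }

    /** Lowercases the text and splits it on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] docs = {"the cat sat", "the dog barked"};
        TwoPassTfIdfSketch sketch = new TwoPassTfIdfSketch();
        for (String d : docs) {
            sketch.count(d);      // first pass over every document
        }
        for (String d : docs) {
            System.out.println(sketch.vectorize(d)); // second pass
        }
    }
}
```

In this sketch a term that occurs in every document, such as "the" above, receives weight zero, while a term confined to one document keeps a positive weight.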


printResults

private void printResults(TermVector query,
                          ValueSortedMap results)
Prints out the list of files that match the query.


main

public static void main(java.lang.String[] args)
Runs the search engine.