iglu.ir
Class FileSearchEngine

java.lang.Object
  |
  +--iglu.ir.FileSearchEngine
All Implemented Interfaces:
InformationSource, SearchEngine

public class FileSearchEngine
extends java.lang.Object
implements SearchEngine

This is an implementation of SearchEngine that uses the FileBTree. It does not implement the entire API. Right now it is mostly useful for storing document vectors and IDs, and retrieving them using searches. Although the complete API is not implemented yet, what is implemented is robust; I have used it extensively.

Version:
1.0
Author:
Travis Bauer

Field Summary
 FileBTree documentFile
           
(package private)  java.util.HashMap invEntries
           
 FileBTree invIndex
           
(package private)  boolean keepDocs
           
static int keyLength
           
(package private)  int numDocs
           
static int termLength
           
 FileBTree vectorFile
           
 
Constructor Summary
FileSearchEngine(java.lang.String fname)
           
FileSearchEngine(java.lang.String fname, boolean keepDocs)
           
 
Method Summary
 void addDocument(java.io.Serializable docId, java.io.Serializable docData, TermVector docVector)
          Add a vector to the collection.
 void close()
           
 boolean delete(java.io.Serializable docId)
           
 boolean docExists(java.io.Serializable docId)
          Returns true if a document with that ID is already in the database.
 java.util.Iterator docIterator()
           
 boolean equals(java.lang.Object o)
          Indicates whether an object is equal to this SearchEngine.
protected  void flushAll()
           
private  void flushInvEntries()
           
 java.lang.String getDescription()
          Returns a textual description of this information source.
 java.io.Serializable getDocData(java.io.Serializable docId)
          Returns the document data associated with docId.
 java.lang.String getMetricName()
          Returns the name of the similarity metric used by this class.
 java.lang.String getName()
          Returns the name of this particular source.
 long getNumDocuments()
           
 double getSimilarityScore(TermVector vector1, TermVector vector2)
          Returns the similarity of the two vectors based on the metric indicated by getMetricName().
 TermVector getVector(java.io.Serializable docId)
          Get the vector for the given document.
 java.util.Iterator iterator()
          Returns an iterator over the document identifiers.
static void main(java.lang.String[] argv)
           
 ValueSortedMap retrieveDocuments(TermVector vector, int numSimilar)
          Return a list of document identifiers with documents similar to the given vector, sorted by similarity.
 void setDescription(java.lang.String description)
          Sets the description of this particular search engine.
 void setDocData(java.io.Serializable docId, java.io.Serializable docData)
          Sets the document's data.
 void setName(java.lang.String name)
          Sets the name of this particular source.
 void setVector(java.io.Serializable docId, TermVector docVector)
          Change the vector for docId to the given vector.
static TermVector stringToTV(java.lang.String s)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

invIndex

public FileBTree invIndex

vectorFile

public FileBTree vectorFile

documentFile

public FileBTree documentFile

termLength

public static final int termLength
See Also:
Constant Field Values

keyLength

public static final int keyLength
See Also:
Constant Field Values

invEntries

java.util.HashMap invEntries

numDocs

int numDocs

keepDocs

boolean keepDocs

Constructor Detail

FileSearchEngine

public FileSearchEngine(java.lang.String fname)

FileSearchEngine

public FileSearchEngine(java.lang.String fname,
                        boolean keepDocs)

Method Detail

getNumDocuments

public long getNumDocuments()

addDocument

public void addDocument(java.io.Serializable docId,
                        java.io.Serializable docData,
                        TermVector docVector)
Description copied from interface: SearchEngine
Add a vector to the collection. Throws an exception if an error occurs while adding the document, including when docId is already in the engine.

Specified by:
addDocument in interface SearchEngine
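
The duplicate-ID rule described above can be sketched with a small in-memory stand-in. This is only an illustration under stated assumptions: the class DuplicateCheckSketch is hypothetical, a TermVector is modeled as a Map<String, Double>, and IllegalArgumentException is an assumed exception type, since this page does not say which exception addDocument throws. It is not the FileBTree-backed implementation.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in; NOT the FileBTree-backed FileSearchEngine.
class DuplicateCheckSketch {
    // docId -> term vector (a TermVector is modeled here as term -> weight).
    private final Map<Serializable, Map<String, Double>> vectors = new HashMap<>();
    // docId -> document data.
    private final Map<Serializable, Serializable> data = new HashMap<>();

    public boolean docExists(Serializable docId) {
        return vectors.containsKey(docId);
    }

    public void addDocument(Serializable docId, Serializable docData,
                            Map<String, Double> docVector) {
        // Assumption: the exception type is not given in the docs.
        if (docExists(docId)) {
            throw new IllegalArgumentException("docId already in the engine: " + docId);
        }
        vectors.put(docId, new HashMap<>(docVector)); // defensive copy
        data.put(docId, docData);
    }

    // Mirrors the documented getVector contract: null for an unknown id.
    public Map<String, Double> getVector(Serializable docId) {
        return vectors.get(docId);
    }

    public static void main(String[] args) {
        DuplicateCheckSketch engine = new DuplicateCheckSketch();
        Map<String, Double> vector = new HashMap<>();
        vector.put("btree", 1.0);

        engine.addDocument("doc-1", "some text", vector);
        try {
            engine.addDocument("doc-1", "other text", vector);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected duplicate: " + e.getMessage());
        }
    }
}
```

A caller that wants to avoid the exception can check docExists(docId) first and fall back to setVector or setDocData for an existing document.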

flushInvEntries

private void flushInvEntries()

docExists

public boolean docExists(java.io.Serializable docId)
Description copied from interface: SearchEngine
Returns true if a document with that ID is already in the database.

Specified by:
docExists in interface SearchEngine

equals

public boolean equals(java.lang.Object o)
Description copied from interface: SearchEngine
Indicates whether an object is equal to this SearchEngine.

Specified by:
equals in interface SearchEngine
Overrides:
equals in class java.lang.Object

getDescription

public java.lang.String getDescription()
Description copied from interface: SearchEngine
Returns a textual description of this information source.

Specified by:
getDescription in interface SearchEngine

getDocData

public java.io.Serializable getDocData(java.io.Serializable docId)
Description copied from interface: SearchEngine
Returns the document data associated with docId.

Specified by:
getDocData in interface SearchEngine

getMetricName

public java.lang.String getMetricName()
Description copied from interface: SearchEngine
Returns the name of the similarity metric used by this class.

Specified by:
getMetricName in interface SearchEngine

getName

public java.lang.String getName()
Description copied from interface: SearchEngine
Returns the name of this particular source.

Specified by:
getName in interface SearchEngine

getSimilarityScore

public double getSimilarityScore(TermVector vector1,
                                 TermVector vector2)
Description copied from interface: SearchEngine
Returns the similarity of the two vectors based on the metric indicated by getMetricName().

Specified by:
getSimilarityScore in interface SearchEngine
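
This page does not say which metric FileSearchEngine uses; getMetricName() reports it at run time. Purely as an illustration of what a vector similarity score looks like, the sketch below assumes cosine similarity and models a TermVector as a Map<String, Double>. The class CosineSketch and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the engine's real metric is whatever
// getMetricName() reports, which may not be cosine similarity.
class CosineSketch {
    // Cosine of the angle between two sparse term-weight vectors.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w; // only shared terms contribute
            }
        }
        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0; // convention chosen here for empty vectors
        }
        return dot / (na * nb);
    }

    // Euclidean length of a sparse vector.
    private static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> v1 = new HashMap<>();
        v1.put("search", 1.0);
        v1.put("index", 1.0);
        Map<String, Double> v2 = new HashMap<>();
        v2.put("search", 1.0);
        System.out.println(cosine(v1, v2));
    }
}
```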

getVector

public TermVector getVector(java.io.Serializable docId)
Description copied from interface: SearchEngine
Get the vector for the given document. If the document id is not in the collection, return null.

Specified by:
getVector in interface SearchEngine

iterator

public java.util.Iterator iterator()
Description copied from interface: SearchEngine
Returns an iterator over the document identifiers. The iteration order is unspecified.

Specified by:
iterator in interface SearchEngine

docIterator

public java.util.Iterator docIterator()

retrieveDocuments

public ValueSortedMap retrieveDocuments(TermVector vector,
                                        int numSimilar)
Description copied from interface: SearchEngine
Return a list of document identifiers with documents similar to the given vector, sorted by similarity.

Specified by:
retrieveDocuments in interface SearchEngine
Parameters:
numSimilar - The maximum number of documents to return. If 0, return all documents.
Returns:
A list of document identifiers, ordered by similarity.
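
The numSimilar contract above (a cap on the result size, with 0 meaning "return everything") can be sketched as follows. Everything here is an assumption for illustration: RetrieveSketch is a hypothetical class, a TermVector is modeled as a Map<String, Double>, a plain dot product stands in for the engine's metric, a LinkedHashMap stands in for ValueSortedMap, and "sorted by similarity" is taken to mean most similar first.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ranking sketch; not the FileBTree-backed implementation.
class RetrieveSketch {
    // Stand-in score: dot product over shared terms.
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    // Returns docId -> score, most similar first (assumed ordering).
    public static LinkedHashMap<String, Double> retrieve(
            Map<String, Map<String, Double>> docs,
            Map<String, Double> query,
            int numSimilar) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> d : docs.entrySet()) {
            scored.add(new AbstractMap.SimpleEntry<String, Double>(
                    d.getKey(), dot(query, d.getValue())));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        // 0 means "all documents"; treating negatives the same way is an
        // assumption, since the docs only define the 0 case.
        int limit = (numSimilar <= 0)
                ? scored.size()
                : Math.min(numSimilar, scored.size());

        LinkedHashMap<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scored.subList(0, limit)) {
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> docs = new HashMap<>();
        Map<String, Double> d1 = new HashMap<>();
        d1.put("java", 1.0);
        Map<String, Double> d2 = new HashMap<>();
        d2.put("java", 3.0);
        docs.put("d1", d1);
        docs.put("d2", d2);

        Map<String, Double> query = new HashMap<>();
        query.put("java", 1.0);
        System.out.println(retrieve(docs, query, 1).keySet());
    }
}
```

This sketch scores every stored document. The invIndex field suggests the real class uses an inverted index to narrow the candidates, but this page does not describe how.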

setDescription

public void setDescription(java.lang.String description)
Description copied from interface: SearchEngine
Sets the description of this particular search engine.

Specified by:
setDescription in interface SearchEngine

setDocData

public void setDocData(java.io.Serializable docId,
                       java.io.Serializable docData)
Description copied from interface: SearchEngine
Sets the document's data.

Specified by:
setDocData in interface SearchEngine

setName

public void setName(java.lang.String name)
Description copied from interface: SearchEngine
Sets the name of this particular source.

Specified by:
setName in interface SearchEngine

setVector

public void setVector(java.io.Serializable docId,
                      TermVector docVector)
Description copied from interface: SearchEngine
Change the vector for docId to the given vector.

Specified by:
setVector in interface SearchEngine

flushAll

protected void flushAll()

delete

public boolean delete(java.io.Serializable docId)

close

public void close()

stringToTV

public static TermVector stringToTV(java.lang.String s)

main

public static void main(java.lang.String[] argv)