iglu.ir
Class StreamFreqTable

java.lang.Object
  |
  +--iglu.ir.StreamFreqTable
All Implemented Interfaces:
java.io.Serializable

public class StreamFreqTable
extends java.lang.Object
implements java.io.Serializable

This class provides incremental TFIDF information when there is no notion of a "document", for example when the data arrives as a stream with no clear breaks (as in a conversation). All terms are stored and retrieved case-insensitively.

Author:
Ryan Scherle
See Also:
Serialized Form

Field Summary
private static double ITF_SCALE_FACTOR
           
private  long maxWordFreq
           
private  long numTermsSeen
           
private  java.util.Hashtable termFreqHash
           
 
Constructor Summary
StreamFreqTable()
          Creates an empty table, ready for use.
 
Method Summary
 long getNumTermsSeen()
          Returns the number of terms that have been used to generate the frequency information.
 double inverseTermFrequency(java.lang.String word)
          Returns a score in the range (0.0, 1.0] indicating the discrimination power of word.
 void lookAtString(java.lang.String s)
          Adds a new chunk of text to the table.
 void lookAtTerm(java.lang.String word)
          Inspects a term to be counted for frequency information.
static void main(java.lang.String[] args)
          Runs some test examples on the methods of this class.
 long termFrequency(java.lang.String word)
          Returns the number of times a term has been seen.
 java.util.Iterator termIterator()
          Returns an iterator of all terms that have been seen.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ITF_SCALE_FACTOR

private static final double ITF_SCALE_FACTOR
See Also:
Constant Field Values

termFreqHash

private java.util.Hashtable termFreqHash

numTermsSeen

private long numTermsSeen

maxWordFreq

private long maxWordFreq
Constructor Detail

StreamFreqTable

public StreamFreqTable()
Creates an empty table, ready for use.

Method Detail

lookAtTerm

public void lookAtTerm(java.lang.String word)
Inspects a term to be counted for frequency information.


termFrequency

public long termFrequency(java.lang.String word)
Returns the number of times a term has been seen.


inverseTermFrequency

public double inverseTermFrequency(java.lang.String word)
Returns a score in the range (0.0, 1.0] indicating the discrimination power of word. The inverseTermFrequency value of a word is highest when the word has never been seen, and lowest when the word is the most frequently seen word in the table. This function scales these frequencies exponentially to the range (0.0, 1.0].
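
The page does not give the scaling formula or the value of ITF_SCALE_FACTOR, so the following is only a minimal sketch of one exponential scaling that has the documented properties (1.0 for an unseen word, the lowest score for the most frequent word, never 0.0). The class name ItfSketch, the constant SCALE, and the formula exp(-SCALE * freq / maxFreq) are assumptions made for illustration, not the iglu.ir implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for illustration; not the iglu.ir.StreamFreqTable source.
class ItfSketch {
    // Assumed value: the real ITF_SCALE_FACTOR is not documented on this page.
    static final double SCALE = 5.0;

    private final Map<String, Long> counts = new HashMap<>();
    private long maxFreq = 0; // count of the most frequently seen term

    // Counts one occurrence of a term, ignoring case.
    void lookAtTerm(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        long n = counts.merge(key, 1L, Long::sum);
        if (n > maxFreq) {
            maxFreq = n;
        }
    }

    // Returns how often a term has been seen (0 if never).
    long termFrequency(String word) {
        return counts.getOrDefault(word.toLowerCase(Locale.ROOT), 0L);
    }

    // Assumed formula: exp(-SCALE * freq / maxFreq).
    // Unseen words give exp(0) = 1.0; the most frequent word gives exp(-SCALE) > 0.
    double inverseTermFrequency(String word) {
        if (maxFreq == 0) {
            return 1.0; // empty table: every word is unseen
        }
        double relative = (double) termFrequency(word) / maxFreq;
        return Math.exp(-SCALE * relative);
    }
}
```

With this assumed formula, a larger SCALE pushes common words closer to 0.0 while leaving unseen words at exactly 1.0.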


getNumTermsSeen

public long getNumTermsSeen()
Returns the number of terms that have been used to generate the frequency information.


termIterator

public java.util.Iterator termIterator()
Returns an iterator of all terms that have been seen.


lookAtString

public void lookAtString(java.lang.String s)
Adds a new chunk of text to the table. The string is broken into words using a generic set of delimiters, and each word is added through lookAtTerm().
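
The exact delimiter set is not listed on this page. As a rough sketch of the splitting step, the following uses java.util.StringTokenizer with an assumed set of whitespace and common punctuation; the class name TokenizeSketch, the DELIMS constant, and the words helper are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of iglu.ir.
class TokenizeSketch {
    // Assumed delimiters: whitespace plus common punctuation.
    static final String DELIMS = " \t\n\r\f.,;:!?()[]{}\"";

    // Splits a chunk of text into words; empty tokens are never produced.
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, DELIMS);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }
}
```

In a table like the one documented here, each returned word would then be passed to lookAtTerm(), which handles the case-insensitive counting.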


main

public static void main(java.lang.String[] args)
Runs some test examples on the methods of this class.