java.lang.Object | +--iglu.ir.StreamFreqTable
This class provides incremental TF-IDF information when there is no notion of a "document", for example when the data is a stream with no clear breaks (as in a conversation). All terms are stored and retrieved case-independently.
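The incremental, case-independent counting described above can be sketched as follows. This is a minimal illustration, not the source of iglu.ir.StreamFreqTable: the class name StreamCountSketch is hypothetical, a HashMap stands in for the Hashtable field, and it assumes that getNumTermsSeen() counts every occurrence rather than distinct terms.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; the real StreamFreqTable implementation is not shown in this page.
final class StreamCountSketch {
    private final Map<String, Long> counts = new HashMap<>();
    private long numTermsSeen = 0; // assumed: total occurrences, not distinct terms
    private long maxWordFreq = 0;  // count of the most frequent term so far

    /** Counts one occurrence of a term, ignoring case. */
    void lookAtTerm(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        long updated = counts.merge(key, 1L, Long::sum);
        numTermsSeen++;
        maxWordFreq = Math.max(maxWordFreq, updated);
    }

    /** Returns how often a term has been seen; 0 for unseen terms. */
    long termFrequency(String word) {
        return counts.getOrDefault(word.toLowerCase(Locale.ROOT), 0L);
    }

    long getNumTermsSeen() {
        return numTermsSeen;
    }

    long getMaxWordFreq() {
        return maxWordFreq;
    }

    public static void main(String[] args) {
        StreamCountSketch table = new StreamCountSketch();
        for (String w : new String[] {"Hello", "hello", "world"}) {
            table.lookAtTerm(w);
        }
        System.out.println(table.termFrequency("HELLO")); // prints 2
        System.out.println(table.getNumTermsSeen());      // prints 3
    }
}
```

Because every update touches only one map entry and two counters, the table can be fed one term at a time without ever needing a document boundary.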
Field Summary
private static double | ITF_SCALE_FACTOR
private long | maxWordFreq
private long | numTermsSeen
private java.util.Hashtable | termFreqHash
Constructor Summary
StreamFreqTable() | Open a blank table for usage.
Method Summary
long | getNumTermsSeen() | Returns the number of terms that have been used to generate the frequency information.
double | inverseTermFrequency(java.lang.String word) | Returns a score in the range (0.0, 1.0] indicating the discrimination power of word.
void | lookAtString(java.lang.String s) | Add a new chunk of information to the table.
void | lookAtTerm(java.lang.String word) | Inspects a term to be counted for frequency information.
static void | main(java.lang.String[] args) | Runs some test examples on the methods of this class.
long | termFrequency(java.lang.String word) | Returns the number of times a term has been seen.
java.util.Iterator | termIterator() | Returns an iterator of all terms that have been seen.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
private static final double ITF_SCALE_FACTOR
private java.util.Hashtable termFreqHash
private long numTermsSeen
private long maxWordFreq
Constructor Detail
public StreamFreqTable()
Method Detail
public void lookAtTerm(java.lang.String word)
public long termFrequency(java.lang.String word)
public double inverseTermFrequency(java.lang.String word)
Returns a score in the range (0.0, 1.0] indicating the discrimination power of word. The inverseTermFrequency value of a word is highest when the word has never been seen and lowest when the word is the most frequently seen word in the table. This function scales these frequencies exponentially to the range (0.0, 1.0].
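One way to obtain the behaviour described above is an exponential decay over the term's frequency relative to the most frequent term. The sketch below is an assumption, not the documented formula: the class ItfSketch, the constant SCALE (a stand-in for ITF_SCALE_FACTOR, whose value this page does not give), and the exact expression are all hypothetical.

```java
// Hypothetical sketch; the actual formula and the value of ITF_SCALE_FACTOR are not documented here.
final class ItfSketch {
    // Assumed stand-in for ITF_SCALE_FACTOR; larger values penalize frequent terms more.
    static final double SCALE = 5.0;

    /**
     * Maps a term count to a score in (0.0, 1.0].
     * Unseen terms score 1.0; the most frequent term scores exp(-SCALE).
     */
    static double inverseTermFrequency(long termFreq, long maxWordFreq) {
        if (termFreq <= 0 || maxWordFreq <= 0) {
            return 1.0; // never seen, or empty table: maximum discrimination power
        }
        double relative = (double) termFreq / (double) maxWordFreq;
        return Math.exp(-SCALE * relative);
    }

    public static void main(String[] args) {
        System.out.println(inverseTermFrequency(0, 10));  // unseen term, prints 1.0
        System.out.println(inverseTermFrequency(5, 10));  // mid-frequency term
        System.out.println(inverseTermFrequency(10, 10)); // most frequent term, lowest score
    }
}
```

Since the exponential is strictly positive, the score never reaches 0.0, which matches the half-open range (0.0, 1.0] stated in the description.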
public long getNumTermsSeen()
public java.util.Iterator termIterator()
public void lookAtString(java.lang.String s)
Add a new chunk of information to the table. Each term found in s is counted via lookAtTerm().
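How a string is broken into terms is not specified on this page. As a rough sketch, assuming terms are maximal runs of letters and digits and that case is folded before counting, the splitting step might look like this. The class TokenizeSketch and its terms helper are hypothetical names; each returned term would then be handed to lookAtTerm().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the tokenization rule used by lookAtString is an assumption.
final class TokenizeSketch {
    /** Splits a chunk of text into lower-cased terms, dropping punctuation and whitespace. */
    static List<String> terms(String s) {
        List<String> out = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String piece : s.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) { // a leading delimiter yields one empty piece
                out.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("Hello, hello world!")); // prints [hello, hello, world]
    }
}
```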
public static void main(java.lang.String[] args)