java.lang.Object | +--iglu.ir.TermVector
A term vector is a mapping of all words in a language to values.
Each term vector represents some concept by having various values
for the terms. A term vector can represent the content of a document,
a user's preferences, or any other semantic concept. This class is
very similar to ValueSortedMap, but the methods and
error checking are geared more to term vectors than general maps.
The new version of this class uses an ObjectPager
to enable a large number of term vectors to be treated as if they were
in memory at the same time. For most purposes, nothing special needs
to be done. But if you plan on having a large number of TermVectors in
memory at the same time (several thousand large vectors), then you can
call setObjectPager to switch the pager to a FileObjectPager.
After you do this, nothing else changes: just treat the TermVectors normally. The TermVector class will swap vectors out to the ObjectPager when too many are created, and swap them back in when needed.
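The swapping behaviour described above can be pictured with a small sketch. This is not the iglu ObjectPager or FileObjectPager code: the class SpillingStore, its capacity parameter, and its temp-file layout are all hypothetical, chosen only to show how objects beyond a memory limit might be serialized to disk and read back on demand.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of paging; not the iglu ObjectPager or FileObjectPager code.
class SpillingStore {
    private final int capacity;   // assumed limit on objects kept in memory
    private final Path dir;       // swapped-out objects are written here
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<Integer, Serializable> inMemory =
            new LinkedHashMap<>(16, 0.75f, true);
    private int nextId = 0;

    SpillingStore(int capacity) {
        this.capacity = capacity;
        try {
            this.dir = Files.createTempDirectory("pager-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stores a value and returns the id that fetches it later.
    int add(Serializable value) {
        int id = nextId++;
        inMemory.put(id, value);
        evictIfNeeded();
        return id;
    }

    // Returns the value for id, swapping it back in from disk if needed.
    Serializable get(int id) {
        Serializable value = inMemory.get(id);
        if (value != null) {
            return value;
        }
        Path file = dir.resolve(id + ".ser");
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            value = (Serializable) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("cannot swap in object " + id, e);
        }
        inMemory.put(id, value);
        evictIfNeeded();
        return value;
    }

    int residentCount() {
        return inMemory.size();
    }

    // Writes least recently used values to disk until the map fits the capacity.
    private void evictIfNeeded() {
        Iterator<Map.Entry<Integer, Serializable>> it = inMemory.entrySet().iterator();
        while (inMemory.size() > capacity && it.hasNext()) {
            Map.Entry<Integer, Serializable> eldest = it.next();
            Path file = dir.resolve(eldest.getKey() + ".ser");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(eldest.getValue());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            it.remove();
        }
    }

    public static void main(String[] args) {
        SpillingStore store = new SpillingStore(2);
        int a = store.add("alpha");
        store.add("beta");
        store.add("gamma");                          // "alpha" is swapped out here
        System.out.println(store.residentCount());   // prints 2
        System.out.println(store.get(a));            // prints alpha, read back from disk
    }
}
```

The caller only ever holds an id, which is roughly the role the termsId field appears to play in TermVector; whether a value is in memory or on disk is invisible from outside.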
See Also: ValueSortedMap, ObjectPager, FileObjectPager, Serialized Form
Field Summary | |
private static ObjectPager |
cache
A pager for TermVectors. |
static TermVector |
EMPTY
An empty term vector, for use when you don't want to do anything to the vector. |
private java.lang.Object |
termsId
A reference to this vector's ValueSortedMap |
Constructor Summary | |
TermVector()
Constructs a term vector with all terms in the world having value 0. |
|
TermVector(java.lang.String someWords)
Constructs a simple term vector with the string given. |
|
TermVector(java.lang.String someWords,
java.lang.String someDelimeters)
Constructs a simple term vector with the string given. |
Method Summary | |
void |
clear()
Clears all the terms from the vector |
java.lang.Object |
clone()
Creates and returns a copy of this object. |
double |
cosineSim(TermVector tv)
Gives the 2-norm (Euclidean distance) between this vector and the given one. |
boolean |
equals(java.lang.Object o)
Tests for equality. |
protected void |
finalize()
Deletes this vector from the cache. |
double |
get(java.lang.String term)
Returns the value associated with term. |
private ValueSortedMap |
getAllTerms()
Get this TermVector's ValueSortedMap from the pager. |
void |
increment(java.lang.String term)
Adds one to the value of a term. |
void |
linearlyScale()
Linearly scales the vector, to skew the data. |
static void |
main(java.lang.String[] args)
Runs some test cases on this class. |
void |
normalize()
Normalizes the vector. |
void |
put(java.lang.String term,
double value)
Associates value with term in the vector. |
void |
putAll(TermVector additional)
Adds the contents of another term vector to this one. |
private void |
readObject(java.io.ObjectInputStream in)
Read the VSM from the input stream |
void |
removeStopWords(StopList stopList)
Removes from the list all words found in the given stoplist, as well as one-character words and words that are longer than 20 characters. |
void |
scaleBy(double n)
Scales all terms of the vector by the given value. |
private void |
setAllTerms(ValueSortedMap vsm)
Set this item's ValueSortedMap in the pager. |
static void |
setObjectPager(ObjectPager newObjectPager)
Set the pager for the TermVectors. |
int |
size()
Returns the number of terms with non-zero weight in the vector. |
void |
subtract(java.lang.String theTerm)
Removes a single term from the list of terms. |
void |
subtract(TermVector subWords)
Performs a set difference on the list of terms. |
java.util.Iterator |
termIterator()
Returns an iterator for the non-zero terms contained in this vector. |
static void |
test()
Runs some test cases on this class. |
TermVector |
topN(int n)
Returns a new (clone) TermVector containing the top n words in the TermVector, along with their values. |
java.lang.String |
toString()
Returns a string representation of the vector. |
void |
truncateTo(int numTerms)
Truncates this term vector to the given length. |
private void |
writeObject(java.io.ObjectOutputStream out)
Write the VSM to the output stream, not only the id |
Methods inherited from class java.lang.Object |
getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
public static final TermVector EMPTY
private java.lang.Object termsId
private static ObjectPager cache
A pager for TermVectors. The default is a RAMObjectPager, which means that vectors are not swapped out.
Constructor Detail |
public TermVector()
public TermVector(java.lang.String someWords)
public TermVector(java.lang.String someWords, java.lang.String someDelimeters)
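The constructor entries say only that a term vector is built "with the string given". One plausible reading, assumed here and not confirmed by the source, is that the string is split on the delimiter characters and every occurrence of a word adds one to that word's value. The helper fromString below is hypothetical and works on a plain map rather than on TermVector itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical sketch; the real constructor's tokenizing rules are not documented here.
class TermCountSketch {
    // Splits someWords on any character in someDelimiters and counts each token.
    static Map<String, Double> fromString(String someWords, String someDelimiters) {
        Map<String, Double> terms = new HashMap<>();
        StringTokenizer tokens = new StringTokenizer(someWords, someDelimiters);
        while (tokens.hasMoreTokens()) {
            terms.merge(tokens.nextToken(), 1.0, Double::sum);
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Double> terms = fromString("to be, or not to be", " ,");
        System.out.println(terms.get("to"));   // prints 2.0
        System.out.println(terms.size());      // prints 4
    }
}
```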
Method Detail |
public static void setObjectPager(ObjectPager newObjectPager)
private ValueSortedMap getAllTerms()
private void setAllTerms(ValueSortedMap vsm)
public void clear()
public int size()
public void put(java.lang.String term, double value)
Associates value with term in the vector.
public void putAll(TermVector additional)
public void increment(java.lang.String term)
public double get(java.lang.String term)
Returns the value associated with term. Terms that
haven't had explicit values set will return a value of zero.
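The zero default is what lets a vector over "all words in a language" stay small: only non-zero terms need to be stored. A minimal sketch of put, increment, get, and size on that basis follows. SparseVectorSketch is a hypothetical stand-in built on HashMap, not the ValueSortedMap-backed implementation, and dropping a term when its value is set to zero is an assumption made so that size() matches "terms with non-zero weight".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the sparse behaviour described above.
class SparseVectorSketch {
    private final Map<String, Double> terms = new HashMap<>();

    // Unset terms read as zero.
    double get(String term) {
        return terms.getOrDefault(term, 0.0);
    }

    // Assumption: a zero value is stored by removing the term.
    void put(String term, double value) {
        if (value == 0.0) {
            terms.remove(term);
        } else {
            terms.put(term, value);
        }
    }

    // Adds one to the value of a term, starting from zero if it is unset.
    void increment(String term) {
        put(term, get(term) + 1.0);
    }

    // Number of terms with non-zero weight.
    int size() {
        return terms.size();
    }

    public static void main(String[] args) {
        SparseVectorSketch v = new SparseVectorSketch();
        System.out.println(v.get("search"));   // prints 0.0
        v.increment("search");
        v.increment("search");
        System.out.println(v.get("search"));   // prints 2.0
        System.out.println(v.size());          // prints 1
    }
}
```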
public void normalize()
public void scaleBy(double n)
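Neither normalize() nor scaleBy(double n) is described beyond its summary line, so the following is only a sketch of one common interpretation: scaleBy multiplies every value by n, and normalize divides by the Euclidean length so that the vector ends up with length 1. The choice of the Euclidean norm is an assumption, and ScaleSketch is a hypothetical helper operating on a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the norm used by the real normalize() is not documented here.
class ScaleSketch {
    // Multiplies every term value by n.
    static void scaleBy(Map<String, Double> v, double n) {
        v.replaceAll((term, value) -> value * n);
    }

    // Euclidean length: square root of the sum of squared values.
    static double length(Map<String, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    // Assumption: normalizing means scaling to Euclidean length 1.
    static void normalize(Map<String, Double> v) {
        double len = length(v);
        if (len > 0.0) {
            scaleBy(v, 1.0 / len);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> v = new HashMap<>();
        v.put("search", 3.0);
        v.put("engine", 4.0);
        normalize(v);
        System.out.println(v);           // values close to 0.6 and 0.8
        System.out.println(length(v));   // close to 1.0
    }
}
```

An empty or all-zero vector is left unchanged, which avoids a division by zero.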
public void linearlyScale()
public void subtract(TermVector subWords)
public void subtract(java.lang.String theTerm)
public void removeStopWords(StopList stopList)
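The method summary for removeStopWords names three filters: words found in the stoplist, one-character words, and words longer than 20 characters. A sketch of that rule on a plain map follows. The StopList class is not documented on this page, so a Set of strings stands in for it here, and StopWordSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; a Set<String> stands in for the undocumented StopList class.
class StopWordSketch {
    // Drops stop words, one-character words, and words longer than 20 characters.
    static void removeStopWords(Map<String, Double> terms, Set<String> stopList) {
        terms.keySet().removeIf(term ->
                stopList.contains(term) || term.length() == 1 || term.length() > 20);
    }

    public static void main(String[] args) {
        Map<String, Double> terms = new HashMap<>();
        terms.put("the", 3.0);
        terms.put("a", 1.0);
        terms.put("search", 2.0);
        terms.put("abcdefghijklmnopqrstuvwxyz", 1.0);

        Set<String> stopList = new HashSet<>();
        stopList.add("the");

        removeStopWords(terms, stopList);
        System.out.println(terms);   // prints {search=2.0}
    }
}
```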
public java.util.Iterator termIterator()
Returns an iterator for the non-zero terms contained in this vector; see also the get() method.
This iterator does not support the remove operation.
public void truncateTo(int numTerms)
Only numTerms terms, ordered by value, are kept.
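Both truncateTo(int numTerms) and topN(int n) select terms by value. The sketch below shows one way to pick the n highest-valued entries from a plain map, returning a new map and leaving the input untouched, which corresponds to the topN behaviour of returning a copy. Reading "ordered by value" as "highest values first" is an assumption, and TopNSketch is a hypothetical helper.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; assumes the highest-valued terms are the ones kept.
class TopNSketch {
    // Returns a new map holding the n entries with the largest values, largest first.
    static Map<String, Double> topN(Map<String, Double> terms, int n) {
        return terms.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first,
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> terms = new HashMap<>();
        terms.put("search", 5.0);
        terms.put("engine", 3.0);
        terms.put("index", 1.0);
        System.out.println(topN(terms, 2));   // prints {search=5.0, engine=3.0}
        System.out.println(terms.size());     // prints 3, the input is unchanged
    }
}
```

An in-place truncateTo could be built on the same selection by clearing the original map and putting the selected entries back.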
public java.lang.Object clone()
Overrides: clone in class java.lang.Object
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public TermVector topN(int n)
public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object
public double cosineSim(TermVector tv)
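The method name suggests cosine similarity, while the summary line speaks of the 2-norm (Euclidean distance); this page does not say which of the two cosineSim actually returns. The sketch below computes both on plain maps so the difference is visible. SimilaritySketch and its method names are hypothetical and make no claim about the real implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch comparing two measures; not the TermVector implementation.
class SimilaritySketch {
    // Sum of products over the terms of a; terms missing from b count as zero.
    static double dot(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            sum += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        return sum;
    }

    // Cosine of the angle between a and b; defined as 0 if either vector is all zero.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return norms == 0.0 ? 0.0 : dot(a, b) / norms;
    }

    // Euclidean distance: 2-norm of the difference, taken over the union of terms.
    static double euclidean(Map<String, Double> a, Map<String, Double> b) {
        Set<String> allTerms = new HashSet<>(a.keySet());
        allTerms.addAll(b.keySet());
        double sumSq = 0.0;
        for (String term : allTerms) {
            double d = a.getOrDefault(term, 0.0) - b.getOrDefault(term, 0.0);
            sumSq += d * d;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        Map<String, Double> a = new HashMap<>();
        a.put("search", 1.0);
        a.put("engine", 1.0);
        Map<String, Double> b = new HashMap<>();
        b.put("search", 2.0);
        b.put("engine", 2.0);
        System.out.println(cosine(a, b));      // close to 1.0: same direction
        System.out.println(euclidean(a, b));   // about 1.414: different magnitude
    }
}
```

Cosine similarity ignores vector length, so a document and a doubled copy of it score as identical, while Euclidean distance grows with the difference in magnitude.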
public static void test()
protected void finalize() throws java.io.IOException
Overrides: finalize in class java.lang.Object
Throws: java.io.IOException
private void writeObject(java.io.ObjectOutputStream out) throws java.io.IOException
Throws: java.io.IOException
private void readObject(java.io.ObjectInputStream in) throws java.io.IOException, java.lang.ClassNotFoundException
Throws: java.io.IOException, java.lang.ClassNotFoundException
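The summaries for writeObject and readObject say the term data itself is written to the stream, "not only the id". That matters because an id into a pager is meaningless in another process. The sketch below shows the general pattern with Java's custom serialization hooks. PagedVectorSketch, its static STORE map (standing in for the pager), and its id scheme are all hypothetical.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pattern; not the TermVector serialization code.
class PagedVectorSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for the pager: maps an id to that vector's terms.
    private static final Map<Integer, HashMap<String, Double>> STORE = new HashMap<>();
    private static int nextId = 0;

    private transient int id;   // only meaningful inside this JVM, so never written

    PagedVectorSketch() {
        id = nextId++;
        STORE.put(id, new HashMap<>());
    }

    void put(String term, double value) {
        STORE.get(id).put(term, value);
    }

    double get(String term) {
        return STORE.get(id).getOrDefault(term, 0.0);
    }

    // Writes the terms themselves rather than the id.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(STORE.get(id));
    }

    // Reads the terms back and registers them under a fresh id.
    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        id = nextId++;
        STORE.put(id, (HashMap<String, Double>) in.readObject());
    }

    // Serializes v to a byte array and reads a copy back.
    static PagedVectorSketch roundTrip(PagedVectorSketch v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(v);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PagedVectorSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PagedVectorSketch v = new PagedVectorSketch();
        v.put("search", 2.0);
        PagedVectorSketch copy = roundTrip(v);
        System.out.println(copy.get("search"));   // prints 2.0
    }
}
```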
public static void main(java.lang.String[] args)