java.lang.Object
  |
  +--iglu.ir.AbstractVectorCreator
        |
        +--iglu.ir.TrainableVectorCreator
              |
              +--iglu.ir.TFIDFVectorCreator
Generates TF-IDF vectors from a document set. It builds the
document frequency table automatically from documents passed in
through addDoc, or you can initialize the
class with your own document frequency information. Each term is
weighted with the standard formula termFrequency * log_2(N / documentFrequency),
where N is the number of documents in the corpus.
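As a worked illustration of the weighting formula, the following minimal sketch computes termFrequency * log_2(N / documentFrequency) for single terms. It is not taken from iglu.ir: the class name TfIdfWeightSketch, the method names, and the choice to give a weight of 0 to terms with no recorded document frequency are all assumptions made for the example.

```java
final class TfIdfWeightSketch {
    /** Base-2 logarithm; Math.log is the natural logarithm. */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Weight of one term: termFrequency * log_2(numDocs / documentFrequency).
     * Parameter names are illustrative, not taken from iglu.ir.
     */
    static double weight(int termFrequency, int numDocs, int documentFrequency) {
        if (numDocs <= 0 || documentFrequency <= 0) {
            return 0.0; // assumption: terms with no document frequency get weight 0
        }
        return termFrequency * log2((double) numDocs / documentFrequency);
    }

    public static void main(String[] args) {
        // A term occurring 3 times in a document and found in 2 of 8 documents:
        // log_2(8 / 2) = 2, so the weight is 3 * 2 = 6.
        System.out.println(weight(3, 8, 2));
        // A term found in every document carries no weight: log_2(8 / 8) = 0.
        System.out.println(weight(5, 8, 8));
    }
}
```

Terms that are frequent in one document but rare across the corpus receive the largest weights, while terms that appear in every document are weighted to zero.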
See Also: JDBCVectorCreator, Serialized Form
| Field Summary | |
private TermVector |
docOccurs
A term vector indicating the number of documents in which a term appears. |
private int |
highestRank
|
private int |
numDocs
The number of documents in the corpus |
| Fields inherited from class iglu.ir.AbstractVectorCreator |
|
| Constructor Summary | |
TFIDFVectorCreator()
Create a new TFIDFVectorCreator with no data. |
|
TFIDFVectorCreator(TermVector docOccurs,
int numDocs)
Create a new TFIDFVectorCreator using the supplied information. |
|
| Method Summary | |
void |
addDoc(Document d)
Add a document to the corpus. |
void |
addDoc(TermVector freqVec)
Add a document to the corpus, when the term frequencies are known. |
void |
addDocSet(DocumentSet ds)
Add an entire document set. |
TermVector |
getDocOccurs()
Returns a term vector indicating the number of documents in which each term appears. |
int |
getNumDocs()
|
TermVector |
getVector(Document d)
Get a vector for the given document. |
TermVector |
getVector(TermVector freqVec)
Get a vector for the given document when the term frequencies are known. |
static void |
main(java.lang.String[] args)
Runs some tests on this class. |
void |
setLimitTopN(int highestRank)
Limits returned vectors to the top N most frequently occurring terms. |
void |
setNumDocs(int n)
|
static void |
test()
Runs some tests on this class. |
java.lang.String |
toString()
Returns a string representation of this object. |
| Methods inherited from class iglu.ir.AbstractVectorCreator |
cleanUp, setDictionary, setLinearlyScale, setMaxSize, setNormalize |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
private TermVector docOccurs
private int highestRank
private int numDocs
| Constructor Detail |
public TFIDFVectorCreator()
public TFIDFVectorCreator(TermVector docOccurs,
int numDocs)
docOccurs - A TermVector in which the value of each term is the number of documents in which it appears.
numDocs - The number of documents in the corpus.
| Method Detail |
public void setLimitTopN(int highestRank)
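The method summary describes setLimitTopN as restricting vectors to the top N terms. The sketch below shows one way such a limit could work on a plain term-to-value map. It is only an illustration: the class TopNSketch and the method limitTopN are hypothetical, a Map stands in for TermVector, and ranking by the vector's values with ties broken alphabetically is an assumption, since this page does not say how iglu.ir ranks terms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopNSketch {
    /** Keep only the n entries with the largest values, in descending order. */
    static Map<String, Double> limitTopN(Map<String, Double> vector, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(vector.entrySet());
        entries.sort((a, b) -> {
            // Larger values first; ties are broken by term (an assumption).
            int byValue = Double.compare(b.getValue(), a.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        });
        int keep = Math.min(Math.max(n, 0), entries.size());
        Map<String, Double> top = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, keep)) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Double> vector = new LinkedHashMap<>();
        vector.put("snow", 0.5);
        vector.put("ice", 2.0);
        vector.put("igloo", 3.0);
        vector.put("fishing", 1.0);
        System.out.println(limitTopN(vector, 2)); // {igloo=3.0, ice=2.0}
    }
}
```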
public void addDoc(Document d)
addDoc in class TrainableVectorCreator
d - a Document value
public void addDoc(TermVector freqVec)
freqVec - a TermVector that indicates the frequency of each term in this document
public void addDocSet(DocumentSet ds)
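The addDoc and addDocSet methods are what build up the document frequency information reported by getDocOccurs and getNumDocs. The following sketch models that bookkeeping with standard collections. It is an assumption about the general technique, not the iglu.ir implementation: DocFrequencySketch is a hypothetical class, a document is represented as a list of term strings, and a Map stands in for TermVector.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class DocFrequencySketch {
    /** Number of documents in which each term appears (stand-in for docOccurs). */
    private final Map<String, Integer> docOccurs = new TreeMap<>();
    /** Number of documents seen so far (stand-in for numDocs). */
    private int numDocs = 0;

    /** Record one document, given as a list of its terms. */
    void addDoc(List<String> terms) {
        numDocs++;
        // A term counts once per document, however often it repeats.
        Set<String> distinct = new HashSet<>(terms);
        for (String term : distinct) {
            docOccurs.merge(term, 1, Integer::sum);
        }
    }

    /** Record a whole collection of documents. */
    void addDocSet(List<List<String>> docs) {
        for (List<String> doc : docs) {
            addDoc(doc);
        }
    }

    Map<String, Integer> getDocOccurs() { return docOccurs; }

    int getNumDocs() { return numDocs; }

    public static void main(String[] args) {
        DocFrequencySketch corpus = new DocFrequencySketch();
        corpus.addDocSet(List.of(
            List.of("igloo", "snow", "snow"),
            List.of("snow", "ice"),
            List.of("ice", "fishing")));
        System.out.println(corpus.getNumDocs());   // 3
        System.out.println(corpus.getDocOccurs()); // {fishing=1, ice=2, igloo=1, snow=2}
    }
}
```

Counting each term at most once per document is what separates document frequency from raw term frequency: "snow" appears three times in the sample corpus but in only two documents.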
public java.lang.String toString()
toString in class java.lang.Object
public TermVector getDocOccurs()
public void setNumDocs(int n)
public int getNumDocs()
public TermVector getVector(Document d)
getVector in interface VectorCreator
d - a Document value
Returns: a TermVector value
public TermVector getVector(TermVector freqVec)
freqVec - a TermVector that indicates the frequency of each term in this document
public static void test()
public static void main(java.lang.String[] args)