java.lang.Object
  |
  +--iglu.ir.AbstractVectorCreator
        |
        +--iglu.ir.TrainableVectorCreator
              |
              +--iglu.ir.TFIDFVectorCreator
Generates TF-IDF vectors from a document set. The class builds document-frequency statistics automatically from the documents passed in through addDoc, or you can initialize it with your own document-frequency information. Each vector weights a term by the standard termFrequency * log_2(N / documentFrequency), where N is the number of documents in the corpus.
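As a worked illustration of the weighting formula, the sketch below computes termFrequency * log_2(N / documentFrequency) for a single term and for a whole frequency map. The class and method names (TfidfWeight, weight, weigh) are hypothetical, plain Map objects stand in for TermVector, and skipping terms with no recorded document frequency is an assumption rather than documented behavior of TFIDFVectorCreator.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper showing termFrequency * log_2(N / documentFrequency). */
final class TfidfWeight {

    /** One term's weight; assumes 1 <= df <= n. */
    static double weight(int tf, int n, int df) {
        // java.lang.Math has no log2, so use log2(x) = ln(x) / ln(2).
        return tf * (Math.log((double) n / df) / Math.log(2));
    }

    /** Weights every term of a frequency map; terms missing from df are skipped (an assumption). */
    static Map<String, Double> weigh(Map<String, Integer> tf, Map<String, Integer> df, int n) {
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer d = df.get(e.getKey());
            if (d != null && d > 0) {
                out.put(e.getKey(), weight(e.getValue(), n, d));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "cat" occurs 3 times here and appears in 2 of 8 corpus documents: 3 * log2(4) = 6.
        // "the" appears in all 8 documents, so its weight is 0 however often it occurs.
        Map<String, Integer> tf = Map.of("cat", 3, "the", 5);
        Map<String, Integer> df = Map.of("cat", 2, "the", 8);
        System.out.println(weigh(tf, df, 8));
    }
}
```

A term present in every document gets weight log_2(1) = 0, which is why very common words contribute nothing to the vector.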
See Also: JDBCVectorCreator, Serialized Form
Field Summary |
private TermVector | docOccurs | A term vector indicating the number of documents in which a term appears. |
private int | highestRank | |
private int | numDocs | The number of documents in the corpus. |
Fields inherited from class iglu.ir.AbstractVectorCreator |
Constructor Summary |
TFIDFVectorCreator() | Create a new TFIDFVectorCreator with no data. |
TFIDFVectorCreator(TermVector docOccurs, int numDocs) | Create a new TFIDFVectorCreator using the supplied document-occurrence vector and corpus size. |
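The two constructors correspond to two ways of obtaining the document-frequency statistics: learning them from documents via addDoc, or supplying a precomputed docOccurs vector and numDocs count. The hypothetical SimpleDocStats class below is a minimal sketch of that idea, not the iglu implementation. It uses Map<String, Integer> in place of TermVector and assumes a term is counted once per document regardless of how often it occurs there, which is what "the number of documents in which a term appears" implies.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for TFIDFVectorCreator's two construction paths.
 * Map<String, Integer> replaces iglu.ir.TermVector, whose API is not shown on this page.
 */
final class SimpleDocStats {
    private final Map<String, Integer> docOccurs; // term -> number of documents containing it
    private int numDocs;                          // documents seen so far

    /** Mirrors TFIDFVectorCreator(): start empty and learn from addDoc. */
    SimpleDocStats() {
        this(new HashMap<>(), 0);
    }

    /** Mirrors TFIDFVectorCreator(docOccurs, numDocs): start from precomputed statistics. */
    SimpleDocStats(Map<String, Integer> docOccurs, int numDocs) {
        this.docOccurs = new HashMap<>(docOccurs); // defensive, mutable copy
        this.numDocs = numDocs;
    }

    /** A term counts once per document, however many times it occurs there. */
    void addDoc(Map<String, Integer> freqVec) {
        for (Map.Entry<String, Integer> e : freqVec.entrySet()) {
            if (e.getValue() > 0) {
                docOccurs.merge(e.getKey(), 1, Integer::sum);
            }
        }
        numDocs++;
    }

    Map<String, Integer> getDocOccurs() { return new HashMap<>(docOccurs); }

    int getNumDocs() { return numDocs; }

    public static void main(String[] args) {
        // Path 1: learn the statistics from two documents.
        SimpleDocStats trained = new SimpleDocStats();
        trained.addDoc(Map.of("cat", 2, "dog", 1));
        trained.addDoc(Map.of("cat", 1));

        // Path 2: supply the same statistics up front.
        SimpleDocStats seeded = new SimpleDocStats(Map.of("cat", 2, "dog", 1), 2);

        System.out.println(trained.getDocOccurs().equals(seeded.getDocOccurs())); // true
        System.out.println(trained.getNumDocs() == seeded.getNumDocs());          // true
    }
}
```

Supplying the statistics up front is useful when they were computed elsewhere, for example over a corpus too large to feed through addDoc in one session.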
Method Summary |
void | addDoc(Document d) | Add a document to the corpus. |
void | addDoc(TermVector freqVec) | Add a document to the corpus when its term frequencies are already known. |
void | addDocSet(DocumentSet ds) | Add an entire document set. |
TermVector | getDocOccurs() | Returns a term vector indicating the number of documents in which each term appears. |
int | getNumDocs() | |
TermVector | getVector(Document d) | Get a vector for the given document. |
TermVector | getVector(TermVector freqVec) | Get a vector for the given document when its term frequencies are already known. |
static void | main(java.lang.String[] args) | Runs some tests on this class. |
void | setLimitTopN(int highestRank) | Limits the vectors returned to the highestRank most frequently occurring terms. |
void | setNumDocs(int n) | |
static void | test() | Runs some tests on this class. |
java.lang.String | toString() | Returns a string representation of this object. |
Methods inherited from class iglu.ir.AbstractVectorCreator |
cleanUp, setDictionary, setLinearlyScale, setMaxSize, setNormalize |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
private TermVector docOccurs
private int highestRank
private int numDocs
Constructor Detail |
public TFIDFVectorCreator()
public TFIDFVectorCreator(TermVector docOccurs, int numDocs)
Parameters:
docOccurs - A TermVector in which the value of each term is the number of documents in which it appears.
numDocs - The number of documents in the corpus.
Method Detail |
public void setLimitTopN(int highestRank)
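The documentation for setLimitTopN does not say how terms are ranked. One plausible reading, sketched below purely as an assumption, is that the finished vector is sorted by weight and only the highestRank largest entries are kept. TopN and limitTopN are hypothetical names, and the tie-breaking rule (alphabetical by term) is an arbitrary choice made here for deterministic output; note that Stream.limit rejects a negative n with IllegalArgumentException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of limiting a term vector to its top-N entries, in the spirit of setLimitTopN. */
final class TopN {

    /** Keeps the n largest values, highest first; equal values are ordered by term. */
    static Map<String, Double> limitTopN(Map<String, Double> vec, int n) {
        return vec.entrySet().stream()
            .sorted((a, b) -> {
                int byWeight = Double.compare(b.getValue(), a.getValue()); // descending weight
                return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
            })
            .limit(n)
            .collect(Collectors.toMap(
                Map.Entry::getKey, Map.Entry::getValue,
                (x, y) -> x,            // keys are unique, so this merge never runs
                LinkedHashMap::new));   // preserve the sorted order
    }

    public static void main(String[] args) {
        Map<String, Double> vec = Map.of("cat", 6.0, "the", 0.0, "sat", 3.0, "mat", 4.5);
        System.out.println(limitTopN(vec, 2)); // {cat=6.0, mat=4.5}
    }
}
```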
public void addDoc(Document d)
addDoc in class TrainableVectorCreator
Parameters:
d - a Document value
public void addDoc(TermVector freqVec)
Parameters:
freqVec - a TermVector that indicates the frequency of each term in this document
public void addDocSet(DocumentSet ds)
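Unlike addDoc(TermVector), addDoc(Document) must first turn a document into term frequencies, and addDocSet presumably repeats that for every document in the set. The sketch below shows only the tokenizing step, under a loudly stated assumption: text is lowercased and split on runs of non-letter characters. The real Document class may tokenize, stem, or filter stop words differently, and TermFrequencies is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration of the Document -> term-frequency step that
 * addDoc(Document) needs before it can count document occurrences.
 * The tokenizer (lowercase, split on non-letters) is an assumption.
 */
final class TermFrequencies {

    static Map<String, Integer> of(String text) {
        Map<String, Integer> freq = new TreeMap<>(); // sorted, so printed output is stable
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {                  // split() can yield a leading ""
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(of("The cat saw the other cat.")); // {cat=2, other=1, saw=1, the=2}
    }
}
```

The resulting map plays the role of the freqVec argument that addDoc(TermVector) and getVector(TermVector) accept.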
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public TermVector getDocOccurs()
public void setNumDocs(int n)
public int getNumDocs()
public TermVector getVector(Document d)
Specified by: getVector in interface VectorCreator
Parameters:
d - a Document value
Returns:
a TermVector value
public TermVector getVector(TermVector freqVec)
Parameters:
freqVec - a TermVector that indicates the frequency of each term in this document
public static void test()
public static void main(java.lang.String[] args)