java.lang.Object
  |
  +--iglu.ir.PorterStemmer
This class implements the Porter stemming algorithm, which applies a set of suffix transformation rules to each term so that similar terms are reduced to a common root. The algorithm is fully described in "An algorithm for suffix stripping", M.F. Porter (1980), Program, Vol. 14, No. 3, pp. 130-137.
This class is based on the implementation provided by Martin Porter at http://www.tartarus.org/~martin/PorterStemmer.
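To make the idea of a suffix transformation rule concrete, the following is a minimal sketch of the first rule group from Porter's paper (step 1a, plural removal): SSES -> SS, IES -> I, SS -> SS, S -> (nothing). The class name Step1aSketch and the method step1a are hypothetical names chosen for this illustration. The sketch works on plain strings and is not the code of iglu.ir.PorterStemmer, whose private step methods operate on an internal character buffer and are not documented here.

```java
/** Illustrative sketch of Porter's step 1a (plural removal); not the iglu.ir code. */
class Step1aSketch {

    /** Applies the first matching rule: SSES -> SS, IES -> I, SS -> SS, S -> "". */
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // caresses -> caress
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ponies -> poni
        }
        if (word.endsWith("ss")) {
            return word;                                  // caress -> caress
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // cats -> cat
        }
        return word;                                      // no rule applies
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

The rules are tried longest suffix first, so "caress" is caught by the SS rule before the plain S rule can strip its final letter. The full algorithm chains several such rule groups, most of them guarded by conditions on the remaining stem.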
Field Summary
private char[] | b
private int | i
private int | i_end
private static int | INC
private int | j
private int | k
Constructor Summary
PorterStemmer()
Method Summary
private void | add(char ch) | Adds a character to the word being stemmed.
private void | add(char[] w, int wLen) | Adds wLen characters to the word being stemmed, taken from a portion of a char[] array.
void | applyFilter(Document d) | Stems all terms in a document's indexable content.
private void | clear()
private boolean | cons(int i)
private boolean | cvc(int i)
private boolean | doublec(int j)
private boolean | ends(java.lang.String s)
private java.lang.String | getResult() | After a word has been stemmed, it can be retrieved by getResult(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.
private char[] | getResultBuffer() | Returns a reference to a character buffer containing the results of the stemming process.
private int | getResultLength() | Returns the length of the word resulting from the stemming process.
private int | m()
static void | main(java.lang.String[] args) | Reads a word from stdin and prints its stem.
java.lang.String | processText(java.lang.String string) | Returns the string with all terms stemmed.
private void | r(java.lang.String s)
private void | setto(java.lang.String s)
java.lang.String | stem(java.lang.String str) | Stems a single term.
private void | stemIt() | Stems the word placed into the stemmer buffer through calls to add().
private void | step1()
private void | step2()
private void | step3()
private void | step4()
private void | step5()
private void | step6()
static void | test() | Runs some tests on this class.
private boolean | vowelinstem()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
private char[] b
private int i
private int i_end
private int j
private int k
private static final int INC
Constructor Detail
public PorterStemmer()
Method Detail
private void clear()
private void add(char ch)
private void add(char[] w, int wLen)
private java.lang.String getResult()
private int getResultLength()
private char[] getResultBuffer()
private final boolean cons(int i)
private final int m()
private final boolean vowelinstem()
private final boolean doublec(int j)
private final boolean cvc(int i)
private final boolean ends(java.lang.String s)
private final void setto(java.lang.String s)
private final void r(java.lang.String s)
private final void step1()
private final void step2()
private final void step3()
private final void step4()
private final void step5()
private final void step6()
private void stemIt()
public java.lang.String stem(java.lang.String str)
public void applyFilter(Document d)
Specified by: applyFilter in interface DocumentFilter
Parameters: d - a Document
public java.lang.String processText(java.lang.String string)
Parameters: string - a String containing words. This class assumes that punctuation has already been removed and that the words are separated by spaces.
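The input contract of processText (space-separated words, punctuation already removed) suggests a simple split, stem, and rejoin loop. The sketch below shows one way such a method could be structured. It is an assumption about the shape of the logic, not the actual iglu.ir.PorterStemmer source. The class ProcessTextSketch, its two-argument processText, and the toy stemmer in main are all hypothetical, and the stemmer is passed in as a UnaryOperator<String> so that the example stays self-contained.

```java
import java.util.function.UnaryOperator;

/** Illustrative sketch of stemming every term in a space-separated string. */
class ProcessTextSketch {

    /** Splits on whitespace, applies the given stemmer to each term, rejoins with single spaces. */
    static String processText(String text, UnaryOperator<String> stemmer) {
        StringBuilder out = new StringBuilder();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank input splits into one empty token; skip it
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(stemmer.apply(term));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a real stemmer: strips one trailing "s".
        UnaryOperator<String> toy = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(processText("cats chase dogs", toy));
    }
}
```

With the real class, the per-term call would presumably be the public stem(String) method listed above. Note that this sketch does nothing about punctuation, so a token such as "dogs," would be handed to the stemmer with its comma attached, which is why the stated precondition matters.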
public static void test()
public static void main(java.lang.String[] args)