iglu.ir
Class PorterStemmer

java.lang.Object
  |
  +--iglu.ir.PorterStemmer
All Implemented Interfaces:
DocumentFilter

public class PorterStemmer
extends java.lang.Object
implements DocumentFilter

This class implements the Porter stemming algorithm. A set of suffix transformation rules is applied to each term, reducing similar terms to a common root. The algorithm is fully described in "An algorithm for suffix stripping", M.F. Porter (1980), Program, Vol. 14, No. 3, pp. 130-137.

This class is based on the implementation provided by Martin Porter at http://www.tartarus.org/~martin/PorterStemmer.
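
To make "suffix transformation rules" concrete, the following is a minimal sketch of step 1a of the algorithm as given in Porter's paper (SSES to SS, IES to I, SS to SS, S removed). The class name Step1aSketch and its method are hypothetical and are not part of iglu.ir; the real class works on an internal char[] buffer rather than on String values.

```java
// Illustrative sketch of step 1a of the Porter algorithm (plural removal).
// Step1aSketch is a hypothetical name; this is not the iglu.ir source.
class Step1aSketch {

    /** Applies the four step 1a rules, trying the longest suffix first. */
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // sses -> ss
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ies -> i
        }
        if (word.endsWith("ss")) {
            return word;                                 // ss -> ss (unchanged)
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // s -> (removed)
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

The later steps follow the same pattern but also test conditions on the remaining stem, which is presumably what the private helpers such as m(), cons() and cvc() are for.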

Author:
Martin Porter, Ryan Scherle

Field Summary
private  char[] b
           
private  int i
           
private  int i_end
           
private static int INC
           
private  int j
           
private  int k
           
 
Constructor Summary
PorterStemmer()
           
 
Method Summary
private  void add(char ch)
          Add a character to the word being stemmed.
private  void add(char[] w, int wLen)
          Adds the first wLen characters of a char[] array to the word being stemmed.
 void applyFilter(Document d)
          Stems all terms in a document's indexable content.
private  void clear()
           
private  boolean cons(int i)
           
private  boolean cvc(int i)
           
private  boolean doublec(int j)
           
private  boolean ends(java.lang.String s)
           
private  java.lang.String getResult()
          After a word has been stemmed, it can be retrieved with getResult(), or a reference to the internal buffer can be obtained with getResultBuffer() and getResultLength(), which is generally more efficient.
private  char[] getResultBuffer()
          Returns a reference to a character buffer containing the results of the stemming process.
private  int getResultLength()
          Returns the length of the word resulting from the stemming process.
private  int m()
           
static void main(java.lang.String[] args)
          Reads a word from stdin and prints its stem.
 java.lang.String processText(java.lang.String string)
          Returns the string with all terms stemmed.
private  void r(java.lang.String s)
           
private  void setto(java.lang.String s)
           
 java.lang.String stem(java.lang.String str)
          Stems a single term.
private  void stemIt()
          Stem the word placed into the Stemmer buffer through calls to add().
private  void step1()
           
private  void step2()
           
private  void step3()
           
private  void step4()
           
private  void step5()
           
private  void step6()
           
static void test()
          Runs some tests on this class.
private  boolean vowelinstem()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

b

private char[] b

i

private int i

i_end

private int i_end

j

private int j

k

private int k

INC

private static final int INC
See Also:
Constant Field Values
Constructor Detail

PorterStemmer

public PorterStemmer()
Method Detail

clear

private void clear()

add

private void add(char ch)
Adds a character to the word being stemmed. When you have finished adding characters, call stemIt() to stem the word.


add

private void add(char[] w,
                 int wLen)
Adds the first wLen characters of a char[] array to the word being stemmed. This is equivalent to repeated calls of add(char ch), but faster.


getResult

private java.lang.String getResult()
After a word has been stemmed, it can be retrieved with getResult(), or a reference to the internal buffer can be obtained with getResultBuffer() and getResultLength(), which is generally more efficient.


getResultLength

private int getResultLength()
Returns the length of the word resulting from the stemming process.


getResultBuffer

private char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
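
The buffer-plus-length contract described above can be sketched as follows. This is an illustration under stated assumptions, not the iglu.ir source: the class name StemBufferSketch is hypothetical, the value INC = 50 is assumed, and the sketch performs no stemming, so its result length is simply the number of characters added.

```java
// Hypothetical sketch of a growable char buffer read back as buffer + length.
// Field names echo the Field Summary, but their exact roles are assumed.
class StemBufferSketch {
    private static final int INC = 50; // assumed growth increment
    private char[] b = new char[INC];  // backing buffer, may be longer than the word
    private int i = 0;                 // number of valid characters in b

    /** Appends one character, growing the buffer by INC when it is full. */
    void add(char ch) {
        if (i == b.length) {
            char[] bigger = new char[i + INC];
            System.arraycopy(b, 0, bigger, 0, i);
            b = bigger;
        }
        b[i++] = ch;
    }

    /** Appends the first wLen characters of w with a single bulk copy. */
    void add(char[] w, int wLen) {
        if (i + wLen > b.length) {
            char[] bigger = new char[i + wLen + INC];
            System.arraycopy(b, 0, bigger, 0, i);
            b = bigger;
        }
        System.arraycopy(w, 0, b, i, wLen);
        i += wLen;
    }

    /** Returns the internal buffer itself; only the first getResultLength() chars are valid. */
    char[] getResultBuffer() { return b; }

    int getResultLength() { return i; }

    /** Copies the valid prefix into a new String. */
    String getResult() { return new String(b, 0, i); }
}
```

A caller that reads the buffer directly must pair it with the length, for example new String(buf, 0, len), because the array is normally longer than the word it holds. Avoiding the String allocation in getResult() is the likely reason the buffer route is described as more efficient.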


cons

private final boolean cons(int i)

m

private final int m()

vowelinstem

private final boolean vowelinstem()

doublec

private final boolean doublec(int j)

cvc

private final boolean cvc(int i)

ends

private final boolean ends(java.lang.String s)

setto

private final void setto(java.lang.String s)

r

private final void r(java.lang.String s)

step1

private final void step1()

step2

private final void step2()

step3

private final void step3()

step4

private final void step4()

step5

private final void step5()

step6

private final void step6()

stemIt

private void stemIt()
Stems the word placed into the stemmer buffer through calls to add(). The method returns no value; retrieve the result with getResultLength()/getResultBuffer() or getResult().


stem

public java.lang.String stem(java.lang.String str)
Stems a single term.


applyFilter

public void applyFilter(Document d)
Stems all terms in a document's indexable content. Implements the DocumentFilter method.

Specified by:
applyFilter in interface DocumentFilter
Parameters:
d - a Document value

processText

public java.lang.String processText(java.lang.String string)
Returns the string with all terms stemmed.

Parameters:
string - a String containing words. This method assumes that punctuation has already been removed and that the words are separated by spaces.
Returns:
the input string with its words in the original order, each replaced by its stem.
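
The contract above (space-separated words in, the same words stemmed and in the same order out) can be sketched as below. ProcessTextSketch and its stemmer parameter are hypothetical; the real method presumably calls stem(String) on each term, and how it treats repeated spaces is not documented, so collapsing them here is an assumption.

```java
// Hypothetical sketch of the processText contract; not the iglu.ir source.
class ProcessTextSketch {

    /**
     * Splits text on single spaces, applies the given stemming function to
     * each term, and rejoins the results with single spaces.
     */
    static String processText(String text,
                              java.util.function.UnaryOperator<String> stemmer) {
        StringBuilder out = new StringBuilder();
        for (String term : text.split(" ")) {
            if (term.isEmpty()) {
                continue; // assumption: runs of spaces are collapsed
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(stemmer.apply(term));
        }
        return out.toString();
    }
}
```

With a real stemmer instance the call would look like ProcessTextSketch.processText(text, stemmer::stem); any function from String to String can stand in for it when experimenting.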

test

public static void test()
Runs some tests on this class.


main

public static void main(java.lang.String[] args)
Reads a word from stdin and prints its stem.