iglu.ir
Class StopList

java.lang.Object
  |
  +--iglu.ir.StopList
All Implemented Interfaces:
DocumentFilter, java.io.Serializable

public class StopList
extends java.lang.Object
implements DocumentFilter, java.io.Serializable

Implementation of a simple stoplist.

Version:
1.0
Author:
Ryan Scherle (rscherle@acm.org), Travis Bauer (trbauer@indiana.edu)
See Also:
Serialized Form

Field Summary
private  java.util.HashSet wordSet
           
 
Constructor Summary
StopList()
          Constructs a stoplist from the file stoplist.txt in the resources directory of IGLU.
StopList(java.util.Collection c)
          Constructs a new stoplist from a Collection of terms.
StopList(java.io.File list)
          Constructs a new stoplist from the contents of a file.
StopList(java.io.InputStream is)
          Constructs a new stoplist from the input stream, which is assumed to have one stopword on each line.
StopList(java.io.Reader r)
          Constructs a new stoplist from the Reader, which is assumed to provide one stopword on each line of input.
 
Method Summary
 void applyFilter(Document d)
          Filters all stopwords out of a document's indexable content.
 boolean contains(java.lang.String word)
          Returns true if the word is in the stoplist.
 java.util.HashSet getList()
          Returns the set of stopwords.
private  void initializeFromReader(java.io.Reader r)
          Adds words to the stoplist from a Reader.
static void main(java.lang.String[] args)
          Tests whether a word is in the stoplist.
 java.lang.String processText(java.lang.String string)
          Returns the string with the stopwords removed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

wordSet

private java.util.HashSet wordSet
Constructor Detail

StopList

public StopList()
Constructs a stoplist from the file stoplist.txt in the resources directory of IGLU.


StopList

public StopList(java.io.InputStream is)
Constructs a new stoplist from the input stream, which is assumed to have one stopword on each line.


StopList

public StopList(java.io.Reader r)
Constructs a new stoplist from the Reader, which is assumed to provide one stopword on each line of input.
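
The one-stopword-per-line input format can be illustrated with a minimal sketch. This is not the IGLU source: the class name StopWordLoaderSketch and the method load are hypothetical, and trimming whitespace and skipping blank lines are assumptions the documentation does not confirm.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; not the IGLU StopList class.
public class StopWordLoaderSketch {

    // Reads one stopword per line from the Reader.
    // Assumption: lines are trimmed and blank lines are ignored.
    public static Set<String> load(Reader r) {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(r)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle I/O errors.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = load(new StringReader("the\nof\n\nand\n"));
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "3"
    }
}
```

Because the words end up in a set, the order of lines in the input does not matter and duplicate lines collapse to a single entry.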


StopList

public StopList(java.io.File list)
Constructs a new stoplist from the contents of a file. The file is assumed to contain one stopword per line, in any order.


StopList

public StopList(java.util.Collection c)
Constructs a new stoplist from a Collection of terms.

Method Detail

initializeFromReader

private void initializeFromReader(java.io.Reader r)
Adds words to the stoplist from a Reader.


contains

public boolean contains(java.lang.String word)
Returns true if the word is in the stoplist.
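
As a rough illustration of how a membership test can sit on top of the HashSet field listed above, here is a minimal sketch. The class CollectionStopListSketch is hypothetical and not the IGLU implementation; exact, case-sensitive matching is an assumption.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

// Hypothetical stand-in for illustration only; not the IGLU StopList class.
public class CollectionStopListSketch {

    // Mirrors the documented private HashSet field.
    private final HashSet<String> wordSet;

    // Copies the terms into a set, so duplicates in the Collection are harmless.
    public CollectionStopListSketch(Collection<String> c) {
        this.wordSet = new HashSet<>(c);
    }

    // Assumption: lookup is an exact, case-sensitive match.
    public boolean contains(String word) {
        return wordSet.contains(word);
    }

    public static void main(String[] args) {
        CollectionStopListSketch stop =
                new CollectionStopListSketch(Arrays.asList("a", "the", "a"));
        System.out.println(stop.contains("the")); // prints "true"
        System.out.println(stop.contains("cat")); // prints "false"
    }
}
```

A hash-based set keeps each lookup at roughly constant time, which matters when every word of every document is checked against the list.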


processText

public java.lang.String processText(java.lang.String string)
Returns the string with the stopwords removed.

Parameters:
string - a String containing words. This method assumes that punctuation has already been removed and that the words are separated by spaces.
Returns:
the input string with the stopwords removed; the remaining words keep their original order.
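
The filtering contract described above can be sketched as follows. This is an illustration under stated assumptions, not the IGLU code: the class ProcessTextSketch and its two-argument processText are hypothetical, matching is assumed to be exact and case-sensitive, and the kept words are assumed to be rejoined with single spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for illustration only; not the IGLU StopList class.
public class ProcessTextSketch {

    // Splits on spaces, drops stopwords, and rejoins the rest in original order.
    // Assumption: the input has no punctuation, as the documentation requires.
    public static String processText(String text, Set<String> stopWords) {
        StringBuilder kept = new StringBuilder();
        for (String word : text.split(" ")) {
            // Skip empty tokens (from repeated spaces) and stopwords.
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;
            }
            if (kept.length() > 0) {
                kept.append(' ');
            }
            kept.append(word);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "on"));
        // prints "cat sat mat"
        System.out.println(processText("the cat sat on the mat", stopWords));
    }
}
```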

getList

public java.util.HashSet getList()
Returns the set of stopwords.


applyFilter

public void applyFilter(Document d)
Filters all stopwords out of a document's indexable content. Implements the DocumentFilter method.

Specified by:
applyFilter in interface DocumentFilter
Parameters:
d - a Document value

main

public static void main(java.lang.String[] args)
Tests whether a word is in the stoplist. If a stoplist file is given on the command line, that list is used; otherwise, the method looks for a file called stoplist.txt in the user's lib directory.