Exception while using Apache Lucene for stop words removal
I am using the following code to remove stop words from input text. I get the following exception when tokenStream.incrementToken() runs.
java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
Code:
    public static String removeStopWords(String textFile) throws Exception {
        CharArraySet stopWords = EnglishAnalyzer.getDefaultStopSet();
        TokenStream tokenStream = new StandardTokenizer();
        tokenStream = new StopFilter(tokenStream, stopWords);
        StringBuilder sb = new StringBuilder();
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            String term = charTermAttribute.toString();
            sb.append(term + " ");
        }
        return sb.toString();
    }
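The StandardTokenizer in your code is never given any input: textFile is not passed to it and no reader is set, so it fails as soon as it tries to read. Instantiate the TokenStream from an analyzer instead, as below:
    TokenStream tokenStream = new StandardAnalyzer().tokenStream("field", new StringReader(textFile));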
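For reference, here is a minimal sketch of the whole method with that fix applied. It assumes Lucene 5 or later on the classpath (lucene-core plus the analyzers/analysis-common module for EnglishAnalyzer). The class name StopWordRemover and the field name "field" are placeholders of mine, not anything Lucene requires. I pass the English stop set to StandardAnalyzer explicitly because, as far as I know, the no-argument constructor in newer releases (8.0 and later) no longer removes stop words. Note that StandardAnalyzer also lowercases the tokens.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical wrapper class; only removeStopWords matters here.
public class StopWordRemover {

    public static String removeStopWords(String text) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Both the analyzer and the token stream are Closeable, so
        // try-with-resources takes care of the close() step of the contract.
        try (Analyzer analyzer = new StandardAnalyzer(EnglishAnalyzer.getDefaultStopSet());
             TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
            CharTermAttribute term = tokenStream.addAttribute(CharTermAttribute.class);
            tokenStream.reset();                    // required before the first incrementToken()
            while (tokenStream.incrementToken()) {
                if (sb.length() > 0) {
                    sb.append(' ');                 // single space between kept tokens
                }
                sb.append(term.toString());
            }
            tokenStream.end();                      // required after the last incrementToken()
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(removeStopWords("This is a test of the stop word filter"));
    }
}
```

The order reset(), incrementToken() loop, end(), close() is the consuming workflow the exception message points to in the TokenStream Javadocs. The analyzer supplies the reader to the tokenizer for you, which is the step the original code skipped.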