stanford nlp - Check for words in input text from a words collection
I have a collection of noun phrases, around 10,000 of them. I want to check every new input text against this NP collection and extract the sentences that contain any of these NPs. I don't want to run a loop for every word because that makes the code dead slow. I am using Java and Stanford CoreNLP.
A quick and easy way to do this is to use RegexNER to identify all examples of anything in your dictionary, and then check for non-"O" NER tags in each sentence.
package edu.stanford.nlp.examples;

import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;
import java.util.*;
import java.util.stream.Collectors;

public class FindSentencesWithPhrase {

  // Returns true if any token in the sentence carries an NER tag other than "O".
  public static boolean checkForNamedEntity(CoreMap sentence) {
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
      if (token.ner() != null && !token.ner().equals("O")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,regexner");
    // The mapping file holds the dictionary of phrases to tag.
    props.setProperty("regexner.mapping", "phrases.rules");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String exampleText = "This sentence contains the phrase \"ice cream\". " +
        "This sentence is not of interest. This sentence contains pizza.";
    Annotation ann = new Annotation(exampleText);
    pipeline.annotate(ann);
    // Print only the sentences in which RegexNER tagged at least one token.
    for (CoreMap sentence : ann.get(CoreAnnotations.SentencesAnnotation.class)) {
      if (checkForNamedEntity(sentence)) {
        System.out.println("---");
        System.out.println(sentence.get(CoreAnnotations.TokensAnnotation.class)
            .stream().map(token -> token.word()).collect(Collectors.joining(" ")));
      }
    }
  }
}
The file "phrases.rules" should look like this (the columns are separated by tabs):
ice cream	PHRASE_OF_INTEREST	MISC	1
pizza	PHRASE_OF_INTEREST	MISC	1
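With around 10,000 phrases you would probably generate the mapping file rather than write it by hand. Here is a minimal sketch of one way to do that, assuming the phrases are already available as a list of strings. The class name WritePhraseRules and the helper toRuleLines are made up for illustration and are not part of CoreNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: writes a RegexNER mapping file.
public class WritePhraseRules {

  // Label assigned to matched tokens; any tag other than "O" would work.
  static final String LABEL = "PHRASE_OF_INTEREST";

  // Builds one tab-separated rule line per non-blank phrase.
  static List<String> toRuleLines(List<String> phrases) {
    List<String> lines = new ArrayList<>();
    for (String phrase : phrases) {
      String trimmed = phrase.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank entries
      }
      // Columns: pattern, label, overwritable NER types, priority.
      lines.add(trimmed + "\t" + LABEL + "\tMISC\t1");
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: in practice this list would be loaded from your own NP collection.
    List<String> phrases = Arrays.asList("ice cream", "pizza");
    Path out = Paths.get("phrases.rules");
    Files.write(out, toRuleLines(phrases), StandardCharsets.UTF_8);
  }
}
```

Note that RegexNER treats each whitespace-separated token of a pattern as a regular expression, so phrases containing characters such as "." or "(" would likely need escaping first. This sketch does not handle that.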