POS tagging in Scala
Published: 2020-12-16 09:01:22 | Category: Security | Source: compiled from the web
I am trying to tag a sentence in Scala using the Stanford parser, as shown below:

    val lp: LexicalizedParser = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
    lp.setOptionFlags("-maxLength", "50", "-retainTmpSubcategories")
    val s = "I love to play"
    val parse: Tree = lp.apply(s)
    val taggedWords = parse.taggedYield()
    println(taggedWords)

I get the following error on the line val parse: Tree = lp.apply(s):

    type mismatch; found: java.lang.String required: java.util.List[_ <: edu.stanford.nlp.ling.HasWord]

I don't know whether this is the right way to do it. Is there any other simple way to tag a sentence in Scala?

Solution
You might want to consider the FACTORIE toolkit (http://github.com/factorie/factorie). It is a general-purpose library for machine learning and graphical models that happens to include an extensive suite of natural language processing components (tokenization, token normalization, morphological analysis, sentence segmentation, part-of-speech tagging, named entity recognition, dependency parsing, mention finding, coreference).

Furthermore, it is written entirely in Scala and is released under the Apache License.

Documentation is currently sparse, but should improve over the coming months.

For example, once the Maven-based installation is finished, you can type at the command line:

    bin/fac nlp --pos1 --parser1 --ner1

to launch a socket-listening, multi-threaded NLP server. Then query it by piping plain text to its socket number:

    echo "Mr. Jones took a job at Google in New York. He and his Australian wife moved from New South Wales on 4/1/12." | nc localhost 3228

The output is then:

    1   1   Mr.         NNP   2   nn     O
    2   2   Jones       NNP   3   nsubj  U-PER
    3   3   took        VBD   0   root   O
    4   4   a           DT    5   det    O
    5   5   job         NN    3   dobj   O
    6   6   at          IN    3   prep   O
    7   7   Google      NNP   6   pobj   U-ORG
    8   8   in          IN    7   prep   O
    9   9   New         NNP   10  nn     B-LOC
    10  10  York        NNP   8   pobj   L-LOC
    11  11  .           .     3   punct  O
    12  1   He          PRP   6   nsubj  O
    13  2   and         CC    1   cc     O
    14  3   his         PRP$  5   poss   O
    15  4   Australian  JJ    5   amod   U-MISC
    16  5   wife        NN    6   nsubj  O
    17  6   moved       VBD   0   root   O
    18  7   from        IN    6   prep   O
    19  8   New         NNP   9   nn     B-LOC
    20  9   South       NNP   10  nn     I-LOC
    21  10  Wales       NNP   7   pobj   L-LOC
    22  11  on          IN    6   prep   O
    23  12  4/1/12      NNP   11  pobj   O
    24  13  .           .     6   punct  O

Of course, there is also a programmatic API to all of this functionality:

    import cc.factorie._
    import cc.factorie.app.nlp._
    val doc = new Document("Education is the most powerful weapon which you can use to change the world.")
    DocumentAnnotatorPipeline(pos.POS1).process(doc)
    for (token <- doc.tokens)
      println("%-10s %-5s".format(token.string, token.posLabel.categoryValue))

which will output:

    Education  NN
    is         VBZ
    the        DT
    most       RBS
    powerful   JJ
    weapon     NN
    which      WDT
    you        PRP
    can        MD
    use        VB
    to         TO
    change     VB
    the        DT
    world      NN
    .          .
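If you query the server rather than call the API, you will probably want to read its output back into Scala. The sample output above appears to have seven whitespace-separated columns per token: a document-wide index, a position within the sentence, the token, its POS tag, the head position, the dependency label, and a named-entity tag. That reading is inferred from the sample, not taken from the FACTORIE documentation. The following is a minimal sketch under that assumption; TaggedToken, ServerOutput and parseLine are hypothetical names and not part of FACTORIE.

```scala
// Hypothetical record for one output line. The field names reflect an assumed
// reading of the seven columns; they are not FACTORIE identifiers.
final case class TaggedToken(
  docIndex: Int,    // running index over the whole input
  sentIndex: Int,   // position inside the current sentence
  form: String,     // the token text
  pos: String,      // part-of-speech tag
  head: Int,        // sentence position of the head, 0 for the root
  depLabel: String, // dependency relation to the head
  ner: String       // named-entity tag
)

object ServerOutput {
  // Parses one line. Returns None unless the line has exactly seven
  // whitespace-separated fields and the three index fields are integers.
  def parseLine(line: String): Option[TaggedToken] =
    line.trim.split("\\s+") match {
      case Array(d, s, form, pos, head, dep, ner) =>
        scala.util.Try(
          TaggedToken(d.toInt, s.toInt, form, pos, head.toInt, dep, ner)
        ).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // Three lines copied from the sample output above.
    val sample = Seq(
      "1 1 Mr. NNP 2 nn O",
      "2 2 Jones NNP 3 nsubj U-PER",
      "3 3 took VBD 0 root O"
    )
    // Print token and POS tag, using the same format string as the answer.
    for (t <- sample.flatMap(parseLine))
      println("%-10s %-5s".format(t.form, t.pos))
  }
}
```

Returning Option lets blank or malformed lines be skipped with flatMap instead of throwing. Note that a token containing whitespace would break the seven-field assumption.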