Extracting person names with named entity recognition (NER) in NLP using Python
Published: 2020-12-20 11:03:07 | Category: Python | Source: compiled from the web
I have a sentence from which I need to identify only the person names.

For example:

    sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"

I used the code below to run NER:

    from nltk import word_tokenize, pos_tag, ne_chunk

    # Tokenize, POS-tag, then chunk the tagged tokens into named entities.
    print(ne_chunk(pos_tag(word_tokenize(sentence))))

The output I received was:

    (S
      (PERSON Larry/NNP)
      (ORGANIZATION Page/NNP)
      is/VBZ
      an/DT
      (GPE American/JJ)
      business/NN
      magnate/NN
      and/CC
      computer/NN
      scientist/NN
      who/WP
      is/VBZ
      the/DT
      co-founder/NN
      of/IN
      (GPE Google/NNP)
      ,/,
      alongside/RB
      (PERSON Sergey/NNP Brin/NNP))

I want to extract all the person names, for example:

    Larry Page
    Sergey Brin

To achieve this, I went through this link and tried the following:

    from nltk.tag.stanford import StanfordNERTagger

    st = StanfordNERTagger(
        '/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
        '/usr/share/stanford-ner/stanford-ner.jar')

However, I keep getting this error:

    LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar

Where can I download this file?

As stated above, the result I expect, as a list or a dictionary, is:

    Larry Page
    Sergey Brin

Solution
In long
Please read this carefully: https://stackoverflow.com/a/49345866/610569. Understand the solution; don't just copy and paste.

TL;DR

In the terminal:

    pip install -U nltk
    wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
    unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31
    java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
        -preload tokenize,ssplit,pos,lemma,parse,depparse \
        -status_port 9000 -port 9000 -timeout 15000

In Python:

    from nltk.tag.stanford import CoreNLPNERTagger

    def get_continuous_chunks(tagged_sent):
        """Group consecutive tokens whose tag is not "O" into chunks."""
        continuous_chunk = []
        current_chunk = []

        for token, tag in tagged_sent:
            if tag != "O":
                current_chunk.append((token, tag))
            else:
                if current_chunk:  # if the current chunk is not empty
                    continuous_chunk.append(current_chunk)
                    current_chunk = []
        # Flush the final current_chunk into the continuous_chunk, if any.
        if current_chunk:
            continuous_chunk.append(current_chunk)
        return continuous_chunk

    # The tagger talks to the CoreNLP server started above (port 9000).
    stner = CoreNLPNERTagger()
    tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

    named_entities = get_continuous_chunks(tagged_sent)
    # Join each chunk's tokens and label it with the tag of its first token.
    named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1])
                              for ne in named_entities]

    print(named_entities_str_tag)

[OUT]:

    [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]

You may also find this helpful: Unpacking a list / tuple of pairs into two lists / tuples

(Editor: 李大同)
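The question asks for the person names only, as a list. The answer's output still mixes PERSON, ORGANIZATION and LOCATION entries, so one more filtering step is needed. Below is a minimal sketch of that step, not part of the original answer. It assumes the tagger returns `(token, tag)` pairs with `"O"` marking non-entities, as in the output shown in the answer. The helper name `extract_entities` and the hand-written `tagged_sent` list are hypothetical stand-ins, so the snippet runs without a CoreNLP server.

```python
from itertools import groupby


def extract_entities(tagged_sent, wanted="PERSON"):
    """Return the entity strings in tagged_sent whose tag equals `wanted`.

    Consecutive tokens that share the same tag are joined into one entity.
    """
    entities = []
    # groupby yields one group per run of identical tags.
    for tag, group in groupby(tagged_sent, key=lambda pair: pair[1]):
        if tag == wanted:
            entities.append(" ".join(token for token, _ in group))
    return entities


# Hand-written (token, tag) pairs standing in for real tagger output
# (an assumption for illustration, not the result of an actual run).
tagged_sent = [
    ("Larry", "PERSON"), ("Page", "PERSON"), ("is", "O"), ("the", "O"),
    ("co-founder", "O"), ("of", "O"), ("Google", "ORGANIZATION"),
    (",", "O"), ("alongside", "O"), ("Sergey", "PERSON"), ("Brin", "PERSON"),
]

person_names = extract_entities(tagged_sent)
print(person_names)  # ['Larry Page', 'Sergey Brin']
```

Grouping by tag rather than by "not O" keeps a PERSON run separate from an ORGANIZATION run that directly follows it. Two different people named back to back with no token between them would still be merged into one string, since the tags alone carry no boundary information.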