
Extracting person names with named entity recognition (NER) in Python NLP

Published: 2020-12-20 11:03:07 · Category: Python · Source: collected from the web
I have a sentence from which I need to extract just the person names:

For example:

sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"

I used the code below to run NER:

from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))

The output I get is:

(S
  (PERSON Larry/NNP)
  (ORGANIZATION Page/NNP)
  is/VBZ
  an/DT
  (GPE American/JJ)
  business/NN
  magnate/NN
  and/CC
  computer/NN
  scientist/NN
  who/WP
  is/VBZ
  the/DT
  co-founder/NN
  of/IN
  (GPE Google/NNP)
  ,/,
  alongside/RB
  (PERSON Sergey/NNP Brin/NNP))

I want to extract all the person names, e.g.:

Larry Page
Sergey Brin
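For reference, PERSON chunks can be pulled straight out of an `ne_chunk` tree; the helper below is a minimal sketch (the name `extract_persons` is mine, not from the question). Note that on the output above it would return only "Larry" and "Sergey Brin", because the default chunker mis-tags "Page" as ORGANIZATION, which is presumably why Stanford NER is tried next.

```python
from nltk.tree import Tree

def extract_persons(chunked_tree):
    # Join the tokens of every subtree labelled PERSON.
    return [" ".join(token for token, pos in subtree.leaves())
            for subtree in chunked_tree
            if isinstance(subtree, Tree) and subtree.label() == "PERSON"]

# Usage with the chunker from above:
# extract_persons(ne_chunk(pos_tag(word_tokenize(sentence))))
```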

To achieve this, I looked at this link and tried the following:

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
                       '/usr/share/stanford-ner/stanford-ner.jar')

But I keep getting this error:

LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar

Where can I download this file?

As mentioned above, the result I expect, as a list or a dictionary, is:

Larry Page
Sergey Brin

Solution

In long:

Please read these carefully:

> https://stackoverflow.com/a/49345866/610569
> Extract list of Persons and Organizations using Stanford NER Tagger in NLTK

and understand the solution; do not just copy and paste.

TL;DR

In the terminal:

pip install -U nltk

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -preload tokenize,ssplit,pos,lemma,ner,parse,depparse \
    -status_port 9000 -port 9000 -timeout 15000

In Python:

from nltk.tag.stanford import CoreNLPNERTagger

def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk


stner = CoreNLPNERTagger()
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

named_entities = get_continuous_chunks(tagged_sent)
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]


print(named_entities_str_tag)

[OUT]:

[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
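To reduce that to just the person names the question asks for, the (name, tag) pairs can be filtered on the PERSON label; this is plain list processing of the output shown above:

```python
# The (entity, tag) pairs produced by the Stanford NER step above.
ner_output = [('Rami Eid', 'PERSON'),
              ('Stony Brook University', 'ORGANIZATION'),
              ('NY', 'LOCATION')]

# Keep only entities tagged PERSON.
persons = [name for name, tag in ner_output if tag == 'PERSON']
print(persons)  # → ['Rami Eid']
```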

You may also find this helpful: Unpacking a list / tuple of pairs into two lists / tuples
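As a quick sketch of the unpacking trick that link describes, `zip(*pairs)` splits a list of (name, tag) pairs into two parallel tuples:

```python
pairs = [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION')]

# zip(*...) transposes the list of pairs into (names, tags).
names, tags = zip(*pairs)
print(names)  # → ('Rami Eid', 'Stony Brook University')
print(tags)   # → ('PERSON', 'ORGANIZATION')
```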


