
python – Doesn't nltk add $NLTK_DATA to its search path?

Published: 2020-12-20 10:33:37 | Category: Python | Source: collected from the web
On Linux, I set the environment variable $NLTK_DATA ('/home/user/data/nltk'), and a quick test worked as expected:

>>> from nltk.corpus import brown
>>> brown.words()
['The','Fulton','County','Grand','Jury','said',...]

But when I run another Python script, I get:

LookupError: 
**********************************************************************
Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:  >>>
nltk.download()
Searched in:
- '/home/user/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''

As the list shows, nltk did not add $NLTK_DATA to its search path. After I append the NLTK_DATA directory by hand:

nltk.data.path.append("/NLTK_DATA_DIR");

the script runs as expected. The question is:

How can I make nltk add $NLTK_DATA to its search path automatically?

Answer

If you don't want to set $NLTK_DATA before running your scripts, you can do this inside the Python script instead:

import nltk
# The search path is the list nltk.data.path (there is no nltk.path attribute).
nltk.data.path.append('/home/alvas/some_path/nltk_data/')
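
Before hard-coding a path, it can help to check whether the process running the script actually sees $NLTK_DATA at all; a variable that was set but not exported in the shell, or a script started from cron or an IDE, will not see it. The following is a minimal sketch, not part of the original answer. The helper names `dirs_from_env` and `extend_search_path` are hypothetical, and the sketch assumes the variable may hold several directories separated by `os.pathsep`.

```python
import os


def dirs_from_env(environ, var="NLTK_DATA"):
    """Return the non-empty directories listed in `var`, split on os.pathsep."""
    return [d for d in environ.get(var, "").split(os.pathsep) if d]


def extend_search_path(search_path, environ=None):
    """Append directories from $NLTK_DATA that are not already in search_path."""
    if environ is None:
        environ = os.environ
    for d in dirs_from_env(environ):
        if d not in search_path:
            search_path.append(d)
    return search_path


# Intended use (assumes nltk is installed):
#   import nltk
#   extend_search_path(nltk.data.path)
```

If `dirs_from_env(os.environ)` returns an empty list inside the failing script, the variable is not reaching that process, and the fix belongs in the shell or launcher configuration rather than in nltk.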

For example, let's move nltk_data to a non-standard path where NLTK cannot find it automatically:

alvas@ubi:~$ ls nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
alvas@ubi:~$ mkdir some_path
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ ls nltk_data/
ls: cannot access nltk_data/: No such file or directory
alvas@ubi:~$ ls some_path/nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers

Now we use the nltk.data.path.append() hack:

alvas@ubi:~$ python
>>> import os
>>> import nltk
>>> nltk.data.path.append('/home/alvas/some_path/nltk_data/')
>>> nltk.pos_tag('this is a foo bar'.split())
[('this','DT'),('is','VBZ'),('a','DT'),('foo','JJ'),('bar','NN')]
>>> nltk.data
<module 'nltk.data' from '/usr/local/lib/python2.7/dist-packages/nltk/data.pyc'>
>>> nltk.data.path
['/home/alvas/some_path/nltk_data/','/home/alvas/nltk_data','/usr/share/nltk_data','/usr/local/share/nltk_data','/usr/lib/nltk_data','/usr/local/lib/nltk_data']
>>> exit()

Let's move it back and check that it still works:

alvas@ubi:~$ ls nltk_data
ls: cannot access nltk_data: No such file or directory
alvas@ubi:~$ mv some_path/nltk_data/ .
alvas@ubi:~$ python
>>> import nltk
>>> nltk.data.path
['/home/alvas/nltk_data','/usr/share/nltk_data','/usr/local/share/nltk_data','/usr/lib/nltk_data','/usr/local/lib/nltk_data']
>>> nltk.pos_tag('this is a foo bar'.split())
[('this','DT'),('is','VBZ'),('a','DT'),('foo','JJ'),('bar','NN')]

If you really want nltk_data to be found automatically, use something like this:

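from __future__ import print_function  # keeps the script valid on Python 2 and 3

import os
import sys
import time

import nltk

try:
    from scandir import walk  # third-party backport, faster on Python 2
except ImportError:
    from os import walk  # os.walk already uses scandir on Python 3.5+


def find(name, path):
    # Return the first directory under `path` whose path ends with `name`.
    for root, dirs, files in walk(path):
        if root.endswith(name):
            return root


def find_nltk_data():
    # Walk the whole filesystem, then cache the result in a text file.
    start = time.time()
    path_to_nltk_data = find('nltk_data', '/')
    if path_to_nltk_data is None:
        raise LookupError('no nltk_data directory found under /')
    print('Finding nltk_data took', time.time() - start, file=sys.stderr)
    print('nltk_data at', path_to_nltk_data, file=sys.stderr)
    with open('where_is_nltk_data.txt', 'w') as fout:
        fout.write(path_to_nltk_data)
    return path_to_nltk_data


def magically_find_nltk_data():
    # Reuse the cached location if it still exists; otherwise search again.
    if os.path.exists('where_is_nltk_data.txt'):
        with open('where_is_nltk_data.txt') as fin:
            path_to_nltk_data = fin.read().strip()
        if os.path.exists(path_to_nltk_data):
            nltk.data.path.append(path_to_nltk_data)
        else:
            nltk.data.path.append(find_nltk_data())
    else:
        path_to_nltk_data = find_nltk_data()
        nltk.data.path.append(path_to_nltk_data)


magically_find_nltk_data()
print(nltk.pos_tag('this is a foo bar'.split()))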

Let's call this Python script test.py:

alvas@ubi:~$ ls nltk_data/
chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
alvas@ubi:~$ python test.py
Finding nltk_data took 4.27330780029
nltk_data at /home/alvas/nltk_data
[('this','DT'),('is','VBZ'),('a','DT'),('foo','JJ'),('bar','NN')]
alvas@ubi:~$ mv nltk_data/ some_path/
alvas@ubi:~$ python test.py
Finding nltk_data took 4.75850391388
nltk_data at /home/alvas/some_path/nltk_data
[('this','DT'),('is','VBZ'),('a','DT'),('foo','JJ'),('bar','NN')]
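
The timings above show that walking the whole filesystem costs several seconds on a cold run. One way to reduce that, sketched here under assumptions that are not in the original answer, is to search only a few likely roots and to match the directory name exactly instead of by suffix. The function name `find_dir` and the choice of roots are hypothetical.

```python
import os


def find_dir(name, roots):
    """Return the first directory called `name` below any of `roots`, else None."""
    for top in roots:
        for root, dirs, _files in os.walk(top):
            # `dirs` holds the immediate subdirectories of `root`.
            if name in dirs:
                return os.path.join(root, name)
    return None


# Intended use (assumes nltk is installed; the roots are only an example):
#   import nltk
#   found = find_dir('nltk_data', [os.path.expanduser('~'), '/usr/share'])
#   if found is not None:
#       nltk.data.path.append(found)
```

Matching on the exact name avoids picking up a directory such as `old_nltk_data`, which a suffix test would accept. The trade-off is that a copy outside the listed roots is never found.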
