python: nltk doesn't add $NLTK_DATA to its search path?
Published: 2020-12-20 10:33:37  Category: Python  Source: compiled from the web
On Linux, I set the env var $NLTK_DATA ('/home/user/data/nltk'), and a quick test worked as expected:

    >>> from nltk.corpus import brown
    >>> brown.words()
    ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

But when running another Python script, I got:

    LookupError:
    **********************************************************************
      Resource u'tokenizers/punkt/english.pickle' not found.  Please
      use the NLTK Downloader to obtain the resource:  >>>
      nltk.download()
      Searched in:
        - '/home/user/nltk_data'
        - '/usr/share/nltk_data'
        - '/usr/local/share/nltk_data'
        - '/usr/lib/nltk_data'
        - '/usr/local/lib/nltk_data'
        - u''

As we can see, nltk does not add $NLTK_DATA to its search path. After appending the NLTK_DATA directory manually:

    nltk.data.path.append("/NLTK_DATA_DIR")

the script runs as expected. The question is: how can I make nltk add $NLTK_DATA to its search path automatically?

Solution
If you don't want to set $NLTK_DATA before running your scripts, you can do it from within the Python script with:

    import nltk
    nltk.data.path.append('/home/alvas/some_path/nltk_data/')

For example, let's move nltk_data to a non-standard path where NLTK cannot find it automatically:

    alvas@ubi:~$ ls nltk_data/
    chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
    alvas@ubi:~$ mkdir some_path
    alvas@ubi:~$ mv nltk_data/ some_path/
    alvas@ubi:~$ ls nltk_data/
    ls: cannot access nltk_data/: No such file or directory
    alvas@ubi:~$ ls some_path/nltk_data/
    chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers

Now we use the nltk.data.path.append() hack:

    alvas@ubi:~$ python
    >>> import os
    >>> import nltk
    >>> nltk.data.path.append('/home/alvas/some_path/nltk_data/')
    >>> nltk.pos_tag('this is a foo bar'.split())
    [('this', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('foo', 'JJ'), ('bar', 'NN')]
    >>> nltk.data
    <module 'nltk.data' from '/usr/local/lib/python2.7/dist-packages/nltk/data.pyc'>
    >>> nltk.data.path
    ['/home/alvas/some_path/nltk_data/', '/home/alvas/nltk_data', '/usr/share/nltk_data', '/usr/local/share/nltk_data', '/usr/lib/nltk_data', '/usr/local/lib/nltk_data']
    >>> exit()

Let's move it back and check that it still works:

    alvas@ubi:~$ ls nltk_data
    ls: cannot access nltk_data: No such file or directory
    alvas@ubi:~$ mv some_path/nltk_data/ .
    alvas@ubi:~$ python
    >>> import nltk
    >>> nltk.data.path
    ['/home/alvas/nltk_data', '/usr/local/lib/nltk_data']
    >>> nltk.pos_tag('this is a foo bar'.split())
    [('this', 'NN')]

If you really want nltk_data to be found automatically, use something like this:

    # Python 2 syntax (print statement), matching the python2.7 session above.
    import scandir  # third-party package that provides a faster walk()
    import os, sys
    import time

    import nltk

    def find(name, path):
        # Return the first directory under `path` whose full path ends with `name`.
        for root, dirs, files in scandir.walk(path):
            if root.endswith(name):
                return root

    def find_nltk_data():
        start = time.time()
        path_to_nltk_data = find('nltk_data', '/')
        print >> sys.stderr, 'Finding nltk_data took', time.time() - start
        print >> sys.stderr, 'nltk_data at', path_to_nltk_data
        # Cache the result so later runs can skip the slow search.
        with open('where_is_nltk_data.txt', 'w') as fout:
            fout.write(path_to_nltk_data)
        return path_to_nltk_data

    def magically_find_nltk_data():
        if os.path.exists('where_is_nltk_data.txt'):
            with open('where_is_nltk_data.txt') as fin:
                path_to_nltk_data = fin.read().strip()
            if os.path.exists(path_to_nltk_data):
                nltk.data.path.append(path_to_nltk_data)
            else:
                # The cached location is stale, so search again.
                nltk.data.path.append(find_nltk_data())
        else:
            path_to_nltk_data = find_nltk_data()
            nltk.data.path.append(path_to_nltk_data)

    magically_find_nltk_data()
    print nltk.pos_tag('this is a foo bar'.split())

Let's call this Python script test.py:

    alvas@ubi:~$ ls nltk_data/
    chunkers  corpora  grammars  help  misc  models  stemmers  taggers  tokenizers
    alvas@ubi:~$ python test.py
    Finding nltk_data took 4.27330780029
    nltk_data at /home/alvas/nltk_data
    [('this', 'NN')]
    alvas@ubi:~$ mv nltk_data/ some_path/
    alvas@ubi:~$ python test.py
    Finding nltk_data took 4.75850391388
    nltk_data at /home/alvas/some_path/nltk_data
    [('this', 'NN')]

(Editor: 李大同)
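
The approaches above either hard-code the directory or search the whole filesystem. As a rough sketch of a third route, the script can read $NLTK_DATA itself and append whatever it finds to the search path. This only helps if the variable is actually exported to the process running the script, which the empty `u''` entry in the question's error output suggests may not have been the case. `extend_search_path` is a hypothetical helper name, not part of NLTK, and it works on any plain list so it can be tried without NLTK installed.

```python
import os


def extend_search_path(search_path, environ=os.environ, var="NLTK_DATA"):
    """Append the directories named in an environment variable to search_path.

    The variable may hold several directories separated by os.pathsep.
    Empty entries and directories already in search_path are skipped.
    Returns the list of directories that were actually added.
    """
    added = []
    for directory in environ.get(var, "").split(os.pathsep):
        if directory and directory not in search_path:
            search_path.append(directory)
            added.append(directory)
    return added
```

In a real script this would presumably be called as `extend_search_path(nltk.data.path)` right after `import nltk`. If it returns an empty list and the data is still not found, the variable is most likely unset in that process (for example, it was set in the shell without `export`).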