How to parallelize a Python program on Linux
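I have a script that takes a list of file names and loops over them, producing one output file per input file, so this is a case that can easily be parallelized.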
I have an 8-core machine. I tried adding a -parallel flag to this command:

    python perfile_code.py list_of_files.txt

but I could not get it to work. The specific question is: how do I use parallel from bash with a python command on Linux, and which arguments fit the case described above? There is a Linux parallel command (sudo apt-get install parallel) that I read somewhere can do this job, but I do not know how to use it. Most internet resources explain how to do it in Python, but can it be done in bash? Please help, thanks.

Based on an answer, here is an example that is still not working; please suggest how to make it work. I have a folder containing 2 files, and in this example I just want to create copies with different names in parallel.

    # filelist is the directory containing two files, a.txt and b.txt.
    # a.txt is the first file, b.txt is the second file
    # I pass a .txt file with both names to the main program
    from concurrent.futures import ProcessPoolExecutor, as_completed
    from pathlib import Path
    import sys

    def translate(filename):
        print(filename)
        f = open(filename, "r")
        g = open(filename + ".x", "w")
        for line in f:
            g.write(line)

    def main(path_to_file_with_list):
        futures = []
        with ProcessPoolExecutor(max_workers=8) as executor:
            for filename in Path(path_to_file_with_list).open():
                executor.submit(translate, "filelist/" + filename)
            for future in as_completed(futures):
                future.result()

    if __name__ == "__main__":
        main(sys.argv[1])

Solution
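You can use plain shell commands and append the background indicator & to the python command inside a for loop:

    for file in $(cat list_of_files.txt); do python perfile_code.py "$file" & done

This assumes, of course, that your Python code generates separate outputs by itself. It is that simple, although note that the loop starts one process per file all at once, with no limit on how many run at the same time.

Assuming your code has a translate function that takes a filename, your Python code could be written as:

    from concurrent.futures import ProcessPoolExecutor, as_completed
    from pathlib import Path
    import sys

    def translate(filename):
        ...

    def main(path_to_file_with_list):
        futures = []
        with ProcessPoolExecutor(max_workers=8) as executor:
            with Path(path_to_file_with_list).open() as listing:
                for filename in listing:
                    # Strip the trailing newline and keep the future so it can be awaited.
                    futures.append(executor.submit(translate, filename.strip()))
            for future in as_completed(futures):
                # result() re-raises any exception that occurred in the worker.
                future.result()

    if __name__ == "__main__":
        main(sys.argv[1])

This does not depend on special shell syntax, and it takes care of corner cases and of handling the number of workers, which can be hard to do properly from bash.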
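For the copy example in the question, a minimal sketch might look like the following. It assumes the list file holds one file name per line and that the files live in a directory named filelist, as described in the question. The helper read_names and the directory and max_workers parameters are illustrative names, not part of the original posts. Two things in the question's version probably keep it from working: each line read from the list file still carries its trailing newline, and the futures returned by submit are never stored, so errors raised in the workers go unnoticed.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys


def translate(filename):
    """Copy `filename` to `filename + ".x"` and return the new path."""
    target = filename + ".x"
    with open(filename, "r") as src, open(target, "w") as dst:
        for line in src:
            dst.write(line)
    return target


def read_names(path_to_file_with_list):
    """Return the non-empty lines of the list file, without newlines."""
    with Path(path_to_file_with_list).open() as listing:
        return [line.strip() for line in listing if line.strip()]


def main(path_to_file_with_list, directory="filelist", max_workers=8):
    """Copy every listed file in parallel and return the created paths."""
    names = read_names(path_to_file_with_list)
    created = []
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(translate, str(Path(directory) / name))
            for name in names
        ]
        for future in as_completed(futures):
            # result() re-raises any exception from the worker process.
            created.append(future.result())
    return sorted(created)


# Run only when called as a script with the list file as its single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(main(sys.argv[1]))
```

Saved as perfile_code.py, this would be run as python perfile_code.py list_of_files.txt from the directory that contains filelist.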