How to parallelize a Python program on Linux

Published: 2020-12-14 00:43:55 Category: Linux Source: compiled from the web

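I have a script that takes a list of file names as input and loops over them, producing one output file per input file, so this is a case that can easily be parallelized.

I have an 8-core machine.

I tried using the -parallel flag on this command:

python perfile_code.py list_of_files.txt

But I could not get it to work. My specific question is: how do I use parallel in bash with a python command on Linux, and which arguments does it need for the particular case above?

There is a Linux parallel command (sudo apt-get install parallel), which I read somewhere can do this job, but I don't know how to use it.

Most resources on the internet explain how to do this inside Python, but can it be done from bash?

Please help, thanks.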

Based on an answer, here is an example that is still not working; please suggest how to make it work.

I have a folder containing 2 files, and in this example I just want to create copies of them under different names, in parallel.

# filelist is the directory containing two files, a.txt and b.txt.
# a.txt is the first file, b.txt is the second file.
# I pass a .txt file listing both names to the main program.

from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    f = open(filename, "r")
    g = open(filename + ".x", "w")
    for line in f:
        g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            executor.submit(translate, "filelist/" + filename)
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])

Solution

You can use a plain shell for loop and append the & background indicator to the python command inside it:

for file in $(cat list_of_files.txt); do
    python perfile_code.py "$file" &   # run each file in the background
done
wait   # block until all background jobs have finished

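Of course, this assumes that your Python code accepts a single file name and produces a separate output for it by itself.

It is that simple.
It is not the usual approach, though: in general, if you can edit the program, people prefer to let Python itself control the parallel execution of the loop. One good way is to use concurrent.futures in Python to create a pool of 8 workers, whereas the shell approach above launches all instances in parallel at once.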
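If the script cannot be edited, the worker limit can still be applied from outside it. The following is a minimal sketch, not part of the original answer: a small driver that starts one python perfile_code.py <file> process per listed file but keeps at most 8 running at a time. The names run_one and run_all are hypothetical, and the sketch assumes perfile_code.py accepts a single file name, as in the shell loop above.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one external command and return its exit code."""
    return subprocess.run(command).returncode


def run_all(commands, max_workers=8):
    """Run the commands with at most max_workers child processes at once."""
    # Threads are enough here: each one only waits for its child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the exit codes in the same order as the commands.
        return list(pool.map(run_one, commands))


# Assumed usage: python run_parallel.py list_of_files.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as file_list:
        names = [line.strip() for line in file_list if line.strip()]
    # perfile_code.py is assumed to take exactly one file name.
    commands = [[sys.executable, "perfile_code.py", name] for name in names]
    exit_codes = run_all(commands)
    sys.exit(1 if any(exit_codes) else 0)
```

Unlike the plain shell loop, this never has more than max_workers copies of the script running, and a non-zero exit code from any of them is passed on to the caller.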

Assuming your code has a translate function that takes a file name, your Python code could be written as:

import sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

def translate(filename):
    ...

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for line in Path(path_to_file_with_list).read_text().splitlines():
            filename = line.strip()  # drop stray whitespace around the name
            if filename:  # skip blank lines
                # keep the future so that errors are re-raised below
                futures.append(executor.submit(translate, filename))
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])

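This does not depend on special shell syntax, and it takes care of corner cases and of managing the number of workers, which can be hard to get right from bash.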
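As one illustration of such a corner case, the sketch below keeps a mapping from each future to its file name, so that a failure on one file is reported without hiding the results of the others. It is only a sketch under stated assumptions: the translate body is a placeholder that copies each file to its name plus ".x", as in the question's example, and the name translate_all and its (done, failed) return value are hypothetical.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed


def translate(filename):
    """Placeholder work: copy filename to filename + '.x'."""
    with open(filename) as src, open(filename + ".x", "w") as dst:
        for line in src:
            dst.write(line)
    return filename + ".x"


def translate_all(filenames, executor):
    """Submit every file and collect successes and failures separately."""
    # Remember which file each future belongs to.
    future_to_name = {executor.submit(translate, name): name for name in filenames}
    done, failed = [], {}
    for future in as_completed(future_to_name):
        name = future_to_name[future]
        try:
            done.append(future.result())
        except OSError as error:  # e.g. a missing or unreadable input file
            failed[name] = error
    return done, failed


# Assumed usage: python translate_files.py list_of_files.txt
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as file_list:
        names = [line.strip() for line in file_list if line.strip()]
    with ProcessPoolExecutor(max_workers=8) as executor:
        done, failed = translate_all(names, executor)
    for name, error in failed.items():
        print(f"failed: {name}: {error}", file=sys.stderr)
    sys.exit(1 if failed else 0)
```

Passing the executor in as an argument keeps the pool size in one place and lets the same function be tried with a ThreadPoolExecutor when the work is mostly I/O.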

(Editor: 李大同)
