How can I mmap in parallel to read a file faster?

Published: 2020-12-16 06:49:58 | Category: Encyclopedia | Source: compiled from the web

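I am working on this code and currently use mmap, but I would like to know whether I can use mmap in parallel and, if so, how to do it. Suppose my data sits on a parallel file system (GPFS, RAID0, whatever) and I want to read it with n processes.

For example, how can I have each processor read one contiguous block holding 1/n of the data into memory? Or, alternatively, have it read every n-th block (1 B, 1 MB, 100 MB, 1 GB, whatever size I pick for optimization) into memory?

I am assuming a POSIX file system here.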
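
To make the two layouts in the question concrete, the offsets involved can be written down as below. This is only a sketch: `chunk_range` and `cyclic_block_start` are hypothetical names, and it assumes every piece must start on a multiple of the page size, because mmap only accepts page-aligned file offsets.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical helper: contiguous layout. Computes the byte range
   [*start, *start + *len) of process iproc when a file of file_size bytes
   is cut into nprocs contiguous pieces made of whole pages. */
static void chunk_range(off_t file_size, long pagesize, int nprocs, int iproc,
                        off_t *start, off_t *len)
{
    assert(pagesize > 0 && nprocs > 0 && iproc >= 0 && iproc < nprocs);
    off_t npages = (file_size + pagesize - 1) / pagesize; /* rounded up */
    off_t base = npages / nprocs;   /* pages every process gets */
    off_t extra = npages % nprocs;  /* the first `extra` processes get one more */
    off_t first_page = iproc * base + (iproc < extra ? iproc : extra);
    off_t my_pages = base + (iproc < extra ? 1 : 0);
    off_t begin = first_page * pagesize;
    off_t end = (first_page + my_pages) * pagesize;
    if (end > file_size) end = file_size;     /* the last page may be partial */
    if (begin > file_size) begin = file_size; /* more processes than pages */
    *start = begin;
    *len = end - begin;                       /* 0 means: nothing to map */
}

/* Hypothetical helper: round-robin layout. Returns the file offset of the
   k-th block owned by process iproc. block_size is assumed to be a multiple
   of the page size so the result can be passed to mmap. */
static off_t cyclic_block_start(off_t block_size, int nprocs, int iproc, off_t k)
{
    return (k * nprocs + iproc) * block_size;
}
```

With a 10000-byte file, 4096-byte pages and two processes, process 0 would get bytes 0 to 8191 and process 1 the remaining 1808 bytes. A length of 0 has to be skipped by the caller, since mmap rejects zero-length mappings.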

Solution

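This is the MPI function I use for parallel reading. It cuts the file into n contiguous pieces based on the page size and has each process read its own piece through mmap. Some extra tricks are needed at the end, because process i will (probably) get the first half of a line as its last line, and process i+1 will get the second half of that same line as its first line.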

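// Needs <fcntl.h>, <sys/mman.h>, <sys/stat.h> and <unistd.h>.
// inpfile, nprocs, iproc, nchars, cs and the integer type ikind come from the surrounding program.
ikind nchars_orig; // how many characters were in the original file
int pagesize = getpagesize();
off_t offset;
struct stat file_stat;
int finp = open(inpfile, O_RDONLY);
int status = fstat(finp, &file_stat); // status is -1 if fstat failed
nchars_orig = file_stat.st_size;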

// find out which pieces of the file each process should read
ikind nchars_per_proc[nprocs];
for(int ii = 0; ii < nprocs; ii++) {
    nchars_per_proc[ii] = 0;
}   
// start at the second-to-last proc, so the last proc receives the first page
// the last proc is trimmed by nchardiff at the end, so this distributes the work more evenly
int jproc = nprocs-2;
ikind nchars_tot = 0;
ikind nchardiff = 0;
for(ikind ic = 0; ic < nchars_orig; ic+= pagesize) {
    jproc += 1;
    nchars_tot += pagesize;
    if(jproc == nprocs) jproc = 0;
    if(nchars_tot > nchars_orig) nchardiff = nchars_tot - nchars_orig;
    nchars_per_proc[jproc] += pagesize;
}   
nchars = nchars_per_proc[iproc];
if( iproc == nprocs-1 ) nchars = nchars - nchardiff;
// the offset of this process is the total size of all lower-ranked pieces
offset = 0;
for(int ii = 0; ii < iproc; ii++) {
    offset += nchars_per_proc[ii];
}
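// mmap returns MAP_FAILED on error, e.g. when nchars is 0 because the file has fewer pages than processes
cs = (char*)mmap(0, nchars, PROT_READ, MAP_PRIVATE, finp, offset);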
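
The line-boundary fix-up mentioned above is not part of the snippet. One common convention, which is not necessarily what the original code does, is that a process owns every line that begins inside its raw byte range: it skips a leading partial line and reads past its end to finish its last line. The sketch below uses hypothetical helper names and assumes each process can address the bytes around its range, for example by mapping the whole file or by extending its mapping by some slack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the first position >= pos at which a line
   begins, or size if no further line starts in buf[0..size). */
static size_t next_line_start(const char *buf, size_t size, size_t pos)
{
    if (pos >= size) return size;
    if (pos == 0 || buf[pos - 1] == '\n') return pos; /* already a line start */
    const char *nl = memchr(buf + pos, '\n', size - pos);
    return nl ? (size_t)(nl - buf) + 1 : size;
}

/* Hypothetical helper: turns the raw byte range [start, end) into the range
   [*line_start, *line_end) of whole lines that begin inside the raw range.
   Adjacent raw ranges yield adjacent line ranges, so no line is lost or
   processed twice. */
static void snap_to_lines(const char *buf, size_t size, size_t start, size_t end,
                          size_t *line_start, size_t *line_end)
{
    assert(start <= end && end <= size);
    *line_start = next_line_start(buf, size, start);
    *line_end = next_line_start(buf, size, end);
}
```

For the 12-byte text "aaa\nbbbb\ncc\n" split at byte 6, the first process ends up with bytes 0 to 8 (two complete lines) and the second with bytes 9 to 11, so the line "bbbb" that straddles the split is handled only once.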

(Editor: 李大同)
