Why is accessing a memory-aligned buffer more expensive on Linux?

Published: 2020-12-14 01:55:03 | Category: Linux | Source: compiled from the web


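In the program below I have two buffers: one is 64-byte aligned, and the other, I assume, is 16-byte aligned. The host is 64-bit Linux running a 2.6.x kernel.

The cache line size is 64 bytes, so the program touches exactly one cache line per access. I expected the posix_memalign'ed buffer to be at least as fast as the non-aligned buffer, if not faster.
Here are some measurements: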

./readMemory 10000000

time taken by posix_memaligned buffer: 293020299 
time taken by standard buffer: 119724294 

./readMemory 100000000

time taken by posix_memaligned buffer: 548849137 
time taken by standard buffer: 211197082

#define _GNU_SOURCE   /* exposes posix_memalign and CLOCK_MONOTONIC_RAW */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

void now(struct timespec *t);

int main(int argc, char **argv)
{
    char *buf;
    struct timespec st_time, end_time;
    int runs;

    if (argc != 2)
    {
        printf("Usage: ./readMemory <number of runs>\n");
        exit(1);
    }
    errno = 0;
    runs = strtol(argv[1], NULL, 10);
    if (errno != 0)
    {
        printf("Invalid number of runs: %s \n", argv[1]);
        exit(1);
    }

    int returnVal = -1;

    /* 1024-byte heap buffer aligned to the 64-byte cache line size */
    returnVal = posix_memalign((void **)&buf, 64, 1024);
    if (returnVal != 0)
    {
        printf("error in posix_memalign\n");
        exit(1);
    }

    char tempBuf[64];
    char *temp = buf;

    size_t cpyBytes = 64;

    now(&st_time);
    for (int x = 0; x < runs; x++)
    {
        temp = buf;
        /* As posted: i advances by 64, so this body runs once per outer iteration */
        for (int i = 0; i < ((1024/64) - 1); i += 64)
        {
            memcpy(tempBuf, temp, cpyBytes);
            temp += 64;
        }
    }
    now(&end_time);

    /* As posted: only the tv_nsec fields are subtracted; tv_sec is ignored */
    printf("time taken by posix_memaligned buffer: %ld \n", (end_time.tv_nsec - st_time.tv_nsec));

    char buf1[1024];   /* stack buffer with whatever alignment the compiler picks */
    temp = buf1;
    now(&st_time);
    for (int x = 0; x < runs; x++)
    {
        temp = buf1;
        for (int i = 0; i < ((1024/64) - 1); i += 64)
        {
            memcpy(tempBuf, temp, cpyBytes);
            temp += 64;
        }
    }
    now(&end_time);
    printf("time taken by standard buffer: %ld \n", (end_time.tv_nsec - st_time.tv_nsec));

    free(buf);
    return 0;
}

/* Read the raw monotonic clock; exit on failure */
void now(struct timespec *tnow)
{
    if (clock_gettime(CLOCK_MONOTONIC_RAW, tnow) < 0)
    {
        printf("error getting time\n");
        exit(1);
    }
}
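One thing to keep in mind when reading the figures above: the program prints only the difference between the two tv_nsec fields, so a measurement that crosses a second boundary is not a full elapsed time. A minimal sketch of a helper that folds in tv_sec as well is shown here; the name elapsed_ns is hypothetical and not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not in the original program): full elapsed time
   in nanoseconds between two timespec samples, including whole seconds. */
static int64_t elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(end->tv_nsec - start->tv_nsec);
}
```

With this, each printf in main could print elapsed_ns(&st_time, &end_time) using the format "%lld" and a cast to long long.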

The disassembly of the first loop is

movq    -40(%rbp),%rdx
movq    -48(%rbp),%rcx
leaq    -176(%rbp),%rax
movq    %rcx,%rsi
movq    %rax,%rdi
call    memcpy
addq    $64,-48(%rbp)
addl    $64,-20(%rbp)

The disassembly of the second loop is

movq    -40(%rbp),%rdx
movq    -48(%rbp),%rcx
leaq    -176(%rbp),%rax
movq    %rcx,%rsi
movq    %rax,%rdi
call    memcpy
addq    $64,-48(%rbp)
addl    $64,-4(%rbp)

Solution

The cause may be the relative alignment of the buffers.

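memcpy works fastest when it copies word-aligned data (32/64 bits).
If both buffers are well aligned, everything is fine.
If both buffers are misaligned in the same way, memcpy handles this by copying a small prefix byte by byte and then running word by word over the remainder.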

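But if one buffer is word-aligned and the other is not, there is no way to make both the reads and the writes word-aligned. memcpy still works word by word, but half of the memory accesses are then badly aligned.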
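The strategy described above can be sketched as follows. This is an illustrative copy routine written for this explanation, not glibc's actual memcpy (which is hand-tuned per CPU); the names sketch_copy and same_relative_alignment are hypothetical, and the word size is assumed to be 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only, NOT glibc's memcpy: byte-copy a prefix until the
   destination is word aligned, then copy word by word, then the tail. */
static void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t W = sizeof(uint64_t);   /* assumed word size: 8 bytes */

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Prefix: advance byte by byte until d sits on a word boundary. */
    while (n > 0 && (uintptr_t)d % W != 0) {
        *d++ = *s++;
        n--;
    }
    /* Word loop: stores are aligned; loads are aligned only when src and
       dst started with the same offset inside a word. */
    while (n >= W) {
        uint64_t w;
        memcpy(&w, s, W);   /* load one word, possibly from an unaligned address */
        memcpy(d, &w, W);   /* store one word to an aligned address */
        d += W;
        s += W;
        n -= W;
    }
    /* Tail: remaining bytes that do not fill a whole word. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}

/* Returns 1 when both pointers have the same offset inside a word, i.e.
   when one byte prefix can bring reads and writes into alignment together. */
static int same_relative_alignment(const void *a, const void *b)
{
    return (uintptr_t)a % sizeof(uint64_t) == (uintptr_t)b % sizeof(uint64_t);
}
```

When same_relative_alignment returns 0 for the source and destination, the word loop in this sketch has to perform every load at an unaligned address, which is the situation the paragraph above describes.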

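If both stack buffers are misaligned in the same way (for example, both addresses have the form 8 * x + 2) while the buffer from posix_memalign is aligned, that would explain what you are seeing.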
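One way to test this hypothesis is to print how far each buffer sits from the relevant boundaries. The helpers below are a sketch under that assumption; misalignment and report_alignment are hypothetical names, not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: offset of p past the previous multiple of
   boundary; 0 means p is aligned to that boundary. */
static size_t misalignment(const void *p, size_t boundary)
{
    assert(boundary != 0);
    return (size_t)((uintptr_t)p % boundary);
}

/* Print a buffer's address and its offset from 8-, 16- and 64-byte boundaries. */
static void report_alignment(const char *name, const void *p)
{
    printf("%-8s %p  mod 8 = %zu, mod 16 = %zu, mod 64 = %zu\n",
           name, (void *)p,
           misalignment(p, 8), misalignment(p, 16), misalignment(p, 64));
}
```

Calling report_alignment("buf", buf), report_alignment("buf1", buf1) and report_alignment("tempBuf", tempBuf) in main before the timing loops would show whether the two stack buffers share an offset that the posix_memalign buffer does not.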

(Editor: 李大同)
