c – Pthread程序在线程增加时运行较慢
发布时间:2020-12-16 07:08:55 所属栏目:百科 来源:网络整理
导读:我是并行编程的初学者,我尝试用pthread库编写并行程序.我在8处理器计算机上运行程序.问题在于,当我增加NumProcs时,每个线程都会减慢,尽管它们的任务总是相同的.有人可以帮我弄清楚发生了什么吗? ` #define MAX_NUMP 16using namespace std;int NumProcs;pth
我是并行编程的初学者,我尝试用pthread库编写并行程序.我在8处理器计算机上运行程序.问题在于,当我增加NumProcs时,每个线程都会减慢,尽管它们的任务总是相同的.有人可以帮我弄清楚发生了什么吗?
` #define MAX_NUMP 16 using namespace std; int NumProcs; pthread_mutex_t SyncLock; /* mutex */ pthread_cond_t SyncCV; /* condition variable */ int SyncCount; /* number of processors at the barrier so far */ pthread_mutex_t ThreadLock; /* mutex */ // used only in solaris. use clock_gettime in linux //hrtime_t StartTime; //hrtime_t EndTime; struct timespec StartTime; struct timespec EndTime; void Barrier() { int ret; pthread_mutex_lock(&SyncLock); /* Get the thread lock */ SyncCount++; if(SyncCount == NumProcs) { ret = pthread_cond_broadcast(&SyncCV); assert(ret == 0); } else { ret = pthread_cond_wait(&SyncCV,&SyncLock); assert(ret == 0); } pthread_mutex_unlock(&SyncLock); } /* The function which is called once the thread is allocated */ void* ThreadLoop(void* tmp) { /* each thread has a private version of local variables */ long threadId = (long) tmp; int ret; int startTime,endTime; int count=0; /* ********************** Thread Synchronization*********************** */ Barrier(); /* ********************** Execute Job ********************************* */ startTime = clock(); for(int i=0;i<65536;i++) for(int j=0;j<1024;j++) count++; endTime = clock(); printf("threadid:%ld,time:%dn",threadId,endTime-startTime); } int main(int argc,char** argv) { pthread_t* threads; pthread_attr_t attr; int ret; int dx; if(argc != 2) { fprintf(stderr,"USAGE: %s <numProcesors>n",argv[0]); exit(-1); } assert(argc == 2); NumProcs = atoi(argv[1]); assert(NumProcs > 0 && NumProcs <= MAX_NUMP); /* Initialize array of thread structures */ threads = (pthread_t *) malloc(sizeof(pthread_t) * NumProcs); assert(threads != NULL); /* Initialize thread attribute */ pthread_attr_init(&attr); pthread_attr_setscope(&attr,PTHREAD_SCOPE_SYSTEM); // sys manages contention /* Initialize mutexs */ ret = pthread_mutex_init(&SyncLock,NULL); assert(ret == 0); ret = pthread_mutex_init(&ThreadLock,NULL); assert(ret == 0); /* Init condition variable */ ret = pthread_cond_init(&SyncCV,NULL); assert(ret == 0); SyncCount = 0; Count = 0; /* get high resolution timer,timer is expressed in nanoseconds,relative * to some arbitrary time.. so to get delta time must call gethrtime at * the end of operation and subtract the two times. */ //StartTime = gethrtime(); ret = clock_gettime(CLOCK_MONOTONIC,&StartTime); for(dx=0; dx < NumProcs; dx++) { /* ************************************************************ * pthread_create takes 4 parameters * p1: threads(output) * p2: thread attribute * p3: start routine,where new thread begins * p4: arguments to the thread * ************************************************************ */ ret = pthread_create(&threads[dx],&attr,ThreadLoop,(void*) dx); assert(ret == 0); } /* Wait for each of the threads to terminate */ for(dx=0; dx < NumProcs; dx++) { ret = pthread_join(threads[dx],NULL); assert(ret == 0); } //EndTime = gethrtime(); ret = clock_gettime(CLOCK_MONOTONIC,&EndTime); printf("Time = %ld nanosecondsn",EndTime.tv_nsec - StartTime.tv_nsec); pthread_mutex_destroy(&ThreadLock); pthread_mutex_destroy(&SyncLock); pthread_cond_destroy(&SyncCV); pthread_attr_destroy(&attr); return 0; } 解决方法
你的意见是预期的.
通常影响这种情况的主要因素(工人在本地计算上旋转)是: >比率nb_threads / nb_available_machine_cores 这里的最佳方案是当比率为1时,每个线程与其中一个核心具有唯一的亲和力. 我们的想法是最大化每个核心吞吐量.你可以通过在每个核心上运行一个且只有一个线程来实现.如果增加线程数(比率> 1),多个线程将共享同一个内核,迫使内核(通过任务调度程序)在每个线程的执行之间切换.这就是你所观察到的. 每次内核必须操作这样的开关时,您需要支付上下文切换.它可能会成为明显的开销. 注意: 您可以使用pthread_setaffinity设置线程的亲缘关系. (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |
相关内容