Using << for exponentiation in C or CUDA
Published: 2020-12-16 09:38:18 Category: Encyclopedia Source: compiled from the web
What does the declaration

    // create arrays of 1M elements
    const int num_elements = 1<<20;

mean in the code below? Is it specific to CUDA, or can it also be used in standard C?

When I printed num_elements, I got num_elements == 1048576, which turns out to be 2^20. So is the << operator shorthand for exponentiation in C?

    // This example demonstrates parallel floating point vector
    // addition with a simple __global__ function.

    #include <stdlib.h>
    #include <stdio.h>

    // this kernel computes the vector sum c = a + b
    // each thread performs one pair-wise addition
    __global__ void vector_add(const float *a, const float *b, float *c, const size_t n)
    {
      // compute the global element index this thread should process
      unsigned int i = threadIdx.x + blockDim.x * blockIdx.x;

      // avoid accessing out of bounds elements
      if(i < n)
      {
        // sum elements
        c[i] = a[i] + b[i];
      }
    }

    int main(void)
    {
      // create arrays of 1M elements
      const int num_elements = 1<<20;

      // compute the size of the arrays in bytes
      const int num_bytes = num_elements * sizeof(float);

      // points to host & device arrays
      float *device_array_a = 0;
      float *device_array_b = 0;
      float *device_array_c = 0;
      float *host_array_a = 0;
      float *host_array_b = 0;
      float *host_array_c = 0;

      // malloc the host arrays
      host_array_a = (float*)malloc(num_bytes);
      host_array_b = (float*)malloc(num_bytes);
      host_array_c = (float*)malloc(num_bytes);

      // cudaMalloc the device arrays
      cudaMalloc((void**)&device_array_a, num_bytes);
      cudaMalloc((void**)&device_array_b, num_bytes);
      cudaMalloc((void**)&device_array_c, num_bytes);

      // if any memory allocation failed, report an error message
      if(host_array_a == 0 || host_array_b == 0 || host_array_c == 0 ||
         device_array_a == 0 || device_array_b == 0 || device_array_c == 0)
      {
        printf("couldn't allocate memory\n");
        return 1;
      }

      // initialize host_array_a & host_array_b
      for(int i = 0; i < num_elements; ++i)
      {
        // make array a a linear ramp
        host_array_a[i] = (float)i;

        // make array b random
        host_array_b[i] = (float)rand() / RAND_MAX;
      }

      // copy arrays a & b to the device memory space
      cudaMemcpy(device_array_a, host_array_a, num_bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(device_array_b, host_array_b, num_bytes, cudaMemcpyHostToDevice);

      // compute c = a + b on the device
      const size_t block_size = 256;
      size_t grid_size = num_elements / block_size;

      // deal with a possible partial final block
      if(num_elements % block_size) ++grid_size;

      // launch the kernel
      vector_add<<<grid_size, block_size>>>(device_array_a, device_array_b, device_array_c, num_elements);

      // copy the result back to the host memory space
      cudaMemcpy(host_array_c, device_array_c, num_bytes, cudaMemcpyDeviceToHost);

      // print out the first 10 results
      for(int i = 0; i < 10; ++i)
      {
        printf("result %d: %1.1f + %7.1f = %7.1f\n", i, host_array_a[i], host_array_b[i], host_array_c[i]);
      }

      // deallocate memory
      free(host_array_a);
      free(host_array_b);
      free(host_array_c);

      cudaFree(device_array_a);
      cudaFree(device_array_b);
      cudaFree(device_array_c);
    }

Solution
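No, the << operator is the bit-shift operator. It takes the bits of a number, for example 00101, and shifts them n places to the left, which has the effect of multiplying the number by a power of two. So x << y is x * 2^y (as long as the result still fits in the type). This is a consequence of how numbers are stored internally in the computer, which is in binary.

For example, take the number 1 stored as a 32-bit two's complement integer (which it is):

    00000000000000000000000000000001

When you do

    1 << 20

you take all the 1s in that binary representation and move them over 20 places to the left:

    00000000000100000000000000000000

which is 2^20. This also works for sign-magnitude representation, ones' complement, and so on.

As another example, if you take the representation of 5:

    00000000000000000000000000000101

and do 5 << 1, you get

    00000000000000000000000000001010

which is 10, or 5 * 2^1.

Conversely, >> divides by a power of two by shifting the bits n places to the right (for non-negative values, any remainder is discarded).

(Editor: 李大同)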