
c - CUDA: embedded for-loop kernel

Published: 2020-12-16 06:55:32 | Category: Encyclopedia | Source: compiled from the web
I have some code that I would like to turn into a CUDA kernel. Here it is:

for (r = Y; r < Y + H; r+=2)
{
    ch1RowSum = ch2RowSum = ch3RowSum = 0;
    for (c = X; c < X + W; c+=2)
    {
        chan1Value = //some calc'd value
        chan3Value = //some calc'd value
        chan2Value = //some calc'd value
        ch2RowSum  += chan2Value;
        ch3RowSum  += chan3Value;
        ch1RowSum  += chan1Value;
    }
    ch1Mean += ch1RowSum / W;
    ch2Mean += ch2RowSum / W;
    ch3Mean += ch3RowSum / W;
}

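Should this be split into two kernels, one to compute the RowSums and one to compute the means? And how should I handle the fact that my loop indices do not start at zero and end at N (they start at Y and X)?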

Solution

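Suppose you have a kernel that computes the three values. Each thread in the launch configuration computes the three values for one (r, c) pair.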

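// Launch with one block per row and one thread per column.
// Assumption: chan1..chan3 are float buffers in global memory with
// gridDim.x * blockDim.x elements each.
__global__ void value_kernel(int Y, int H, int X, int W,
                             float *chan1, float *chan2, float *chan3)
{
    // Shift the zero-based CUDA indices to the region starting at (Y, X).
    // The question's loops step by 2; scale the indices by 2 to match.
    int r = blockIdx.x + Y;
    int c = threadIdx.x + X;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    chan1[idx] = ...   // chan1value for (r, c)
    chan2[idx] = ...   // chan2value for (r, c)
    chan3[idx] = ...   // chan3value for (r, c)
}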
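
As a fuller illustration of this index mapping, here is a minimal compilable sketch that also honours the step of 2 in the question's loops. It is not the answerer's code. The names calc_value and launch_value_kernel, the float type, and the row-major output layout are all assumptions made for the example, and calc_value is only a placeholder for the question's "some calc'd value" lines.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder for the question's "some calc'd value" lines.
__host__ __device__ inline float calc_value(int r, int c, int channel)
{
    return 0.5f * r + 0.25f * c + channel;
}

// One block per sampled row, one thread per sampled column.
// blockIdx.x and threadIdx.x start at zero, so the offsets Y and X and the
// step of 2 are applied here to recover the original loop indices.
__global__ void value_kernel(int Y, int X,
                             float *chan1, float *chan2, float *chan3)
{
    int r = Y + 2 * blockIdx.x;   // same r as the outer serial loop
    int c = X + 2 * threadIdx.x;  // same c as the inner serial loop
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // row-major slot

    chan1[idx] = calc_value(r, c, 1);
    chan2[idx] = calc_value(r, c, 2);
    chan3[idx] = calc_value(r, c, 3);
}

// Host helper (hypothetical). Each device buffer must hold
// ((H + 1) / 2) * ((W + 1) / 2) floats. The launch fails if the number of
// sampled columns exceeds the device's threads-per-block limit.
inline cudaError_t launch_value_kernel(int Y, int H, int X, int W,
                                       float *d1, float *d2, float *d3)
{
    int rows = (H + 1) / 2;  // iterations of: for (r = Y; r < Y + H; r += 2)
    int cols = (W + 1) / 2;  // iterations of: for (c = X; c < X + W; c += 2)
    value_kernel<<<rows, cols>>>(Y, X, d1, d2, d3);
    return cudaGetLastError();
}
```

The grid and block sizes encode the loop trip counts, while the kernel arguments carry the starting offsets, so the loops never need to start at zero on the host side.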

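I don't believe you can compute the sums in the kernel above, at least not fully in parallel. You cannot use += as in the serial code, because many threads would update the same accumulator at the same time. You could put everything in one kernel if only one thread in each block (row) computes the sum and the mean, like this: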

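// Launch with one block per row, one thread per column, and
// 3 * blockDim.x * sizeof(float) bytes of dynamic shared memory.
// Assumption: values are float, and ch1Mean..ch3Mean point to global
// accumulators that are zeroed before the launch.
__global__ void both_kernel(int Y, int X,
                            float *ch1Mean, float *ch2Mean, float *ch3Mean)
{
    extern __shared__ float vals[];
    float *v1 = vals;
    float *v2 = vals + blockDim.x;
    float *v3 = vals + 2 * blockDim.x;

    int r = blockIdx.x + Y;
    int c = threadIdx.x + X;

    v1[threadIdx.x] = ...   // chan1value for (r, c)
    v2[threadIdx.x] = ...   // chan2value for (r, c)
    v3[threadIdx.x] = ...   // chan3value for (r, c)
    __syncthreads();        // every value is stored before thread 0 reads

    if (threadIdx.x == 0)
    {
        float ch1RowSum = 0;
        float ch2RowSum = 0;
        float ch3RowSum = 0;

        for (int i = 0; i < blockDim.x; i++)
        {
            ch1RowSum += v1[i];
            ch2RowSum += v2[i];
            ch3RowSum += v3[i];
        }

        // Accumulate across rows, like the += in the serial loop.
        atomicAdd(ch1Mean, ch1RowSum / blockDim.x);
        atomicAdd(ch2Mean, ch2RowSum / blockDim.x);
        atomicAdd(ch3Mean, ch3RowSum / blockDim.x);
    }
}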

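However, it is probably better to use the first value kernel and then a second kernel for the sums and means. The kernel below can be parallelized further, and keeping it separate lets you focus on that when you are ready.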

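// Launch with one single-thread block per row.
// Assumption: the value kernel stored its results in chan1..chan3,
// row-major with W floats per row, and ch1Mean..ch3Mean point to global
// accumulators that are zeroed before the launch. The Y offset is only
// needed where the values are computed, so it is not passed here.
__global__ void sum_kernel(int W,
                           const float *chan1, const float *chan2, const float *chan3,
                           float *ch1Mean, float *ch2Mean, float *ch3Mean)
{
    int row = blockIdx.x;   // zero-based row in the stored buffers

    float ch1RowSum = 0;
    float ch2RowSum = 0;
    float ch3RowSum = 0;

    for (int i = 0; i < W; i++)
    {
        ch1RowSum += chan1[row * W + i];
        ch2RowSum += chan2[row * W + i];
        ch3RowSum += chan3[row * W + i];
    }

    // Accumulate across rows, like the += in the serial loop.
    atomicAdd(ch1Mean, ch1RowSum / W);
    atomicAdd(ch2Mean, ch2RowSum / W);
    atomicAdd(ch3Mean, ch3RowSum / W);
}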
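
One common way to parallelize the row sum further is a tree reduction in shared memory. The sketch below is an illustration under assumptions, not the answerer's code: the names row_mean_kernel and launch_row_mean are hypothetical, the values are assumed to be floats stored row-major with n values per row, and only one channel is handled to keep it short.

```cuda
#include <cuda_runtime.h>

// Each block reduces one row of `vals` (row-major, n values per row) and
// writes that row's mean to rowMean[blockIdx.x].
// Requirements: blockDim.x is a power of two, and the launch supplies
// blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void row_mean_kernel(const float *vals, int n, float *rowMean)
{
    extern __shared__ float partial[];
    const float *row = vals + (size_t)blockIdx.x * n;

    // Step 1: each thread serially sums a strided slice of the row,
    // so rows longer than the block are still fully covered.
    float sum = 0.0f;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        sum += row[i];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // Step 2: tree reduction, halving the number of active threads
    // on every pass until partial[0] holds the row total.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2)
    {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        rowMean[blockIdx.x] = partial[0] / n;
}

// Host helper (hypothetical): one block per row, 128 threads per block.
inline cudaError_t launch_row_mean(const float *dVals, int rows, int n,
                                   float *dRowMean)
{
    const int threads = 128;  // power of two, as the kernel requires
    row_mean_kernel<<<rows, threads, threads * sizeof(float)>>>(dVals, n, dRowMean);
    return cudaGetLastError();
}
```

To reproduce the question's ch1Mean, the per-row results can then be added up on the host or with one more small reduction, once per channel. Note that the question divides each row sum by W while stepping by 2, so the divisor here may need adjusting if that behaviour is intended.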

(Editor: 李大同)
