Can this C loop be optimized further?
Published: 2020-12-16 10:29:58 | Category: Encyclopedia | Source: compiled from the web
I am screaming out loud.
This really makes you wonder. I dread to think what would happen if I chose "favor size over speed".

Settings: Visual Studio 2010

    <Optimization>MaxSpeed</Optimization>
    <IntrinsicFunctions>true</IntrinsicFunctions>
    <FavorSizeOrSpeed>Speed</FavorSizeOrSpeed>
    <EnableEnhancedInstructionSet>StreamingSIMDExtensions2</EnableEnhancedInstructionSet>
    <FloatingPointModel>Precise</FloatingPointModel>

How does this:

    for (i = 0; i < some_num; i++)
    {
        one += buf[i] * buf[i];
        two += buf[i] * buf[off+i];
    }

get translated into this:

    for (i = 0; i < some_num; i++)
    {
        one += buf[i] * buf[i];
    00404B40  movss xmm0,dword ptr [eax-4]
    00404B45  movss xmm7,dword ptr [esp+18h]
    00404B4B  movss xmm2,dword ptr [eax]
    00404B4F  cvtps2pd xmm3,xmm2
    00404B52  movss xmm4,dword ptr [eax+4]
    00404B57  cvtps2pd xmm1,xmm0
    00404B5A  mulsd xmm3,xmm3
    00404B5E  movss xmm6,dword ptr [eax+8]
    00404B63  mulsd xmm1,xmm1
    00404B67  cvtps2pd xmm5,xmm4
    00404B6A  mulsd xmm5,xmm5
    00404B6E  cvtps2pd xmm7,xmm7
    00404B71  addsd xmm1,xmm7
    00404B75  cvtpd2ps xmm1,xmm1
    00404B79  cvtss2sd xmm1,xmm1
    00404B7D  addsd xmm1,xmm3
    00404B81  xorps xmm3,xmm3
    00404B84  cvtpd2ps xmm1,xmm1
    00404B88  cvtss2sd xmm1,xmm1
    00404B8C  addsd xmm1,xmm5
    00404B90  cvtpd2ps xmm1,xmm1
    00404B94  cvtss2sd xmm3,xmm1
        two += buf[i] * buf[off+i];
    00404B98  cvtps2pd xmm0,xmm0
    00404B9B  cvtps2pd xmm2,xmm2
    00404B9E  cvtps2pd xmm1,xmm6
    00404BA1  mulsd xmm1,xmm1
    00404BA5  addsd xmm3,xmm1
    00404BA9  xorps xmm1,xmm1
    00404BAC  cvtpd2ps xmm1,xmm3
    00404BB0  cvtps2pd xmm5,xmm1
    00404BB3  movss xmm1,dword ptr [eax+0Ch]
    00404BB8  cvtps2pd xmm3,xmm1
    00404BBB  mulsd xmm3,xmm3
    00404BBF  addsd xmm5,xmm3
    00404BC3  xorps xmm3,xmm3
    00404BC6  cvtpd2ps xmm3,xmm5
    00404BCA  cvtps2pd xmm5,xmm3
    00404BCD  movss xmm3,dword ptr [eax+10h]
    00404BD2  cvtps2pd xmm3,xmm3
    00404BD5  mulsd xmm3,xmm3
    00404BD9  addsd xmm5,xmm3
    00404BDD  xorps xmm3,xmm3
    00404BE0  cvtpd2ps xmm3,xmm5
    00404BE4  cvtps2pd xmm5,xmm3
    00404BE7  movss xmm3,dword ptr [eax+14h]
    00404BEC  cvtps2pd xmm3,xmm3
    00404BEF  mulsd xmm3,xmm3
    00404BF3  addsd xmm5,xmm3
    00404BF7  xorps xmm3,xmm3
    00404BFA  cvtpd2ps xmm3,xmm5
    00404BFE  cvtps2pd xmm5,xmm3
    00404C01  movss xmm3,dword ptr [eax+18h]
    00404C06  cvtps2pd xmm3,xmm3
    00404C09  mulsd xmm3,xmm3
    00404C0D  addsd xmm5,xmm3
    00404C11  xorps xmm3,xmm3
    00404C14  cvtpd2ps xmm3,xmm5
    00404C18  movss dword ptr [esp+18h],xmm3
    00404C1E  movss xmm3,dword ptr [ecx-4]
    00404C23  cvtps2pd xmm3,xmm3
    00404C26  mulsd xmm3,xmm0
    00404C2A  movss xmm0,dword ptr [esp+10h]
    00404C30  cvtps2pd xmm0,xmm0
    00404C33  addsd xmm3,xmm0
    00404C37  xorps xmm0,xmm0
    00404C3A  cvtpd2ps xmm0,xmm3
    00404C3E  movss xmm3,dword ptr [ecx]
    00404C42  cvtps2pd xmm0,xmm0
    00404C45  cvtps2pd xmm3,xmm3
    00404C48  mulsd xmm2,xmm3
    00404C4C  addsd xmm0,xmm2
    00404C50  movss xmm2,dword ptr [ecx+4]
    00404C55  cvtpd2ps xmm0,xmm0
    00404C59  cvtss2sd xmm0,xmm0
    00404C5D  cvtps2pd xmm2,xmm2
    00404C60  cvtps2pd xmm3,xmm4
    00404C63  mulsd xmm2,xmm3
    00404C67  addsd xmm0,xmm2
    00404C6B  movss xmm2,dword ptr [ecx+8]
    00404C70  cvtpd2ps xmm0,xmm0
    00404C74  cvtss2sd xmm0,xmm0
    00404C78  cvtps2pd xmm2,xmm2
    00404C7B  cvtps2pd xmm1,xmm1
    00404C7E  cvtps2pd xmm3,xmm6
    00404C81  mulsd xmm2,xmm3
    00404C85  addsd xmm0,xmm2
    00404C89  movss xmm2,dword ptr [ecx+0Ch]
    00404C8E  cvtpd2ps xmm0,xmm0
    00404C92  cvtss2sd xmm0,xmm0
    00404C96  cvtps2pd xmm2,xmm2
    00404C99  mulsd xmm2,xmm1
    00404C9D  addsd xmm0,xmm2
    00404CA1  cvtpd2ps xmm0,xmm0
    00404CA5  xorps xmm1,xmm1
    00404CA8  cvtss2sd xmm1,xmm0
    00404CAC  movss xmm0,dword ptr [ecx+10h]
    00404CB1  cvtps2pd xmm2,xmm0
    00404CB4  movss xmm0,dword ptr [eax+10h]
    00404CB9  cvtps2pd xmm0,xmm0
    00404CBC  mulsd xmm2,xmm0
    00404CC0  addsd xmm1,xmm2
    00404CC4  xorps xmm0,xmm0
    00404CC7  cvtpd2ps xmm0,xmm1
    00404CCB  add eax,20h
    00404CCE  add ecx,20h
    00404CD1  cvtps2pd xmm1,xmm0
    00404CD4  movss xmm0,dword ptr [ecx-0Ch]
    00404CD9  cvtps2pd xmm2,xmm0
    00404CDC  movss xmm0,dword ptr [eax-0Ch]
    00404CE1  cvtps2pd xmm0,xmm0
    00404CE4  mulsd xmm2,xmm0
    00404CE8  addsd xmm1,xmm2
    00404CEC  xorps xmm0,xmm0
    00404CEF  cvtpd2ps xmm0,xmm1
    00404CF3  xorps xmm1,xmm1
    00404CF6  cvtps2pd xmm1,xmm0
    00404CF9  movss xmm0,dword ptr [ecx-8]
    00404CFE  xorps xmm2,xmm2
    00404D01  cvtps2pd xmm2,xmm0
    00404D04  movss xmm0,dword ptr [eax-8]
    00404D09  cvtps2pd xmm0,xmm0
    00404D0C  mulsd xmm2,xmm0
    00404D10  addsd xmm1,xmm2
    00404D14  xorps xmm0,xmm0
    00404D17  cvtpd2ps xmm0,xmm1
    00404D1B  movss dword ptr [esp+10h],xmm0
    00404D21  cmp eax,offset buf+84h (42D6A4h)
    00404D26  jl gem+290h (404B40h)
    }

Solution
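The answer is yes. Visual Studio 2010, the version used here, does not auto-vectorize code. If you look at the assembly, those are all scalar SSE instructions (movss, mulsd, addsd), each operating on a single value at a time, and yet your loop is clearly vectorizable.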
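As a rough illustration of what a vectorized form of this loop could look like, here is a minimal sketch using SSE intrinsics. It makes several assumptions that the question does not state: that buf is an array of float, that the counters are unsigned sizes, and that buf holds at least off + n elements. The names sums_sse and hsum_ps are made up for this example. Because four partial sums are accumulated in parallel, the order of the additions differs from the scalar loop, so the results can differ slightly in the last bits.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */

/* Hypothetical helper: adds up the four float lanes of v. */
static float hsum_ps(__m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

/* Hypothetical vectorized version of the loop from the question.
 * Adds sum(buf[i]*buf[i]) to *one and sum(buf[i]*buf[off+i]) to *two
 * for i in [0, n). Assumes buf has at least off + n elements. */
static void sums_sse(const float *buf, size_t off, size_t n,
                     float *one, float *two)
{
    __m128 acc1 = _mm_setzero_ps();
    __m128 acc2 = _mm_setzero_ps();
    size_t i = 0;

    assert(buf != NULL && one != NULL && two != NULL);

    /* Main loop: four floats per iteration, using unaligned loads
     * because nothing is known about the alignment of buf + off. */
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(buf + i);
        __m128 b = _mm_loadu_ps(buf + off + i);
        acc1 = _mm_add_ps(acc1, _mm_mul_ps(a, a));
        acc2 = _mm_add_ps(acc2, _mm_mul_ps(a, b));
    }

    *one += hsum_ps(acc1);
    *two += hsum_ps(acc2);

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; i++) {
        *one += buf[i] * buf[i];
        *two += buf[i] * buf[off + i];
    }
}
```

A call such as sums_sse(buf, off, some_num, &one, &two) would stand in for the original loop. If buf is known to be 16-byte aligned and off is a multiple of 4, the aligned load _mm_load_ps could be used instead of _mm_loadu_ps.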
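You will have to use a vectorizing compiler to get better results, or use intrinsics to emit the vector SSE instructions yourself:

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_bk_intro.htm

Another thing you can try: change the floating-point model to "Fast" instead of "Precise". The compiler is promoting the intermediate values to double precision and then converting them back to single precision, which adds a lot of overhead.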
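To make the promotion overhead concrete, the sketch below spells out in C what the disassembly appears to do for the first statement of the loop under the "Precise" model, next to the plain single-precision form. Both function names are invented for this example, and buf is assumed to be an array of float. The explicit casts only mirror the cvtps2pd / mulsd / addsd / cvtpd2ps sequence seen in the listing. Whether a given compiler keeps the second function entirely in single precision depends on its floating-point settings.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the listing: widen the operands to double, multiply and add
 * in double, then round the running sum back to float every iteration. */
static float sum_squares_promoted(const float *buf, size_t n)
{
    float one = 0.0f;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        /* cvtps2pd / cvtss2sd, mulsd, addsd: all work done in double */
        double wide = (double)one + (double)buf[i] * (double)buf[i];
        /* cvtpd2ps: narrow the result back to float */
        one = (float)wide;
    }
    return one;
}

/* The same loop as written in the question. With a relaxed
 * floating-point model the compiler may keep this in single precision
 * and skip the float/double conversions. */
static float sum_squares_single(const float *buf, size_t n)
{
    float one = 0.0f;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        one += buf[i] * buf[i];
    }
    return one;
}
```

The two functions agree whenever every intermediate value is exactly representable in float. For general inputs they can differ in the last bits, which is the accuracy trade-off of switching from "Precise" to "Fast".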