Mapping C++ to assembly
When compiling some code with clang 3.9.1 and optimizations (-O2), I ran into unexpected runtime behavior that I have not seen with other compilers (clang 3.8 and gcc 6.3).
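I thought I might have some unintentional undefined behavior (compiling with ubsan removes the unexpected behavior), so I tried to simplify the program and found that one particular function seems to cause the difference in behavior. Now I am mapping the assembly back to C++ to see where it goes wrong and to work out why this happens, and there are a few parts I am having difficulty mapping back.
Godbolt link
C++:
#include <atomic>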
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <cstdio>

enum class FooState { A, B };

struct Foo {
    std::atomic<std::int64_t> counter{0};
    std::atomic<std::int64_t> counter_a{0};
    std::atomic<std::int64_t> counter_b{0};
};

//__attribute__((noinline))
FooState to_state(const std::int64_t c) {
    return c >= 0 ? FooState::A : FooState::B;
}

static const int NUM_MODIFIES = 100;
int value_a = 0, value_b = 0;
Foo foo;
std::atomic<std::int64_t> total_sum{0};

void test_function() {
    bool done = false;
    while (!done) {
        const std::int64_t count =
            foo.counter.fetch_add(1, std::memory_order_seq_cst);
        const FooState state = to_state(count);
        int &val = FooState::A == state ? value_a : value_b;
        if (val == NUM_MODIFIES) {
            total_sum += val;
            done = true;
        }
        std::atomic<std::int64_t> &c =
            FooState::A == state ? foo.counter_a : foo.counter_b;
        c.fetch_add(1, std::memory_order_seq_cst);
    }
}
Assembly:
test_function(): # @test_function()
test rax,rax
setns al
lock
inc qword ptr [rip + foo]
mov ecx,value_a
mov edx,value_b
cmovg rdx,rcx
cmp dword ptr [rdx],100
je .LBB1_3
mov ecx,foo+8
mov edx,value_a
.LBB1_2: # =>This Inner Loop Header: Depth=1
test al,1
mov eax,foo+16
cmovne rax,rcx
lock
inc qword ptr [rax]
test rax,rax
setns al
lock
inc qword ptr [rip + foo]
mov esi,value_b
cmovg rsi,rdx
cmp dword ptr [rsi],100
jne .LBB1_2
.LBB1_3:
lock
add qword ptr [rip + total_sum],100
test al,al
mov eax,foo+8
mov ecx,foo+16
cmovne rcx,rax
lock
inc qword ptr [rcx]
ret
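I have found that marking to_state as noinline, or changing done to a global, seems to "fix" the unexpected behavior.
The unexpected behavior I am seeing is this: when counter is >= 0, counter_a should be incremented; otherwise counter_b should be incremented. From what I can tell, sometimes this does not happen, but pinning down exactly when and why has been difficult.
One part of the assembly I could use some help with is the test rax, rax; setns al and test al, 1 part. It seems the initial test does not set al deterministically, and that value is then used to decide which counter to increment, but perhaps I am misunderstanding something.
Below is a small example that demonstrates the issue. It usually hangs forever when compiled with clang 3.9 and -O2, and runs to completion otherwise.
#include <atomic>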
#include <cstdint>
#include <cstdlib>
#include <limits>  // std::numeric_limits, used in main()
#include <thread>
#include <cstdio>

enum class FooState { A, B };

struct Foo {
    std::atomic<std::int64_t> counter{0};
    std::atomic<std::int64_t> counter_a{0};
    std::atomic<std::int64_t> counter_b{0};
};

//__attribute__((noinline))
FooState to_state(const std::int64_t c) {
    return c >= 0 ? FooState::A : FooState::B;
}

//__attribute__((noinline))
FooState to_state2(const std::int64_t c) {
    return c >= 0 ? FooState::A : FooState::B;
}

static const int NUM_MODIFIES = 100;
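int value_a = 0, value_b = 0;
Foo foo;
std::atomic<std::int64_t> total_sum{0};

void test_function() {
    bool done = false;
    while (!done) {
        const std::int64_t count =
            foo.counter.fetch_add(1, std::memory_order_seq_cst);
        const FooState state = to_state(count);
        int &val = FooState::A == state ? value_a : value_b;
        if (val == NUM_MODIFIES) {
            total_sum += val;
            done = true;
        }
        std::atomic<std::int64_t> &c =
            FooState::A == state ? foo.counter_a : foo.counter_b;
        c.fetch_add(1, std::memory_order_seq_cst);
    }
}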

int main() {
    std::thread thread = std::thread(test_function);
    for (std::size_t i = 0; i <= NUM_MODIFIES; ++i) {
        const std::int64_t count =
            foo.counter.load(std::memory_order_seq_cst);
        const FooState state = to_state2(count);
        unsigned log_count = 0;
        auto &inactive_val = FooState::A == state ? value_b : value_a;
        inactive_val = i;
        if (FooState::A == state) {
            foo.counter_b.store(0, std::memory_order_seq_cst);
            const auto accesses_to_wait_for =
                foo.counter.exchange((std::numeric_limits<std::int64_t>::min)(),
                                     std::memory_order_seq_cst);
            while (accesses_to_wait_for !=
                   foo.counter_a.load(std::memory_order_seq_cst)) {
                std::this_thread::yield();
                if (++log_count <= 10) {
                    std::printf("#1 wait_for=%ld, val=%ld\n", accesses_to_wait_for,
                                foo.counter_a.load(std::memory_order_seq_cst));
                }
            }
        } else {
            foo.counter_a.store(0, std::memory_order_seq_cst);
            auto temp = foo.counter.exchange(0, std::memory_order_seq_cst);
            std::int64_t accesses_to_wait_for = 0;
            while (temp != INT64_MIN) {
                ++accesses_to_wait_for;
                --temp;
            }
            while (accesses_to_wait_for !=
                   foo.counter_b.load(std::memory_order_seq_cst)) {
                std::this_thread::yield();
                if (++log_count <= 10) {
                    std::printf("#2 wait_for=%ld, val=%ld\n", accesses_to_wait_for,
                                foo.counter_b.load(std::memory_order_seq_cst));
                }
            }
        }
        std::printf("modify #%lu complete\n", i);
    }
    std::printf("modifies complete\n");
    thread.join();
    const std::size_t expected_result = NUM_MODIFIES;
    std::printf("%s\n", total_sum == expected_result ? "ok" : "fail");
}
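The behavior the question expects (a non-negative pre-increment value bumps counter_a, a negative one bumps counter_b) can be checked in isolation with a single-threaded sketch. This is only an illustration under assumptions: the `Counters` struct and the `step` helper are hypothetical names that mirror `Foo` and one iteration of the loop body, with the value_a/value_b bookkeeping left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Foo from the listing above.
struct Counters {
    std::atomic<std::int64_t> counter{0};
    std::atomic<std::int64_t> counter_a{0};
    std::atomic<std::int64_t> counter_b{0};
};

// Hypothetical helper: one loop iteration, reduced to the counter logic.
// Returns true when counter_a was chosen, false when counter_b was chosen.
inline bool step(Counters &c) {
    // fetch_add returns the value held *before* the increment.
    const std::int64_t old_count =
        c.counter.fetch_add(1, std::memory_order_seq_cst);
    const bool is_a = old_count >= 0;
    (is_a ? c.counter_a : c.counter_b)
        .fetch_add(1, std::memory_order_seq_cst);
    return is_a;
}
```

Calling `step` once with counter at 0, and once after storing the minimum std::int64_t value into counter, should bump counter_a and then counter_b. A build in which both calls bump counter_a would match the symptom described in the question.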
Answer
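I'm not 100% sure (I didn't debug it, just simulated it in my head), but I think both test rax, rax / setns al pairs test the wrong thing.
The result of the first one depends on whether rax < 0 at the moment the function is called (UB). The other test, inside the loop, will always be "NS" (it tests a 32-bit address in rax => SF = 0 => al = 1), so al is fixed at 1 and the loop will always select counter_a.
Now that I have read your question, I see you have the same suspicion (I did look at the code first).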
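The reasoning above can be sketched as a small C++ model of the flag logic. It is only an illustration, not clang's output: `setns_after_test`, `pick_counter`, `intended` and `observed_in_loop` are hypothetical names, and any address passed to the last one is a made-up example value. The model relies on two facts about x86: `test rax, rax` sets SF to the top bit of rax, and `setns al` writes 1 when SF is clear.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Model of `test rax, rax` followed by `setns al`:
// SF is the top bit of rax, and setns yields 1 when SF is clear.
constexpr bool setns_after_test(std::uint64_t rax) {
    return (rax >> 63) == 0;
}

enum class Picked { CounterA, CounterB };

// Model of `test al, 1` followed by `cmovne`: a set bit selects counter_a.
constexpr Picked pick_counter(bool al) {
    return al ? Picked::CounterA : Picked::CounterB;
}

// What the source asks for: decide on the value returned by fetch_add.
constexpr Picked intended(std::int64_t old_count) {
    return pick_counter(
        setns_after_test(static_cast<std::uint64_t>(old_count)));
}

// What the loop in the listing appears to do: decide on the sign of the
// counter address left in rax (assumed to be a small, non-negative value).
constexpr Picked observed_in_loop(std::uint64_t address_in_rax) {
    return pick_counter(setns_after_test(address_in_rax));
}
```

Under this model, `intended` returns counter_b for any negative pre-increment value, while `observed_in_loop` returns counter_a for any address whose top bit is clear, which is consistent with the main thread waiting forever on counter_b.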
