C++ – Multithreaded performance of std::string
We are running some code in a project that uses OpenMP, and I've run into something strange. I've included parts of some demo code to show what I'm seeing.
The test compares calling a function that takes a const char* argument with calling one that takes a std::string argument, inside a multithreaded loop. The functions do essentially nothing, so their bodies should add no overhead. What I'm seeing is a major difference in the time it takes to complete the loop. For 100,000,000 iterations, the const char* version takes 0.075 seconds to finish, while the std::string version takes 5.08 seconds. These tests were run on Ubuntu 10.04 x64 with gcc 4.4.

My question is basically: is this entirely due to the dynamic allocation of std::string, and why can't it be optimized away in this case, given that the argument is const and cannot change?

The code is below. Thanks very much for your replies.

Compile with:

    g++ -Wall -Wextra -O3 -fopenmp string_args.cpp -o string_args

    #include <iostream>
    #include <map>
    #include <string>
    #include <stdint.h>

    // For wall time
    #ifdef _WIN32
    #include <time.h>
    #else
    #include <sys/time.h>
    #endif

    namespace {
        const int64_t g_max_iter = 100000000;
        std::map<const char*, int> g_charIndex = std::map<const char*, int>();
        std::map<std::string, int> g_strIndex = std::map<std::string, int>();

        class Timer {
        public:
            Timer() {
    #ifdef _WIN32
                m_start = clock();
    #else /* linux & mac */
                gettimeofday(&m_start, 0);
    #endif
            }

            float elapsed() {
    #ifdef _WIN32
                clock_t now = clock();
                const float retval = float(now - m_start) / CLOCKS_PER_SEC;
                m_start = now;
    #else /* linux & mac */
                timeval now;
                gettimeofday(&now, 0);
                const float retval = float(now.tv_sec - m_start.tv_sec)
                                   + float((now.tv_usec - m_start.tv_usec) / 1E6);
                m_start = now;
    #endif
                return retval;
            }

        private:
            // The type of this variable is different depending on the platform
    #ifdef _WIN32
            clock_t
    #else
            timeval
    #endif
            m_start; ///< The starting time (implementation dependent format)
        };
    }

    bool contains_char(const char * id) {
        if( g_charIndex.empty() ) return false;
        return (g_charIndex.find(id) != g_charIndex.end());
    }

    bool contains_str(const std::string & name) {
        if( g_strIndex.empty() ) return false;
        return (g_strIndex.find(name) != g_strIndex.end());
    }

    void do_serial_char() {
        int found(0);
        Timer clock;
        for( int64_t i = 0; i < g_max_iter; ++i ) {
            if( contains_char("pos") ) {
                ++found;
            }
        }
        std::cout << "Loop time: " << clock.elapsed() << "\n";
        ++found;
    }

    void do_parallel_char() {
        int found(0);
        Timer clock;
        #pragma omp parallel for reduction(+:found) // reduction avoids a data race on 'found'
        for( int64_t i = 0; i < g_max_iter; ++i ) {
            if( contains_char("pos") ) {
                ++found;
            }
        }
        std::cout << "Loop time: " << clock.elapsed() << "\n";
        ++found;
    }

    void do_serial_str() {
        int found(0);
        Timer clock;
        for( int64_t i = 0; i < g_max_iter; ++i ) {
            if( contains_str("pos") ) {
                ++found;
            }
        }
        std::cout << "Loop time: " << clock.elapsed() << "\n";
        ++found;
    }

    void do_parallel_str() {
        int found(0);
        Timer clock;
        #pragma omp parallel for reduction(+:found) // reduction avoids a data race on 'found'
        for( int64_t i = 0; i < g_max_iter; ++i ) {
            if( contains_str("pos") ) {
                ++found;
            }
        }
        std::cout << "Loop time: " << clock.elapsed() << "\n";
        ++found;
    }

    int main() {
        std::cout << "Starting single-threaded loop using std::string\n";
        do_serial_str();
        std::cout << "\nStarting multi-threaded loop using std::string\n";
        do_parallel_str();
        std::cout << "\nStarting single-threaded loop using char *\n";
        do_serial_char();
        std::cout << "\nStarting multi-threaded loop using const char*\n";
        do_parallel_char();
    }

Solution
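Yes, this is due to the allocation and copying of a std::string on every iteration. Because contains_str takes a const std::string&, each call contains_str("pos") constructs a temporary std::string from the literal at the call site and destroys it afterwards; the const only applies to the reference, so it does not remove the need to build that temporary, even though the function returns immediately when the map is empty. A sufficiently smart compiler could optimize this away, but current optimizers are unlikely to do so. Instead, you can hoist the string out of the loop yourself:

    void do_parallel_str() {
        int found(0);
        Timer clock;
        std::string const str = "pos"; // you can even make it static, if desired
        #pragma omp parallel for reduction(+:found)
        for( int64_t i = 0; i < g_max_iter; ++i ) {
            if( contains_str(str) ) {
                ++found;
            }
        }
        //clock.stop(); // Or use something to that effect, so you don't include
        // any of the below expression (such as outputting "Loop time: ") in the timing.
        std::cout << "Loop time: " << clock.elapsed() << "\n";
        ++found;
    }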