Scala parallel collections: puzzling runtime
Edit: My sample size was too small. When I ran it against the real data on 8 CPUs, I saw a 7.2x speedup. Not too shabby for adding 4 characters to my code ;)
I am currently trying to "sell" management on the benefits of using Scala, especially when it comes to scaling across CPUs. To that end, I created a simple test application that does a bunch of vector math, and I was a bit surprised to find that the runtime was not noticeably better on my quad-core machine. Interestingly, I found that the runtime was worst the first time through the collection and got better with subsequent calls. Is something lazy in the parallel collections causing this, or am I just doing it wrong? It should be noted that I come from the C++/C# world, so it is entirely possible that I have messed up my configuration somehow. Anyway, here is my setup:

IntelliJ Scala plugin
Scala 2.9.1.final
Windows 7 64-bit, quad-core processor (no hyperthreading)

    import util.Random

    // simple Vector3D class that has final x, y, z components, a length, and a '-' function
    class Vector3D(val x: Double, val y: Double, val z: Double) {
      def length = math.sqrt(x*x + y*y + z*z)
      def -(rhs: Vector3D) = new Vector3D(x - rhs.x, y - rhs.y, z - rhs.z)
    }

    object MainClass {
      def main(args: Array[String]) = {
        println("Available CPU's: " + Runtime.getRuntime.availableProcessors())
        println("Parallelism Degree set to: " + collection.parallel.ForkJoinTasks.defaultForkJoinPool.getParallelism)

        // my position (the origin)
        val myPos = new Vector3D(0, 0, 0)
        val r = new Random(0)

        // define a function nextRand that gets us a random between 0 and 100
        def nextRand = r.nextDouble() * 100

        // make 10 million random targets
        val targets = (0 until 10000000).map(_ => new Vector3D(nextRand, nextRand, nextRand)).toArray

        // take the .par hit before we start profiling
        val parTargets = targets.par
        println("Created " + targets.length + " vectors")

        // define a range function
        val rangeFunc: (Vector3D => Double) = (targetPos) => (targetPos - myPos).length

        // we'll select ones that are <50
        val within50: (Vector3D => Boolean) = (targetPos) => rangeFunc(targetPos) < 50

        // time it sequentially
        val startTime_sequential = System.currentTimeMillis()
        val numTargetsInRange_sequential = targets.filter(within50)
        val endTime_sequential = System.currentTimeMillis()
        println("Sequential (ms): " + (endTime_sequential - startTime_sequential))

        // do the parallel version 10 times
        for (i <- 1 to 10) {
          val startTime_par = System.currentTimeMillis()
          val numTargetsInRange_parallel = parTargets.filter(within50)
          val endTime_par = System.currentTimeMillis()
          val ms = endTime_par - startTime_par
          println("Iteration[" + i + "] Executed in " + ms + " ms")
        }
      }
    }

The output of this program is:

    Available CPU's: 4
    Parallelism Degree set to: 4
    Created 10000000 vectors
    Sequential (ms): 216
    Iteration[1] Executed in 227 ms
    Iteration[2] Executed in 253 ms
    Iteration[3] Executed in 76 ms
    Iteration[4] Executed in 78 ms
    Iteration[5] Executed in 77 ms
    Iteration[6] Executed in 80 ms
    Iteration[7] Executed in 78 ms
    Iteration[8] Executed in 78 ms
    Iteration[9] Executed in 79 ms
    Iteration[10] Executed in 82 ms

So what is going on here? The first 2 times we run the filter it is slow, and then things speed up? I understand that there is inherently a startup cost to parallelism. I am just trying to figure out where it makes sense to express parallelism in my application, and specifically I want to be able to show management a program that runs 3-4 times faster on a quad-core box. Is this just not a good problem for that?

Ideas?

Solution
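You have the micro-benchmark disease. You are most probably benchmarking the JIT compilation phase. You need to warm up the JIT with a pre-run first.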
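As a rough illustration of the warm-up idea, here is a minimal sketch, not a replacement for a proper benchmarking framework. The names `WarmupTiming` and `timeAfterWarmup` are made up for this example, and the warm-up and run counts are arbitrary assumptions. It runs the code a few times without timing it so the JIT has a chance to compile the hot path, and only then records timings. The example times a plain sequential `filter` to stay self-contained; the same wrapper could presumably be put around `parTargets.filter(within50)` from the question.

```scala
object WarmupTiming {
  // Hypothetical helper (not a library API): evaluates `body` `warmups` times
  // without timing it, then `runs` more times, returning the elapsed
  // milliseconds of each timed run.
  def timeAfterWarmup[A](warmups: Int, runs: Int)(body: => A): Seq[Double] = {
    // Untimed pre-runs so the JIT can compile the code under test.
    for (_ <- 1 to warmups) body
    // Timed runs; nanoTime is preferable to currentTimeMillis for intervals.
    for (_ <- 1 to runs) yield {
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed stand-in workload: one million doubles in the range 0..99.
    val data = Array.tabulate(1000000)(i => (i % 100).toDouble)
    val timings = timeAfterWarmup(warmups = 5, runs = 10) {
      data.filter(_ < 50.0)
    }
    timings.zipWithIndex.foreach { case (ms, i) =>
      println("Run[" + (i + 1) + "] " + ms + " ms")
    }
  }
}
```

Even with a warm-up, a hand-rolled loop like this can still be skewed by garbage collection or by the JIT optimizing away unused results, so treat its numbers as indicative only.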
Probably the best idea is to use a micro-benchmarking framework like http://code.google.com/p/caliper/, which handles all of that for you.

Edit: For benchmarking Scala projects with Caliper there is a nice SBT template, referenced from this blog post.