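java – JMH indicates that M1 is faster than M2, but M1 delegates to M2
I wrote a JMH benchmark involving two methods, M1 and M2. M1 invokes M2, but for some reason JMH claims that M1 is faster than M2.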
Here is the benchmark source code:

    import java.util.concurrent.TimeUnit;
    import static org.bitbucket.cowwoc.requirements.Requirements.assertThat;
    import static org.bitbucket.cowwoc.requirements.Requirements.requireThat;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;

    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class MyBenchmark {
        @Benchmark
        public void assertMethod() {
            assertThat("value", "name").isNotNull().isNotEmpty();
        }

        @Benchmark
        public void requireMethod() {
            requireThat("value", "name").isNotNull().isNotEmpty();
        }

        public static void main(String[] args) throws RunnerException {
            Options opt = new OptionsBuilder()
                    .include(MyBenchmark.class.getSimpleName())
                    .forks(1)
                    .build();

            new Runner(opt).run();
        }
    }

In the example above, M1 is assertThat() and M2 is requireThat(). That is, assertThat() invokes requireThat().
Here is the benchmark output:

    # JMH 1.13 (released 8 days ago)
    # VM version: JDK 1.8.0_102, VM 25.102-b14
    # VM invoker: C:\Program Files\Java\jdk1.8.0_102\jre\bin\java.exe
    # VM options: -ea
    # Warmup: 20 iterations, 1 s each
    # Measurement: 20 iterations, 1 s each
    # Timeout: 10 min per iteration
    # Threads: 1 thread, will synchronize iterations
    # Benchmark mode: Average time, time/op
    # Benchmark: com.mycompany.jmh.MyBenchmark.assertMethod

    # Run progress: 0.00% complete, ETA 00:01:20
    # Fork: 1 of 1
    # Warmup Iteration   1: 8.268 ns/op
    # Warmup Iteration   2: 6.082 ns/op
    # Warmup Iteration   3: 4.846 ns/op
    # Warmup Iteration   4: 4.854 ns/op
    # Warmup Iteration   5: 4.834 ns/op
    # Warmup Iteration   6: 4.831 ns/op
    # Warmup Iteration   7: 4.815 ns/op
    # Warmup Iteration   8: 4.839 ns/op
    # Warmup Iteration   9: 4.825 ns/op
    # Warmup Iteration  10: 4.812 ns/op
    # Warmup Iteration  11: 4.806 ns/op
    # Warmup Iteration  12: 4.805 ns/op
    # Warmup Iteration  13: 4.802 ns/op
    # Warmup Iteration  14: 4.813 ns/op
    # Warmup Iteration  15: 4.805 ns/op
    # Warmup Iteration  16: 4.818 ns/op
    # Warmup Iteration  17: 4.815 ns/op
    # Warmup Iteration  18: 4.817 ns/op
    # Warmup Iteration  19: 4.812 ns/op
    # Warmup Iteration  20: 4.810 ns/op
    Iteration   1: 4.805 ns/op
    Iteration   2: 4.816 ns/op
    Iteration   3: 4.813 ns/op
    Iteration   4: 4.938 ns/op
    Iteration   5: 5.061 ns/op
    Iteration   6: 5.129 ns/op
    Iteration   7: 4.828 ns/op
    Iteration   8: 4.837 ns/op
    Iteration   9: 4.819 ns/op
    Iteration  10: 4.815 ns/op
    Iteration  11: 4.872 ns/op
    Iteration  12: 4.806 ns/op
    Iteration  13: 4.811 ns/op
    Iteration  14: 4.827 ns/op
    Iteration  15: 4.837 ns/op
    Iteration  16: 4.842 ns/op
    Iteration  17: 4.812 ns/op
    Iteration  18: 4.809 ns/op
    Iteration  19: 4.806 ns/op
    Iteration  20: 4.815 ns/op

    Result "assertMethod":
      4.855 ±(99.9%) 0.077 ns/op [Average]
      (min, avg, max) = (4.805, 4.855, 5.129), stdev = 0.088
      CI (99.9%): [4.778, 4.932] (assumes normal distribution)

    # JMH 1.13 (released 8 days ago)
    # VM version: JDK 1.8.0_102, [...] time/op
    # Benchmark: com.mycompany.jmh.MyBenchmark.requireMethod

    # Run progress: 50.00% complete, ETA 00:00:40
    # Fork: 1 of 1
    # Warmup Iteration   1: 7.193 ns/op
    # Warmup Iteration   2: 4.835 ns/op
    # Warmup Iteration   3: 5.039 ns/op
    # Warmup Iteration   4: 5.053 ns/op
    # Warmup Iteration   5: 5.077 ns/op
    # Warmup Iteration   6: 5.102 ns/op
    # Warmup Iteration   7: 5.088 ns/op
    # Warmup Iteration   8: 5.109 ns/op
    # Warmup Iteration   9: 5.096 ns/op
    # Warmup Iteration  10: 5.096 ns/op
    # Warmup Iteration  11: 5.091 ns/op
    # Warmup Iteration  12: 5.089 ns/op
    # Warmup Iteration  13: 5.099 ns/op
    # Warmup Iteration  14: 5.097 ns/op
    # Warmup Iteration  15: 5.090 ns/op
    # Warmup Iteration  16: 5.096 ns/op
    # Warmup Iteration  17: 5.088 ns/op
    # Warmup Iteration  18: 5.086 ns/op
    # Warmup Iteration  19: 5.087 ns/op
    # Warmup Iteration  20: 5.097 ns/op
    Iteration   1: 5.097 ns/op
    Iteration   2: 5.088 ns/op
    Iteration   3: 5.092 ns/op
    Iteration   4: 5.097 ns/op
    Iteration   5: 5.082 ns/op
    Iteration   6: 5.089 ns/op
    Iteration   7: 5.086 ns/op
    Iteration   8: 5.084 ns/op
    Iteration   9: 5.090 ns/op
    Iteration  10: 5.086 ns/op
    Iteration  11: 5.084 ns/op
    Iteration  12: 5.088 ns/op
    Iteration  13: 5.091 ns/op
    Iteration  14: 5.092 ns/op
    Iteration  15: 5.085 ns/op
    Iteration  16: 5.096 ns/op
    Iteration  17: 5.078 ns/op
    Iteration  18: 5.125 ns/op
    Iteration  19: 5.089 ns/op
    Iteration  20: 5.091 ns/op

    Result "requireMethod":
      5.091 ±(99.9%) 0.008 ns/op [Average]
      (min, avg, max) = (5.078, 5.091, 5.125), stdev = 0.010
      CI (99.9%): [5.082, 5.099] (assumes normal distribution)

    # Run complete. Total time: 00:01:21

    Benchmark                  Mode  Cnt  Score   Error  Units
    MyBenchmark.assertMethod   avgt   20  4.855 ± 0.077  ns/op
    MyBenchmark.requireMethod  avgt   20  5.091 ± 0.008  ns/op

To reproduce locally:

- Create a Maven project containing the above benchmark, with this dependency:

        <dependency>
            <groupId>org.bitbucket.cowwoc</groupId>
            <artifactId>requirements</artifactId>
            <version>2.0.0</version>
        </dependency>

- Alternatively, download the library from https://bitbucket.org/cowwoc/requirements/

I have the following question: can you reproduce this result?

UPDATE: Per Aleksey Shipilev's suggestion, I posted the updated benchmark source code, benchmark output, jmh-test output, and xperfasm output to https://bitbucket.org/cowwoc/requirements/downloads. I could not post them on Stackoverflow because of the 30k-character limit on questions.

UPDATE2: I am finally getting consistent, meaningful results.
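
    Benchmark                  Mode  Cnt   Score   Error  Units
    MyBenchmark.assertMethod   avgt   60  22.552 ± 0.020  ns/op
    MyBenchmark.requireMethod  avgt   60  22.411 ± 0.114  ns/op

By consistent, I mean that I get almost the same values across runs. By meaningful, I mean that assertMethod() is slower than requireMethod().

I made the following change:

- Locked the CPU clock (minimum/maximum CPU state set to 99% in Windows Power Options)

Is anyone able to achieve these results without the run times increasing severalfold?

UPDATE3: Disabling inlining yields the same results without a noticeable performance slowdown. I posted a more detailed answer here.

Solution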
In this particular case, assertMethod really is compiled better than requireMethod, because of register allocation issues.
The benchmark looks correct, and I can consistently reproduce your results. Here is a simplified version:

    package bench;

    import com.google.common.collect.ImmutableMap;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    public class Requirements {
        private static boolean enabled = true;

        private String name = "name";
        private String value = "value";

        // Models assertThat(): one extra check, then delegates to requireThat().
        @Benchmark
        public Object assertMethod() {
            if (enabled)
                return requireThat(value, name);
            return null;
        }

        @Benchmark
        public Object requireMethod() {
            return requireThat(value, name);
        }

        public static Object requireThat(String parameter, String name) {
            if (name.trim().isEmpty())
                throw new IllegalArgumentException();
            return new StringRequirementsImpl(parameter, name, new Configuration());
        }

        static class Configuration {
            private Object context = ImmutableMap.of();
        }

        static class StringRequirementsImpl {
            private String parameter;
            private String name;
            private Configuration config;
            private ObjectRequirementsImpl asObject;

            StringRequirementsImpl(String parameter, String name, Configuration config) {
                this.parameter = parameter;
                this.name = name;
                this.config = config;
                this.asObject = new ObjectRequirementsImpl(parameter, name, config);
            }
        }

        static class ObjectRequirementsImpl {
            private Object parameter;
            private String name;
            private Configuration config;

            // name is passed in explicitly so that this.name is actually initialized.
            ObjectRequirementsImpl(Object parameter, String name, Configuration config) {
                this.parameter = parameter;
                this.name = name;
                this.config = config;
            }
        }
    }

First, I verified with -XX:+PrintInlining that the whole benchmark is inlined into one big method. This compilation unit evidently has a lot of nodes, and there are not enough CPU registers to hold all the intermediate variables. In other words, the compiler has to spill some of them to the stack.

- In assertMethod, 4 registers are spilled to the stack before the call to trim().
- In requireMethod, 7 registers are spilled, as the right-hand column of the listing shows.

-XX:+PrintAssembly output:

    assertMethod             | requireMethod
    -------------------------|------------------------
    mov %r11d,0x5c(%rsp)     | mov %rcx,0x20(%rsp)
    mov %r10d,0x58(%rsp)     | mov %r11,0x48(%rsp)
    mov %rbp,0x50(%rsp)      | mov %r10,0x30(%rsp)
    mov %rbx,0x48(%rsp)      | mov %rbp,0x50(%rsp)
                             | mov %r9d,0x58(%rsp)
                             | mov %edi,0x5c(%rsp)
                             | mov %r8,0x60(%rsp)

Apart from the if (enabled) check, this is almost the only difference between the two compiled methods. So the performance difference is explained by the larger number of variables spilled to memory.

Why, then, is the smaller method compiled worse? The register allocation problem is known to be NP-complete. Since it cannot be solved optimally in reasonable time, compilers usually rely on heuristics. In a big method, a tiny thing like an extra if can significantly change the result of the register allocation algorithm.
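To make the point about heuristics concrete, here is a toy sketch. It is not HotSpot's register allocator; the class `GreedyColoringSketch` and its methods are hypothetical names invented for this illustration. It models values as graph nodes, "live at the same time" as edges, and registers as colors, and then colors the same interference graph greedily in two different visiting orders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class GreedyColoringSketch {
    /**
     * Greedy coloring: visit nodes in the given order and give each node the
     * lowest color (register number) not used by an already-colored neighbor.
     * Returns how many distinct colors were needed.
     */
    public static int colorsUsed(Map<Integer, Set<Integer>> graph, List<Integer> order) {
        Map<Integer, Integer> color = new HashMap<>();
        int used = 0;
        for (int node : order) {
            Set<Integer> taken = new HashSet<>();
            for (int neighbor : graph.get(node)) {
                Integer c = color.get(neighbor);
                if (c != null) taken.add(c);
            }
            int c = 0;
            while (taken.contains(c)) c++;
            color.put(node, c);
            used = Math.max(used, c + 1);
        }
        return used;
    }

    /**
     * Crown graph: nodes 0..n-1 form side A, nodes n..2n-1 form side B.
     * A_i interferes with B_j whenever i != j. Two colors always suffice.
     */
    public static Map<Integer, Set<Integer>> crown(int n) {
        Map<Integer, Set<Integer>> graph = new HashMap<>();
        for (int i = 0; i < 2 * n; i++) graph.put(i, new HashSet<>());
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    graph.get(i).add(n + j);
                    graph.get(n + j).add(i);
                }
            }
        }
        return graph;
    }

    /** Visiting order A_0, ..., A_{n-1}, B_0, ..., B_{n-1}. */
    public static List<Integer> sideBySide(int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < 2 * n; i++) order.add(i);
        return order;
    }

    /** Visiting order A_0, B_0, A_1, B_1, ... */
    public static List<Integer> interleaved(int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
            order.add(n + i);
        }
        return order;
    }

    public static void main(String[] args) {
        int n = 4;
        Map<Integer, Set<Integer>> graph = crown(n);
        System.out.println("side-by-side order: " + colorsUsed(graph, sideBySide(n)) + " registers");
        System.out.println("interleaved order:  " + colorsUsed(graph, interleaved(n)) + " registers");
    }
}
```

With `n = 4`, the side-by-side order needs 2 colors and the interleaved order needs 4, although the graph is identical. On a hypothetical machine with 3 registers, the first order would fit and the second would have to spill. A real allocator is far more sophisticated, but the sensitivity to small input changes is the same kind of effect.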
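But you don't need to worry about this. The effect we see here does not mean that requireMethod is always compiled worse. In other use cases the compilation graph will be completely different because of inlining. In any case, a difference of 1 nanosecond means nothing for real application performance.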
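As a side note on reading the scores quoted in the question, here is a minimal sketch that checks whether two "Score ± Error" intervals overlap. The class `IntervalOverlap` is a hypothetical helper, not part of JMH, and non-overlap is only a rough indication that a difference exceeds the reported measurement error.

```java
public final class IntervalOverlap {
    /**
     * Returns true if the closed intervals [scoreA - errorA, scoreA + errorA]
     * and [scoreB - errorB, scoreB + errorB] share at least one point.
     */
    public static boolean overlaps(double scoreA, double errorA, double scoreB, double errorB) {
        double lowA = scoreA - errorA;
        double highA = scoreA + errorA;
        double lowB = scoreB - errorB;
        double highB = scoreB + errorB;
        return lowA <= highB && lowB <= highA;
    }

    public static void main(String[] args) {
        // First run from the question: assertMethod vs. requireMethod.
        System.out.println("first run overlaps:   " + overlaps(4.855, 0.077, 5.091, 0.008));
        // UPDATE2 run from the question.
        System.out.println("UPDATE2 run overlaps: " + overlaps(22.552, 0.020, 22.411, 0.114));
    }
}
```

Both calls return false: in each run the two intervals are disjoint, so each reported difference is larger than the stated error. The two runs disagree about which method is faster, though, which fits the point that the outcome depends on how the code happens to be compiled and measured.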