java – 如果String在列表中(在编译时给出):HashSet是最快的解
在编译时给定一个固定的字符串列表,例如:
"zero","one","two","three","four","five","six","seven","eight","nine" 利用HashSet,我们有一种非常快速的方法(O(1))来判断运行时提供的String是否在字符串列表中. 例如: Set<String> SET = new HashSet<>(Arrays.asList( "zero","nine")); boolean listed = SET.contains("some-text"); 有没有其他更快的方法来判断在运行时提供的字符串是否在字符串的初始列表中(在编译时给出)或HashSet是最快的解决方案? 让我们将问题正式化 鉴于以下界面: interface Checker { String[] VALUES = { "zero","nine" }; boolean contains(String s); } 提供最快的实现,因为Checker.VALUES中列出的值不会被更改(例如,如果您愿意,可以在代码中使用这些文字). 演示:HashSetChecker实现 使用HashSet的实现如下所示: class HashSetChecker implements Checker { private final Set<String> set = new HashSet<>(Arrays.asList(VALUES)); @Override public boolean contains(String s) { return set.contains(s); } } 用于测试实现的代码 在测试时,我们想要使用初始实习字符串测试Checker.contains()方法,还要测试其他无法找到的实际字符串(字符串文字),以及具有相同值(等于)但不是的字符串实习生.为此,将使用以下CheckerTester.TESTS数组. public class CheckerTester { private static final String[] TESTS = { "zero","nine","ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen","seventeen","eighteen","nineteen",new String("zero"),new String("one"),new String("two"),new String("three"),new String("four"),new String("five"),new String("six"),new String("seven"),new String("eight"),new String("nine"),new String("ten"),new String("eleven"),new String("twelve"),new String("thirteen"),new String("fourteen"),new String("fifteen"),new String("sixteen"),new String("seventeen"),new String("eighteen"),new String("nineteen") }; public static void test(Checker checker) { final int N = 1_000_000; long start = System.nanoTime(); for (int i = 0; i < N; i++) for (String test : TESTS) checker.contains(test); long end = System.nanoTime(); System.out.printf("%s: %d msn",checker.getClass().getName(),(end - start) / 1_000_000); } } 解决方法
让我们看看一些实现:
第一个想法:具有大容量的HashSet 有些人可能会说多个字符串最终可能会出现在同一个HashSet存储桶中,所以让我们使用一个很大的初始容量: class HashSetChecker2 implements Checker { private final Set<String> set = new HashSet<>(1000); { set.addAll(Arrays.asList(VALUES)); } @Override public boolean contains(String s) { return set.contains(s); } } 使用HashMap而不是HashSet HashSet在其实现中使用HashMap,让我们做同样的事情:摆脱“shell”HashSet: class HashMapChecker implements Checker { private final Map<String,Object> map = new HashMap<>(1000); { for (String s : VALUES) map.put(s,s); } @Override public boolean contains(String s) { return map.containsKey(s); } } TreeSet中 有些人可能会说给TreeSet一个尝试(它已经排序,所以它可能有机会).我知道它是O(log(n)),但是n很小(在这种情况下为10): class TreeSetChecker implements Checker { private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES)); @Override public boolean contains(String s) { return set.contains(s); } } 短路OR检查(if-else逻辑): class OrChecker implements Checker { @Override public boolean contains(String s) { return "zero".equals(s) || "one".equals(s) || "two".equals(s) || "three".equals(s) || "four".equals(s) || "five".equals(s) || "six".equals(s) || "seven".equals(s) || "eight".equals(s) || "nine".equals(s); } } 参考等式然后恢复为短路OR检查 有些人可能会说我们应首先检查是否通过引用我们有字符串,如果没有,则返回短路OR检查: class RefOrChecker extends OrChecker { @Override public boolean contains(String s) { return "zero" == s || "one" == s || "two" == s || "three" == s || "four" == s || "five" == s || "six" == s || "seven" == s || "eight" == s || "nine" == s || super.contains(s); } } 使用开关:到目前为止我发现的速度最快 由于我们在编译时有一个固定的字符串列表,我们可以利用在switch语句中使用字符串的可能性. 我们可以为固定列表中的每个String添加一个case并返回true,并添加一个默认大小写以返回false. class SwitchChecker implements Checker { @Override public boolean contains(String s) { switch (s) { case "zero": case "one": case "two": case "three": case "four": case "five": case "six": case "seven": case "eight": case "nine": return true; default: return false; } } } 新发现:嵌入式开关(2个开关块) Maaartinus关于完美哈希的answer让我思考.即使我们有一个完美的哈希,它仍然必须运行在运行时提供的字符串的整个内容,我们要检查.所以我们应该使用String中可用的东西:它的长度.根据String的长度,我们使用一个开关,在该开关内部,我们使用内部开关,仅列出具有指定长度的字符串.有了这个,我们减少了一个开关内的case语句的数量: class EmbeddedSwitchChecker implements Checker { @Override public boolean contains(String s) { switch (s.length()) { case 3: switch (s) { case "one": case "two": case "six": return true; default: return false; } case 4: switch (s) { case "zero": case "four": case "five": case "nine": return true; default: return false; } case 5: switch (s) { case "three": case "seven": case "eight": return true; default: return false; } default: return false; } } } 新发现:CharSwitchChecker:冠军 这基本上是改进的EmbeddedSwitchChecker和OldCurmudgeon的idea about a state machine的组合:这里我们使用字符串的第一个字符上的开关(但首先我们检查它的长度),并且基于此我们要么缩小到一个可能的字符串,或者如果没有,我们也检查第二个字符,在这种情况下,可能的String只能是一个(我们可以通过调用String.equals()来决定): class CharSwitchChecker implements Checker { @Override public boolean contains(String s) { final int length = s.length(); if (length < 3 || length > 5) return false; switch (s.charAt(0)) { case 'z': return "zero".equals(s); case 'o': return "one".equals(s); case 't': return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s); case 'f': return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s); case 's': return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s); case 'e': return "eight".equals(s); case 'n': return "nine".equals(s); } return false; } } 检测结果 以下是测试结果: TIME HOW FAST (compared to HashSetChecker) ----------------------------------------------------------------------------- HashSetChecker: 929 ms 1.00x HashSetChecker2: 892 ms 1.04x HashMapChecker: 873 ms 1.06x TreeSetChecker: 2265 ms 0.41x OrChecker: 1815 ms 0.51x RefOrChecker: 1708 ms 0.54x SwitchChecker: 538 ms 1.73x EmbeddedSwitchChecker: 467 ms 1.99x CharSwitchChecker: 436 ms 2.13x SwitchChecker解决方案快约1.7倍,EmbeddedSwitchChecker快2倍,冠军CharSwitchChecker比HashSetChecker实现快约2.13倍.正如预期的那样,具有较大初始容量的HashSet和HashMap解决方案稍快一些,所有其他解决方案都落后了. 完整的可运行测试程序(所有列出的解决方案) 完整的Runnalbe测试计划以及所有列出的解决方案都在一个框中,供那些想要尝试或尝试新实现的人使用. 编辑:按照Luiggi Mendoza对rules for performing a micro benchmark的建议,我改变了main()方法进行测试.我执行整个测试两次,我只分析第二个结果.此外,由于测试不会在循环中创建新对象,我认为没有理由调用System.gc(). import java.util.Arrays; import java.util.HashMap; import java.util.HashSet; import java.util.Map; import java.util.Set; import java.util.TreeSet; interface Checker { String[] VALUES = { "zero","nine" }; boolean contains(String s); } class HashSetChecker implements Checker { private final Set<String> set = new HashSet<>(Arrays.asList(VALUES)); @Override public boolean contains(String s) { return set.contains(s); } } class HashSetChecker2 implements Checker { private final Set<String> set = new HashSet<>(1000); { set.addAll(Arrays.asList(VALUES)); } @Override public boolean contains(String s) { return set.contains(s); } } class HashMapChecker implements Checker { private final Map<String,s); } @Override public boolean contains(String s) { return map.containsKey(s); } } class TreeSetChecker implements Checker { private final Set<String> set = new TreeSet<>(Arrays.asList(VALUES)); @Override public boolean contains(String s) { return set.contains(s); } } class OrChecker implements Checker { @Override public boolean contains(String s) { return "zero".equals(s) || "one".equals(s) || "two".equals(s) || "three".equals(s) || "four".equals(s) || "five".equals(s) || "six".equals(s) || "seven".equals(s) || "eight".equals(s) || "nine".equals(s); } } class RefOrChecker extends OrChecker { @Override public boolean contains(String s) { return "zero" == s || "one" == s || "two" == s || "three" == s || "four" == s || "five" == s || "six" == s || "seven" == s || "eight" == s || "nine" == s || super.contains(s); } } class SwitchChecker implements Checker { @Override public boolean contains(String s) { switch (s) { case "zero": case "one": case "two": case "three": case "four": case "five": case "six": case "seven": case "eight": case "nine": return true; default: return false; } } } class EmbeddedSwitchChecker implements Checker { @Override public boolean contains(String s) { switch (s.length()) { case 3: switch (s) { case "one": case "two": case "six": return true; default: return false; } case 4: switch (s) { case "zero": case "four": case "five": case "nine": return true; default: return false; } case 5: switch (s) { case "three": case "seven": case "eight": return true; default: return false; } default: return false; } } } class CharSwitchChecker implements Checker { @Override public boolean contains(String s) { final int length = s.length(); if (length < 3 || length > 5) return false; switch (s.charAt(0)) { case 'z': return "zero".equals(s); case 'o': return "one".equals(s); case 't': return s.charAt(1) == 'w' ? "two".equals(s) : "three".equals(s); case 'f': return s.charAt(1) == 'o' ? "four".equals(s) : "five".equals(s); case 's': return s.charAt(1) == 'i' ? "six".equals(s) : "seven".equals(s); case 'e': return "eight".equals(s); case 'n': return "nine".equals(s); } return false; } } public class CheckerTester { private static final String[] TESTS = { "zero",(end - start) / 1_000_000); } public static void main(String args[]) { for (int i = 1; i <= 2; i++) { System.out.println("---- Check #" + i); test(new HashSetChecker()); test(new HashSetChecker2()); test(new HashMapChecker()); test(new TreeSetChecker()); test(new OrChecker()); test(new RefOrChecker()); test(new SwitchChecker()); test(new EmbeddedSwitchChecker()); test(new CharSwitchChecker()); } } } (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |