Regex look-behind without an obvious maximum length in Java
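I always thought that a look-behind assertion in Java's regex API (and in many other languages) must have an obvious length, so the STAR and PLUS quantifiers (* and +) are not allowed inside look-behinds.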
The excellent online resource regular-expressions.info seems to confirm (some of) my assumptions:
Using curly braces works, as long as the total length of the character range inside the look-behind is less than or equal to Integer.MAX_VALUE. So these regexes are valid:

```java
"(?<=a{0," +(Integer.MAX_VALUE) + "})B"
"(?<=Ca{0," +(Integer.MAX_VALUE-1) + "})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-2) + "})B"
```

But these are not:

```java
"(?<=Ca{0," +(Integer.MAX_VALUE) +"})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-1) +"})B"
```

However, I don't understand the following:

When I run a test with the * and + quantifiers inside a look-behind, all goes well (see output Test 1 and Test 2).

But when I add a single character at the start of the look-behind from Test 1 and Test 2, it breaks (see output Test 3).

Making the greedy * from Test 3 reluctant has no effect; it still breaks (see Test 4).

Here's the test harness:

```java
public class Main {

    private static String testFind(String regex, String input) {
        try {
            boolean returned = java.util.regex.Pattern.compile(regex).matcher(input).find();
            return "testFind : Valid -> regex = "+regex+",input = "+input+",returned = "+returned;
        } catch(Exception e) {
            return "testFind : Invalid -> "+regex+","+e.getMessage();
        }
    }

    private static String testReplaceAll(String regex, String input) {
        try {
            String returned = input.replaceAll(regex, "FOO");
            return "testReplaceAll : Valid -> regex = "+regex+",returned = "+returned;
        } catch(Exception e) {
            return "testReplaceAll : Invalid -> "+regex+","+e.getMessage();
        }
    }

    private static String testSplit(String regex, String input) {
        try {
            String[] returned = input.split(regex);
            return "testSplit : Valid -> regex = "+regex+",returned = "+java.util.Arrays.toString(returned);
        } catch(Exception e) {
            return "testSplit : Invalid -> "+regex+","+e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] regexes = {"(?<=a*)B", "(?<=a+)B", "(?<=Ca*)B", "(?<=Ca*?)B"};
        String input = "CaaaaaaaaaaaaaaaBaaaa";
        int test = 0;
        for(String regex : regexes) {
            test++;
            System.out.println("********************** Test "+test+" **********************");
            System.out.println(" "+testFind(regex, input));
            System.out.println(" "+testReplaceAll(regex, input));
            System.out.println(" "+testSplit(regex, input));
            System.out.println();
        }
    }
}
```

Output:

```
********************** Test 1 **********************
 testFind : Valid -> regex = (?<=a*)B,input = CaaaaaaaaaaaaaaaBaaaa,returned = true
 testReplaceAll : Valid -> regex = (?<=a*)B,returned = CaaaaaaaaaaaaaaaFOOaaaa
 testSplit : Valid -> regex = (?<=a*)B,returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 2 **********************
 testFind : Valid -> regex = (?<=a+)B,input = CaaaaaaaaaaaaaaaBaaaa,returned = true
 testReplaceAll : Valid -> regex = (?<=a+)B,returned = CaaaaaaaaaaaaaaaFOOaaaa
 testSplit : Valid -> regex = (?<=a+)B,returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 3 **********************
 testFind : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
 testReplaceAll : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
 testSplit : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^

********************** Test 4 **********************
 testFind : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
 testReplaceAll : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
 testSplit : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
```

My question may be obvious, but I'll still ask: can anyone explain to me why Tests 1 and 2 work, but Tests 3 and 4 fail? I would have expected them all to fail, not half of them to work and half of them to fail.

Thanks.

PS. I'm using Java version 1.6.0_14.
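To make the curly-brace rule above concrete, here is a minimal sketch of the arithmetic, assuming (as the rule suggests) that the maximum lengths of the elements inside the look-behind are simply added up and that the total must fit in an int. The class LookBehindMaxLength and its maxLength method are hypothetical illustrations, not the JDK's Pattern internals. The sketch uses Math.addExact and OptionalInt, so it needs Java 8 or later rather than the 1.6 used in the question.

```java
import java.util.OptionalInt;

// Hypothetical, simplified model of the curly-brace rule: the maximum lengths of
// all elements inside a look-behind are summed, and the sum must fit in an int.
// This is NOT the JDK's Pattern code, only an illustration of the arithmetic.
class LookBehindMaxLength {

    /**
     * Sums the maximum lengths of the look-behind's elements.
     * A literal character contributes 1; a{m,n} on a single character contributes n.
     * Returns an empty OptionalInt if the total overflows int.
     */
    static OptionalInt maxLength(int... elementMaxLengths) {
        int total = 0;
        for (int m : elementMaxLengths) {
            try {
                total = Math.addExact(total, m); // throws ArithmeticException on int overflow
            } catch (ArithmeticException overflow) {
                return OptionalInt.empty();      // no representable maximum length
            }
        }
        return OptionalInt.of(total);
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;
        // "Valid" examples from the question: the total is exactly Integer.MAX_VALUE.
        System.out.println("(?<=a{0,MAX})B     -> " + maxLength(max));
        System.out.println("(?<=Ca{0,MAX-1})B  -> " + maxLength(1, max - 1));
        System.out.println("(?<=CCa{0,MAX-2})B -> " + maxLength(1, 1, max - 2));
        // "Invalid" examples: the total would be Integer.MAX_VALUE + 1.
        System.out.println("(?<=Ca{0,MAX})B    -> " + maxLength(1, max));
        System.out.println("(?<=CCa{0,MAX-1})B -> " + maxLength(1, 1, max - 1));
    }
}
```

Running main should print OptionalInt[2147483647] for the three valid patterns and OptionalInt.empty for the two invalid ones, matching the boundary described above.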
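Looking at the source code of Pattern.java reveals that '*' and '+' are implemented as instances of Curly (the object that is created for the curly-brace operator). So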
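a* is implemented as a{0,0x7FFFFFFF}, and
a+ is implemented as a{1,0x7FFFFFFF},

which is why you see exactly the same behavior with curlies and stars. Since 0x7FFFFFFF is Integer.MAX_VALUE, the look-behind in (?<=a*)B has a maximum length of exactly Integer.MAX_VALUE, which still fits in an int. In (?<=Ca*)B the extra C pushes the total to Integer.MAX_VALUE + 1, which overflows, so the pattern is rejected just like "(?<=Ca{0," +(Integer.MAX_VALUE) +"})B". The reluctant a*? in Test 4 is also a Curly with the same bounds, so it fails in the same way.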
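Following from that, one way to keep a look-behind like Test 3 compilable is to replace * with an explicit, smaller upper bound, so that the extra C no longer pushes the total past Integer.MAX_VALUE. The sketch below assumes a hypothetical bound of 1000 (not something from the question); the trade-off is that a run of more than 1000 a's before the B would no longer be matched. The class name BoundedLookBehind is likewise only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class BoundedLookBehind {
    // Same input as the question's test harness.
    static final String INPUT = "CaaaaaaaaaaaaaaaBaaaa";

    // Hypothetical bound of 1000: large enough for this input, and the look-behind's
    // maximum length (1 for 'C' + 1000) stays far below Integer.MAX_VALUE.
    static final String REGEX = "(?<=Ca{0,1000})B";

    public static void main(String[] args) {
        // Same three operations as testFind, testReplaceAll and testSplit.
        System.out.println(Pattern.compile(REGEX).matcher(INPUT).find());
        System.out.println(INPUT.replaceAll(REGEX, "FOO"));
        System.out.println(Arrays.toString(INPUT.split(REGEX)));
    }
}
```

On the question's input this should behave like Tests 1 and 2: find() returns true, replaceAll gives CaaaaaaaaaaaaaaaFOOaaaa, and split gives [Caaaaaaaaaaaaaaa, aaaa].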