Regex look-behind without an obvious maximum length in Java

Published: 2020-12-14 06:38:51. Category: Encyclopedia. Source: compiled from the web.

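I always thought that a look-behind assertion in Java's regex API (and in many other languages, for that matter) must have an obvious length. So the STAR and PLUS quantifiers are not allowed inside look-behinds.

The excellent online resource regular-expressions.info seems to confirm (some of) my assumptions: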

“[…] Java takes things a step further by
allowing finite repetition. You still
cannot use the star or plus, but you
can use the question mark and the
curly braces with the max parameter
specified. Java recognizes the fact
that finite repetition can be
rewritten as an alternation of strings
with different, but fixed lengths.
Unfortunately, the JDK 1.4 and 1.5
have some bugs when you use
alternation inside lookbehind. These
were fixed in JDK 1.6. […]”

— regular-expressions.info
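The quoted claim that finite repetition can be rewritten as an alternation of fixed-length strings can be checked with a small sketch. The class name, the bound of 2, and the sample inputs below are made up for illustration and do not come from the original post.

```java
import java.util.regex.Pattern;

// Illustrative sketch: compares a bounded look-behind with its spelled-out alternation.
class FiniteRepetitionDemo {

    // Look-behind with a bounded quantifier: "C" followed by zero to two "a"s.
    static final Pattern BOUNDED = Pattern.compile("(?<=Ca{0,2})B");

    // The same look-behind written as an alternation of fixed-length strings.
    static final Pattern ALTERNATION = Pattern.compile("(?<=C|Ca|Caa)B");

    // True when both patterns agree on whether the input contains a match.
    static boolean sameResult(String input) {
        return BOUNDED.matcher(input).find() == ALTERNATION.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {"CB", "CaB", "CaaB", "CaaaB", "B"};
        for (String input : inputs) {
            System.out.println(input
                    + " bounded=" + BOUNDED.matcher(input).find()
                    + " alternation=" + ALTERNATION.matcher(input).find());
        }
    }
}
```

Both patterns should agree on every sample: the first three inputs match, while CaaaB (one "a" too many) and B (no preceding "C") do not.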

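Using curly braces works as long as the total length of the range of characters inside the look-behind is less than or equal to Integer.MAX_VALUE. So these regexes are valid: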

"(?<=a{0,"   +(Integer.MAX_VALUE)   + "})B"
"(?<=Ca{0,"  +(Integer.MAX_VALUE-1) + "})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-2) + "})B"

But these are not:

"(?<=Ca{0,"  +(Integer.MAX_VALUE)   +"})B"
"(?<=CCa{0," +(Integer.MAX_VALUE-1) +"})B"
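To try the five patterns above on a given JDK, a minimal sketch along the following lines could be used. The class name LookBehindBounds and the helper compiles are hypothetical names chosen for this example. The results in this post were reported on Java 1.6.0_14 and other versions may behave differently, so no expected output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative sketch: reports which look-behind patterns the running JDK accepts.
class LookBehindBounds {

    // Returns true if the regex compiles, false if the JDK rejects it as invalid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first three are listed as valid in the post, the last two as invalid.
        String[] regexes = {
            "(?<=a{0,"   + Integer.MAX_VALUE       + "})B",
            "(?<=Ca{0,"  + (Integer.MAX_VALUE - 1) + "})B",
            "(?<=CCa{0," + (Integer.MAX_VALUE - 2) + "})B",
            "(?<=Ca{0,"  + Integer.MAX_VALUE       + "})B",
            "(?<=CCa{0," + (Integer.MAX_VALUE - 1) + "})B"
        };
        for (String regex : regexes) {
            System.out.println(regex + " -> " + (compiles(regex) ? "valid" : "invalid"));
        }
    }
}
```

Catching only PatternSyntaxException keeps the check narrow: any other failure would still surface instead of being reported as an invalid pattern.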

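However, I don't understand the following:

When I run a test using the * and + quantifiers inside a look-behind, all goes well (see the output of Test 1 and Test 2).

But when I add a single character at the start of the look-behind from Test 1 and Test 2, it breaks (see the output of Test 3).

Making the greedy * from Test 3 reluctant has no effect: it still breaks (see Test 4).

Here is the test harness: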

public class Main {

    private static String testFind(String regex,String input) {
        try {
            boolean returned = java.util.regex.Pattern.compile(regex).matcher(input).find();
            return "testFind       : Valid   -> regex = "+regex+",input = "+input+",returned = "+returned;
        } catch(Exception e) {
            return "testFind       : Invalid -> "+regex+","+e.getMessage();
        }
    }

    private static String testReplaceAll(String regex,String input) {
        try {
            String returned = input.replaceAll(regex,"FOO");
            return "testReplaceAll : Valid   -> regex = "+regex+",returned = "+returned;
        } catch(Exception e) {
            return "testReplaceAll : Invalid -> "+regex+","+e.getMessage();
        }
    }

    private static String testSplit(String regex,String input) {
        try {
            String[] returned = input.split(regex);
            return "testSplit      : Valid   -> regex = "+regex+",returned = "+java.util.Arrays.toString(returned);
        } catch(Exception e) {
            return "testSplit      : Invalid -> "+regex+","+e.getMessage();
        }
    }

    public static void main(String[] args) {
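        // Tests 1 and 2: an unbounded * or + alone in the look-behind.
        // Tests 3 and 4: a leading C before a greedy a*, then before a reluctant a*?.
        String[] regexes = {"(?<=a*)B","(?<=a+)B","(?<=Ca*)B","(?<=Ca*?)B"};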
        String input = "CaaaaaaaaaaaaaaaBaaaa";
        int test = 0;
        for(String regex : regexes) {
            test++;
            System.out.println("********************** Test "+test+" **********************");
            System.out.println("    "+testFind(regex,input));
            System.out.println("    "+testReplaceAll(regex,input));
            System.out.println("    "+testSplit(regex,input));
            System.out.println();
        }
    }
}

Output:

********************** Test 1 **********************
    testFind       : Valid   -> regex = (?<=a*)B,input = CaaaaaaaaaaaaaaaBaaaa,returned = true
    testReplaceAll : Valid   -> regex = (?<=a*)B,returned = CaaaaaaaaaaaaaaaFOOaaaa
    testSplit      : Valid   -> regex = (?<=a*)B,returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 2 **********************
    testFind       : Valid   -> regex = (?<=a+)B,input = CaaaaaaaaaaaaaaaBaaaa,returned = true
    testReplaceAll : Valid   -> regex = (?<=a+)B,returned = CaaaaaaaaaaaaaaaFOOaaaa
    testSplit      : Valid   -> regex = (?<=a+)B,returned = [Caaaaaaaaaaaaaaa, aaaa]

********************** Test 3 **********************
    testFind       : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
    testReplaceAll : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^
    testSplit      : Invalid -> (?<=Ca*)B,Look-behind group does not have an obvious maximum length near index 6
(?<=Ca*)B
      ^

********************** Test 4 **********************
    testFind       : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
    testReplaceAll : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^
    testSplit      : Invalid -> (?<=Ca*?)B,Look-behind group does not have an obvious maximum length near index 7
(?<=Ca*?)B
       ^

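My question may be obvious, but I'll ask it anyway: can anyone explain to me why Test 1 and Test 2 do not fail, while Test 3 and Test 4 do? I would have expected them all to fail, not half of them to work and the other half to fail.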

Thanks.

PS. I'm using: Java version 1.6.0_14

A look at the source code of Pattern.java shows that '*' and '+' are implemented as instances of Curly (the object created for the curly-brace operators). So,

a*

is implemented as

a{0,0x7FFFFFFF}

and

a+

is implemented as

a{1,0x7FFFFFFF}

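This is why you see exactly the same behaviour for curly braces as for the star and plus. By the rule from the curly-brace examples in the question, Tests 1 and 2 stay within the Integer.MAX_VALUE limit, whereas the leading C in Tests 3 and 4 adds one more character and pushes the maximum length past it.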
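The equivalence can be illustrated outside a look-behind, where it does not depend on how a particular JDK validates look-behind lengths. This is a minimal sketch: the class name StarAsCurlyDemo and the helpers curly and matches are hypothetical, and the hexadecimal 0x7FFFFFFF from the answer is written out in decimal because regex syntax does not accept hex bounds.

```java
import java.util.regex.Pattern;

// Illustrative sketch: a* and a+ behave like curly quantifiers capped at Integer.MAX_VALUE.
class StarAsCurlyDemo {

    // Builds the curly-brace spelling, e.g. curly("a", 0) gives "a{0,2147483647}".
    static String curly(String atom, int min) {
        return atom + "{" + min + "," + Integer.MAX_VALUE + "}";
    }

    // True when the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // 0x7FFFFFFF is simply Integer.MAX_VALUE written in hexadecimal.
        System.out.println("0x7FFFFFFF == Integer.MAX_VALUE: " + (0x7FFFFFFF == Integer.MAX_VALUE));

        String[] inputs = {"", "a", "aaaa", "aab"};
        for (String input : inputs) {
            System.out.println("\"" + input + "\""
                    + " a*=" + matches("a*", input)
                    + " " + curly("a", 0) + "=" + matches(curly("a", 0), input)
                    + " a+=" + matches("a+", input)
                    + " " + curly("a", 1) + "=" + matches(curly("a", 1), input));
        }
    }
}
```

Each star or plus pattern should give the same answer as its curly-brace counterpart on every input. The sketch shows only that the two spellings match the same strings, not how Pattern represents them internally.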

(Editor: 李大同)
