c# – How do I specify the priority of match patterns in a Regex?

Published: 2020-12-15 17:14:03 | Category: Encyclopedia | Source: compiled from the web


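I'm writing a function-parsing engine that uses regular expressions to separate the individual terms (each defined as a constant or variable, optionally followed by an operator). It works great, except when I group terms inside other grouped terms. Here's the code I'm using: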

//This matches an opening delimiter
Regex openers = new Regex("[[{(]");

//This matches a closing delimiter
Regex closers = new Regex("[]})]");

//This matches the name of a variable (\w+) or a constant numeric value (\d+(\.\d+)?)
Regex VariableOrConstant = new Regex("((\\d+(\\.\\d+)?)|\\w+)" + FunctionTerm.opRegex + "?");

//This matches the binary operators +, *, -, or /
//(the hyphen goes last: in "[*+-/]" it would form the range "+-/", which also matches ',' and '.')
Regex ops = new Regex("[*+/-]");

//This compound Regex finds a single variable or constant term (including a following operator,
//if any) OR a group containing multiple terms (and their following operators, if any)
//and a following operator, if any.
//Matches of this second pattern need to be added to the function as sub-functions,
//not as individual terms, to ensure the correct evaluation order with parentheses.
Regex splitter = new Regex(
openers + 
"(" + VariableOrConstant + ")+" + closers + ops + "?" +
"|" +
"(" + VariableOrConstant + ")" + ops + "?");

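When "splitter" is matched against the string "4/(2*X*[2+1])", the matched values are "4/", "2*", "X*", "2+" and "1", completely ignoring all the delimiting brackets and parentheses. I believe this is because the second half of the "splitter" regex (the part after the "|") is matching and overriding the other part of the pattern. This is bad: I want grouped expressions to take precedence over single terms. Does anyone know how I can do this? I considered using positive/negative lookahead and lookbehind, but I'm really not sure how to use those, or even what they are, for that matter, and I couldn't find any relevant examples... Thanks in advance.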

Solution

You didn't show us how you're applying the regex, so here's a demo I whipped up:

using System;
using System.Text.RegularExpressions;

private static void ParseIt(string subject)
{
  Console.WriteLine("subject : {0}\n", subject);

  Regex openers = new Regex(@"[[{(]");
  Regex closers = new Regex(@"[]})]");
  Regex ops = new Regex(@"[*+/-]");
  Regex VariableOrConstant = new Regex(@"((\d+(\.\d+)?)|\w+)" + ops + "?");

  Regex splitter = new Regex(
    openers + @"(?<FIRST>" + VariableOrConstant + @")+" + closers + ops + @"?" +
    @"|" +
    @"(?<SECOND>" + VariableOrConstant + @")" + ops + @"?",RegexOptions.ExplicitCapture
  );

  foreach (Match m in splitter.Matches(subject))
  {
    foreach (string s in splitter.GetGroupNames())
    {
      Console.WriteLine("group {0,-8}: {1}",s,m.Groups[s]);
    }
    Console.WriteLine();
  }
}

输出：

subject : 4/(2*X*[2+1])

group 0       : 4/
group FIRST   :
group SECOND  : 4/

group 0       : 2*
group FIRST   :
group SECOND  : 2*

group 0       : X*
group FIRST   :
group SECOND  : X*

group 0       : [2+1]
group FIRST   : 1
group SECOND  :

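As you can see, the term [2+1] is matched by the first part of the regex, just as you intended. It can't do anything with the "(", however, because the next bracketing character after it is another "opener" ([), and it's looking for a "closer".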
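A small sketch may make the underlying rule clearer (this is an illustration, not part of the original answer; `AlternationDemo` and `AllMatches` are hypothetical names). The .NET engine tries the alternatives of `|` in the order written, but only at the current start position; a match that starts earlier in the string always wins, and characters that no alternative can consume are silently skipped. So the order of the alternatives cannot give the grouped pattern "priority" over a term that starts before it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: returns the text of every successive match, left to right.
    public static List<string> AllMatches(string pattern, string input)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            values.Add(m.Value);
        }
        return values;
    }
}
```

For example, `AllMatches(@"[\[(]\w+[\])]|\w+", "(a*[b])")` returns `a` and `[b]`: the "(" and "*" are skipped because nothing can match there. Alternative order only matters when both alternatives can start at the same position: `AllMatches(@"\w|\w+", "ab")` returns `a` and `b`, while `AllMatches(@"\w+|\w", "ab")` returns `ab`.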

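You could use .NET's "balancing groups" feature to allow grouped terms nested inside other groups, but it's not worth it. Regexes aren't designed for parsing; in fact, parsing and regex matching are fundamentally different kinds of operations. This is a good example of the difference: a regex actively seeks out matches, skipping over anything it can't use (like the open parenthesis in your example), whereas a parser has to examine every character (even if only to decide to ignore it).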
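For reference, here is a minimal sketch of what the balancing-groups approach might look like (the class, method and group name `depth` are hypothetical). Each nested opener pushes an empty capture onto `depth`, each nested closer pops one, and the conditional `(?(depth)(?!))` fails the match if anything is left on the stack. As an assumption for brevity, it treats the three bracket types as interchangeable, so it does not check that "(" is closed by ")".

```csharp
using System.Text.RegularExpressions;

public static class BalancedGroups
{
    private static readonly Regex Grouped = new Regex(@"
        [\[{(]                      # outermost opening bracket
        (?>
            [^\[\]{}()]+            # anything that is not a bracket
          | [\[{(] (?<depth>)       # nested opener: push a capture onto depth
          | [\]})] (?<-depth>)      # nested closer: pop one, failing if depth is empty
        )*
        (?(depth)(?!))              # fail if a nested opener was never closed
        [\]})]                      # outermost closing bracket
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns the first outermost bracketed group, or an empty string if there is none.
    public static string FirstGroup(string input)
    {
        Match m = Grouped.Match(input);
        return m.Success ? m.Value : "";
    }
}
```

With the subject from the question, `FirstGroup("4/(2*X*[2+1])")` returns `(2*X*[2+1])`, whereas `FirstGroup("x*(a+b")` returns an empty string. Even so, this only finds the group; splitting its contents into terms would still need another pass, which supports the point that it's not worth it.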

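About the demo: I tried to make the minimum functional changes needed to get the code working (which is why I didn't fix the mistake of putting the "+" quantifier outside the capturing group: a repeated group retains only its last iteration, which is why FIRST shows just "1"), but I also made a few cosmetic changes that are meant as positive recommendations. To wit: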

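>Always use verbatim string literals (@"...") when creating regexes in C#, so that backslashes reach the regex engine without having to be doubled.
>If you're using capturing groups, use named groups whenever possible, but don't mix named and numbered groups in the same regex. Named groups save you the trouble of keeping track of what is captured where, and the ExplicitCapture option saves you from cluttering the regex with (?:...) wherever you need a non-capturing group.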
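A short sketch of both points above (an illustration, not from the original answer; `CaptureDemo`, `Bracketed`, `Terms` and the group name `TERM` are hypothetical). With `RegexOptions.ExplicitCapture`, the unnamed `(...)` groups stop capturing, so only group 0 and the named group are reported. It also shows the quantifier issue from the demo: `Group.Value` holds only the last iteration of a repeated group, while `Group.Captures` keeps every iteration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // The unnamed groups are non-capturing here because of ExplicitCapture.
    public static readonly Regex Bracketed = new Regex(
        @"\[(?<TERM>(\d+(\.\d+)?|\w+)[*+/-]?)+\]",
        RegexOptions.ExplicitCapture);

    // Returns what Group.Value shows (the last iteration) and every iteration.
    public static (string Last, List<string> All) Terms(string input)
    {
        Group term = Bracketed.Match(input).Groups["TERM"];
        var all = new List<string>();
        foreach (Capture c in term.Captures)
        {
            all.Add(c.Value);
        }
        return (term.Value, all);
    }
}
```

Here `Bracketed.GetGroupNames()` yields only `0` and `TERM`, and `Terms("[2+1]")` returns `1` as the last value but `2+` and `1` as the full list of captures.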

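Finally, the whole scheme of building a large regex out of a bunch of smaller ones is, IMO, of very limited use. It's very hard to keep track of how the parts interact, for example which part sits inside which. Another advantage of C#'s verbatim strings is that they can span multiple lines, so you can take advantage of free-spacing mode (a.k.a. /x or COMMENTS mode, RegexOptions.IgnorePatternWhitespace in .NET):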

Regex r = new Regex(@"
    (?<GROUPED>
      [[{(]                  # opening bracket
      (                      # group containing:
        ((\d+(\.\d+)?)|\w+)  # number or variable
        [*+/-]?              # and following operator
      )+                     # ...one or more times
      []})]                  # closing bracket
      [*+/-]?                # and following operator
    )
    |
    (?<UNGROUPED>
      ((\d+(\.\d+)?)|\w+)    # number or variable
      [*+/-]?                # and following operator
    )
    ", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace
  );

This isn't a solution to your problem; as I said, that's not a job for regexes. It's just a demonstration of some useful regex techniques.
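Since the answer recommends a parser rather than a regex, here is a minimal recursive-descent sketch of that idea (not the asker's engine; `Node`, `TinyParser`, `Parse` and `ParseSequence` are hypothetical, and for brevity it assumes no whitespace in the input). Unlike the regex, it examines every character: a nested bracket starts a recursive call that must end at the matching closer, and anything unexpected raises an error instead of being skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: a single term (number or variable) or a bracketed group,
// either one optionally followed by a binary operator.
public sealed class Node
{
    public string Term;            // set for a single term, null for a group
    public List<Node> Children;    // set for a group, null for a single term
    public char? Op;               // following operator, if any

    public override string ToString()
    {
        string body = Term ?? "Group(" + string.Join(" ", Children) + ")";
        return Op.HasValue ? body + Op.Value : body;
    }
}

public static class TinyParser
{
    private const string Openers = "([{";
    private const string Closers = ")]}";
    private const string Ops = "+-*/";

    public static List<Node> Parse(string s)
    {
        int pos = 0;
        List<Node> nodes = ParseSequence(s, ref pos, '\0');
        if (pos != s.Length) throw new FormatException($"Unexpected '{s[pos]}' at {pos}");
        return nodes;
    }

    // Reads terms until the expected closer (or the end of input when expectedCloser is '\0').
    private static List<Node> ParseSequence(string s, ref int pos, char expectedCloser)
    {
        var nodes = new List<Node>();
        while (pos < s.Length && s[pos] != expectedCloser)
        {
            char c = s[pos];
            Node node;
            int open = Openers.IndexOf(c);
            if (open >= 0)
            {
                pos++; // consume the opener, then parse the group's contents recursively
                char closer = Closers[open];
                node = new Node { Children = ParseSequence(s, ref pos, closer) };
                if (pos >= s.Length) throw new FormatException($"Missing '{closer}'");
                pos++; // consume the matching closer
            }
            else if (char.IsLetterOrDigit(c))
            {
                int start = pos;
                while (pos < s.Length && (char.IsLetterOrDigit(s[pos]) || s[pos] == '.')) pos++;
                node = new Node { Term = s.Substring(start, pos - start) };
            }
            else
            {
                // Every character is examined: an unexpected one is an error, not skipped.
                throw new FormatException($"Unexpected '{c}' at {pos}");
            }
            if (pos < s.Length && Ops.IndexOf(s[pos]) >= 0) node.Op = s[pos++];
            nodes.Add(node);
        }
        return nodes;
    }
}
```

For the subject from the question, `string.Join(" ", TinyParser.Parse("4/(2*X*[2+1])"))` gives `4/ Group(2* X* Group(2+ 1))`, with each nested group kept as a sub-structure, while a mismatched input such as `(2+1]` throws a `FormatException`.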

(Editor: Li Datong)
