perl – Using Parse::RecDescent

Published: 2020-12-15 23:21:52 | Category: Big Data | Source: collected from the web


I have the following input:

@Book{press,
    author    = "Press,W. and Teutolsky,S. and Vetterling,W. and Flannery B.",
    title     = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
    year      = 2007,
    publisher = "Cambridge University Press"
}

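I have to write a grammar for the Parse::RecDescent parser generator. The data must be converted into an XML structure, and the output should look like this: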

<book>
    <keyword>press</keyword>
    <author>Press,W.+Teutolsky,S.+Vetterling,W.+Flannery B.</author>
    <title>Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing</title>
    <year>2007</year>
    <publisher>Cambridge University Press</publisher>
</book>
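For reference, here is a plain-Perl sketch (no parser involved) that produces exactly this layout once the fields have already been extracted. book_to_xml is a hypothetical helper, not part of the question or of Parse::RecDescent, and it does no XML escaping, so values containing & or < would need extra handling:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the <book> layout from fields that have
# already been extracted. No XML escaping is performed.
sub book_to_xml {
    my ($key, %field) = @_;
    # BibTeX separates authors with " and "; the target format uses "+".
    (my $authors = $field{author}) =~ s/\sand\s/+/g;
    my $xml = "<book>\n";
    $xml .= "    <keyword>$key</keyword>\n";
    $xml .= "    <author>$authors</author>\n";
    # The remaining fields are copied through unchanged, in a fixed order.
    $xml .= "    <$_>$field{$_}</$_>\n" for qw(title year publisher);
    return $xml . "</book>\n";
}

print book_to_xml(
    'press',
    author    => 'Press,W. and Teutolsky,S. and Vetterling,W. and Flannery B.',
    title     => 'Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing',
    year      => 2007,
    publisher => 'Cambridge University Press',
);
```

Only the author value is rewritten; doing the " and " substitution on every field would also mangle a title that happens to contain the word "and".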

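Extra and duplicate fields must be reported as errors (with a proper message that includes the line number, and no further parsing). I tried starting with something like this: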

use Parse::RecDescent;

open(my $in,"<","parsing.txt")  or die "Can't open parsing.txt: $!";

my $text;
while (<$in>) {
    $text .= $_;
}

print $text;

my $grammar = q {
    beginning: "\@Book{" keyword fields "}"          { print "<book>\n",$item[2],$item[3],"</book>"; }
    keyword: /[a-zA-Z]+/ ","                            { return "    <keyword>".$item[1]."</keyword>\n"; }
    fields: one "," two "," tree "," four               { return $item[1].$item[3].$item[5].$item[7]; }
    one: "author" "=" '"' /[a-zA-Z\s.,{}:]+/ '"' {   $item[4] =~ s/\sand\s/+/g;
                                                            return "    <author>",$item[4],"</author>\n"; }
    two: "title" "=" '"' /[a-zA-Z\s.,{}:]+/ '"'  {   $item[4] =~ s/\sand\s/+/g;
                                                            return "    <title>","</title>\n"; }
    three: "year" "=" /[0-2][0-9][0-9][0-9]/            {   return "    <year>","</year>\n"; }
    four: "publisher" "=" '"' /[a-zA-Z\s.,{}:]+/ '"'
                                                        {   $item[4] =~ s/\sand\s/+/g;
                                                            return "    <publisher>","</publisher>\n"; }
};

my $parser = new Parse::RecDescent($grammar) or die ("Bad grammar!");
defined $parser->beginning($text) or die ("Bad text!");

But I don't even know whether this is the right approach. Please help.

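One more small question: the fields in the input may appear in any order, but each one may appear only once. Do I have to write subrules for every permutation of (author, title, year, publisher)? I came up with: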

fields: field "," field "," field
field: one | two | three | four

but it obviously does not prevent duplicate fields.
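A common way to avoid writing every permutation is to accept the fields in any order and enforce the "each field exactly once" rule in code, for example with a hash. The sketch below is plain Perl that would run after parsing; check_fields and its input format (a list of [name, line] pairs) are hypothetical, chosen only to show the idea, including a line number in each error message:

```perl
use strict;
use warnings;

# Hypothetical post-parse check: $fields refers to a list of [name, line]
# pairs in input order. Returns a message for the first problem found,
# or undef if every allowed field appears exactly once.
sub check_fields {
    my ($fields) = @_;
    my %allowed = map { $_ => 1 } qw(author title year publisher);
    my %seen;    # field name => line where it first appeared
    for my $f (@$fields) {
        my ($name, $line) = @$f;
        return "unknown field '$name' at line $line" unless $allowed{$name};
        return "duplicate field '$name' at line $line (first seen at line $seen{$name})"
            if exists $seen{$name};
        $seen{$name} = $line;
    }
    my @missing = grep { !exists $seen{$_} } sort keys %allowed;
    return @missing ? "missing field(s): @missing" : undef;
}

# Order does not matter; repeating a field does.
for my $fields (
    [ [ title => 3 ], [ author => 2 ], [ publisher => 5 ], [ year => 4 ] ],
    [ [ author => 2 ], [ title => 3 ], [ author => 4 ] ],
) {
    my $problem = check_fields($fields) // 'ok';
    print "$problem\n";
}
```

Because the hash remembers where each field was first seen, the duplicate message can point at both occurrences, and the grammar itself only needs a generic "field, field, ..." rule.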

Solution

First, you have a typo: "tree" instead of "three".

I ran your program, but added the following lines:

use strict;
use warnings; # you should always have strict and warnings on
$::RD_HINT = 1; # Parse::RecDescent hints
$::RD_TRACE = 1; # Parse::RecDescent trace

and got this debug output:

 1|beginning |>>Matched terminal<< (return value:   |
  |          |[@Book{])                             |
 1|beginning |                                      |"press,\n author = "Press,
  |          |                                      |W. and Teutolsky,S. and
  |          |                                      |Vetterling,W. and Flannery
  |          |                                      |B.",\n title = "Numerical
  |          |                                      |{R}ecipes in {C}: The {A}rt
  |          |                                      |of {S}cientific
  |          |                                      |{C}omputing",\n year =
  |          |                                      |2007,\n publisher =
  |          |                                      |"Cambridge University
  |          |                                      |Press"\n}\n"
 1|beginning |Trying subrule: [keyword]             |
 2| keyword  |Trying rule: [keyword]                |
 2| keyword  |Trying production: [/[a-zA-Z]+/ ',']  |
 2| keyword  |Trying terminal: [/[a-zA-Z]+/]        |
 2| keyword  |>>Matched terminal<< (return value:   |
  |          |[press])                              |
 2| keyword  |                                      |",W. and
  |          |                                      |Teutolsky,\n publisher =
  |          |                                      |"Cambridge University
  |          |                                      |Press"\n}\n"
 2| keyword  |Trying terminal: [',']                |
 2| keyword  |>>Matched terminal<< (return value:   |
  |          |[,])                                  |
 2| keyword  |                                      |"\n author = "Press,\n publisher =
  |          |                                      |"Cambridge University
  |          |                                      |Press"\n}\n"
 2| keyword  |Trying action                         |
 1|beginning |>>Matched subrule: [keyword]<< (return|
  |          |value: [    <keyword>press</keyword> ]|
 1|beginning |                                      |"press,\n publisher =
  |          |                                      |"Cambridge University
  |          |                                      |Press"\n}\n"
 1|beginning |Trying subrule: [fields]              |
 2|  fields  |Trying rule: [fields]                 |
 2|  fields  |Trying production: [one ',' two ','   |
  |          |three ',' four]                       |
 2|  fields  |Trying subrule: [one]                 |
 3|   one    |Trying rule: [one]                    |
 3|   one    |Trying production: ['author' '=' '"' |
  |          |/[a-zA-Z\s.,{}:]+/ '"']               |
 3|   one    |Trying terminal: ['author']           |
 3|   one    |<<Didn't match terminal>>             |
 3|   one    |<<Didn't match rule>>                 |
 2|  fields  |<<Didn't match subrule: [one]>>       |
 2|  fields  |<<Didn't match rule>>                 |
 1|beginning |<<Didn't match subrule: [fields]>>    |
 1|beginning |<<Didn't match rule>>                 |
Bad text! at parser.pl line 32, <$in> line 6.

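This shows that parsing gets stuck at the very first step: keyword matches "press" and ",", but afterwards "press," is back on the input stream, so fields starts from the wrong place. This is because your actions use return instead of assigning to $return, which is what the Parse::RecDescent manual says you should do. Actions are compiled inline into the subroutine generated for each rule, so a bare return leaves that subroutine before it records how much text the rule consumed.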
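A minimal sketch of the $return convention, using a made-up two-rule grammar (pair and key are illustrative names, unrelated to the BibTeX grammar; Parse::RecDescent has to be installed from CPAN). Each action assigns to $return rather than calling return, so each rule both yields a value and consumes its input:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Toy grammar (hypothetical): every action assigns to $return, so each rule
# produces a value AND the parser advances past the text it matched.
# A bare "return" inside an action would skip that bookkeeping.
my $grammar = q{
    pair: key ',' key  { $return = "$item[1]+$item[3]"; }
    key:  /[a-z]+/     { $return = uc $item[1]; }
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

my $input  = 'press, vetterling';
my $result = $parser->pair($input);
print defined $result ? "$result\n" : "no match\n";
```

This should print PRESS+VETTERLING. If key's action used return instead, key would still hand back a value, but pair would then look for ',' at the start of "press, vetterling" and fail, which is the same symptom as in the trace.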

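Also, when you assign to $return you cannot hand back a list the way return "    <author>",$item[4],"</author>\n" did; you have to concatenate the strings yourself.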

Here is the final result:

use strict;
use warnings;
use Parse::RecDescent;

open(my $in, "<", "parsing.txt")  or die "Can't open parsing.txt: $!";

my $text;
while (<$in>) {
    $text .= $_;
}

print $text;

# Each action assigns to $return and builds one string. Only the author
# field needs " and " turned into "+".
my $grammar = q {
    beginning: "\@Book{" keyword fields /\s*}\s*/   { print "<book>\n",$item[2],$item[3],"</book>\n"; }
    keyword: /[a-zA-Z]+/ ","                        { $return = "    <keyword>$item[1]</keyword>\n"; }
    fields: one /,\s*/ two /,\s*/ three /,\s*/ four { $return = $item[1].$item[3].$item[5].$item[7]; }
    one: "author" "=" '"' /[a-zA-Z\s.,{}:]+/ '"'    {   $item[4] =~ s/\sand\s/+/g;
                                                        $return = "    <author>$item[4]</author>\n"; }
    two: "title" "=" '"' /[a-zA-Z\s.,{}:]+/ '"'     {   $return = "    <title>$item[4]</title>\n"; }
    three: "year" "=" /[0-2][0-9][0-9][0-9]/        {   $return = "    <year>$item[3]</year>\n"; }
    four: "publisher" "=" '"' /[a-zA-Z\s.,{}:]+/ '"'
                                                    {   $return = "    <publisher>$item[4]</publisher>\n"; }
};

my $parser = Parse::RecDescent->new($grammar) or die ("Bad grammar!");
defined $parser->beginning($text) or die ("Bad text!");
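The grammar above still requires the fields in a fixed order. One possible extension, sketched under the assumption that Parse::RecDescent's separated repetition (field(s /,/)) and its $thisline action variable behave as documented, reads the fields in any order and dies on an unknown or duplicate field. The rule names, the error wording and the simplified value rule are illustrative choices, not part of the original answer:

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Sketch: fields may come in any order. Each field becomes a
# [name, value, line] triple; the book action rejects unknown or repeated
# names (dying stops the parse) and then prints the fields in a fixed order.
# "grep { ref }" keeps only the triples from the repetition result.
my $grammar = q{
    book:  '@Book{' key ',' field(s /,/) '}'
           {
               my %allowed = map { $_ => 1 } qw(author title year publisher);
               my %value;
               for my $f (grep { ref } @{ $item[4] }) {
                   my ($name, $val, $line) = @$f;
                   die "Unknown field '$name' at line $line\n" unless $allowed{$name};
                   die "Duplicate field '$name' at line $line\n" if exists $value{$name};
                   $value{$name} = $val;
               }
               $value{author} =~ s/\sand\s/+/g if defined $value{author};
               $return = "<book>\n    <keyword>$item[2]</keyword>\n"
                       . join('', map { "    <$_>$value{$_}</$_>\n" }
                                  grep { exists $value{$_} } qw(author title year publisher))
                       . "</book>\n";
           }
    key:   /[A-Za-z]+/
    field: name '=' value   { $return = [ $item[1], $item[3], $thisline ]; }
    name:  /[A-Za-z]+/
    value: '"' /[^"]*/ '"'  { $return = $item[2]; }
         | /\d+/
};

my $parser = Parse::RecDescent->new($grammar) or die "Bad grammar!\n";

# Same record as in the question, with title and author swapped.
my $text = <<'END';
@Book{press,
    title     = "Numerical {R}ecipes in {C}: The {A}rt of {S}cientific {C}omputing",
    author    = "Press,W. and Teutolsky,S. and Vetterling,W. and Flannery B.",
    year      = 2007,
    publisher = "Cambridge University Press"
}
END

# A die inside an action propagates out of the parser call; eval catches it.
my $xml = eval { $parser->book($text) };
print defined $xml ? $xml : "Error: " . ($@ || "input did not match\n");
```

Since $thisline is read after the value has been consumed, a quoted value spanning several lines would be reported at the line where it ends. Missing fields are simply left out of the output here; they could be reported in the same action if needed.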

(Editor: 李大同)
