perl - Prevent naive longest token matching in Marpa::R2::Scanless
In the current implementation of the Scanless Interface (SLIF) in the Marpa parser, the lexer seems to perform longest token matching (LTM) in the following fashion:
> All terminal symbols are tried at the current position in the input.

This creates frustrating parse failures when my grammar contains tokens that match the longest substring but cannot occur at the current position. Consider the following code:

    #!/usr/bin/env perl

    use strict; use warnings; use feature qw/say/; use utf8;

    use Marpa::R2;
    use Data::Dump;

    my @data = ('! key : value', '! key:value');

    my $grammar = Marpa::R2::Scanless::G->new({
        source => \<<'END_GRAMMAR',
            :default ::= action => [values]
            :start   ::= record

            :discard ~ ws
            ws       ~ [\s]+

            record   ::= ('!') key (':') value
            key      ~ [\w]+
            value    ~ [^\s]+
    END_GRAMMAR
    });

    for my $data (@data) {
        my $recce = Marpa::R2::Scanless::R->new({
            grammar         => $grammar,
            trace_terminals => 0, # set this to "1" to see how the tokens are recognized
        });
        $recce->read(\$data);
        my $val = $recce->value // die "no parse";
        say ">> $data";
        dd $$val;
    }

This produces the output:

    >> ! key : value
    ["key", "value"]
    Error in SLIF G1 read: No lexemes accepted at position 2
    * Error was at end of input
    * String before error: ! key:value
    Marpa::R2 exception at marpa.pl line 33.

Expected output:

    >> ! key : value
    ["key", "value"]
    >> ! key:value
    ["key", "value"]

After the ! has been recognized, a key token has to follow. During lexing at this position, the value token matches the longest substring key:value, although it cannot occur at this position. Therefore, the parse fails.

Question: Is it possible to achieve the expected output without writing a manual lexer?

(I know that a lexer could query the recognizer for the expected tokens and restrict itself to matching only those tokens, but I do not know how to convince the SLIF to do this for me.)

I am running Marpa::R2 v2.064 on perl5 v16.2

Edit

Following the advice of Jeffrey Kegler, I implemented a rule that always matches a longer substring than a plain value would, and is therefore preferred. Using a pause event, I can then parse it manually, although I have to keep a phantom rule around to get the correct semantics.

Here is the complete, updated code, including event handling and an updated test case:

    #!/usr/bin/env perl

    use strict; use warnings; use feature qw/say/; use utf8;

    use Marpa::R2;
    use Data::Dump;

    my @data = ('! key : value', '! key:value', '! key :value', '! key: value');

    my $grammar = Marpa::R2::Scanless::G->new({
        source => \<<'END_GRAMMAR',
            :default ::= action => [values]
            :start   ::= Record

            :discard ~ ws
            ws       ~ [\s]+

            Record   ::= ('!') Key (<Op colon>) Value  # not directly used
                      |  ('!') KeyValue
            Key      ~ key
            Value    ~ value
            KeyValue ~ key <ws any> ':' <ws any> value
            :lexeme  ~ KeyValue pause => before event => 'before KeyValue'
            <Op colon> ~ ':'
            key      ~ [\w]+
            value    ~ [^\s]+
            <ws any> ~ [\s]*
    END_GRAMMAR
    });

    my %events = (
        'before KeyValue' => sub {
            my ($recce, $string, $start, $length) = @_;
            # Split into key, separator (colon with optional whitespace), value
            my ($k, $o, $v) = split /(\s*:\s*)/, $string, 2;
            say STDERR qq(k="$k" o="$o" v="$v");
            my $pos = $start;
            $recce->lexeme_read('Key' => $pos, length($k), $k);
            $pos += length $k;
            $recce->lexeme_read('Op colon' => $pos, length($o), $o);
            $pos += length $o;
            $recce->lexeme_read('Value' => $pos, length($v), $v);
        },
    );

    for my $data (@data) {
        my $recce = Marpa::R2::Scanless::R->new({
            grammar => $grammar,
        });
        my $length = length $data;
        for (
            my $pos = $recce->read(\$data);
            $pos < $length;
            $pos = $recce->resume()
        ) {
            say STDERR "pause";
            my ($start, $length) = $recce->pause_span();
            my $str = substr $data, $start, $length;
            for my $event_data (@{ $recce->events }) {
                my ($name) = @$event_data;
                my $code = $events{$name} // die "no code for event $name";
                $recce->$code($str, $start, $length);
            }
        }
        my $val = $recce->value // die "no parse";
        say ">> $data";
        dd $$val;
    }

This produces

    >> ! key : value
    ["key", "value"]
    >> ! key :value
    ["key", "value"]
    >> ! key: value
    ["key", "value"]

which is the expected behaviour.

Solution
Please note that since version 2.079_015, Marpa supports the concept of Longest Acceptable Tokens Matching (LATM), which means that just adding:

    lexeme default = forgiving => 1

to your grammar will produce the expected output. That is:

    #!env perl -w
    use strict;
    use Marpa::R2;
    use Data::Dump;
    use feature qw/say/;

    # Read the grammar from the __DATA__ section below
    my $grammar = Marpa::R2::Scanless::G->new({source => \do {local $/; <DATA>}});
    my @data = ('! key : value', '! key: value');
    foreach (@data) {
        my $r = Marpa::R2::Scanless::R->new({grammar => $grammar});
        $r->read(\$_);
        my $val = $r->value;
        say ">> $_";
        dd $$val;
    }
    __DATA__
    :default ::= action => [values]
    lexeme default = forgiving => 1
    :start ::= record
    :discard ~ ws
    ws ~ [\s]+
    record ::= ('!') key (':') value
    key ~ [\w]+
    value ~ [^\s]+

will give:

    >> ! key : value
    ["key", "value"]
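To make the difference between naive LTM and LATM concrete, here is a minimal sketch in plain Perl that does not use Marpa at all. The `%lexemes` table and the `longest_match` helper are hypothetical names invented for this illustration. They only mimic the idea of restricting candidates to acceptable lexemes and are not how the SLIF is implemented.

```perl
use strict;
use warnings;

# Hypothetical lexeme table mirroring the grammar in the question
# (illustration only, not part of Marpa's API).
my %lexemes = (
    key   => qr/\w+/,
    value => qr/[^\s]+/,
    colon => qr/:/,
);

# Return [name, text] for the longest match at $pos among @candidates,
# or undef if no candidate matches there.
sub longest_match {
    my ($input, $pos, @candidates) = @_;
    my $best;
    for my $name (sort @candidates) {
        pos($input) = $pos;                       # anchor \G at $pos
        next unless $input =~ /\G($lexemes{$name})/gc;
        $best = [$name, $1]
            if !$best || length($1) > length($best->[1]);
    }
    return $best;
}

my $input = '! key:value';
my $pos   = 2;    # position just after "! "

# Naive LTM: every lexeme competes, so "value" wins with "key:value".
my $ltm = longest_match($input, $pos, keys %lexemes);

# LATM: only lexemes acceptable at this point (here just "key") compete.
my $latm = longest_match($input, $pos, 'key');

print "LTM:  $ltm->[0] '$ltm->[1]'\n";      # LTM:  value 'key:value'
print "LATM: $latm->[0] '$latm->[1]'\n";    # LATM: key 'key'
```

Under naive LTM the parser is handed a `value` lexeme at a position where only `key` is acceptable, which corresponds to the "No lexemes accepted at position 2" failure in the question. Restricting the candidates to acceptable lexemes is, roughly, what `forgiving => 1` asks the SLIF to do.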