perl – Incorrect tokenization with Marpa
Published: 2020-12-16 06:28:01 | Category: Big Data | Source: compiled from the web
I have a fairly large Marpa grammar (for parsing XPath), and I have run into a tokenization problem. I created a minimal failing example:
    use strict;
    use warnings;

    use Marpa::R2;

    # The SLIF grammar is passed as a reference to a heredoc string.
    my $grammar = Marpa::R2::Scanless::G->new(
        {   source => \(<<'END_OF_SOURCE'),
    :default ::= action => ::array
    :start ::= Start
    Start ::= Child DoubleColon Token

    DoubleColon ~ '::'
    Child ~ 'child'
    Token ~
        word
        | word ':' word
    word ~ [\w]+
    END_OF_SOURCE
        }
    );

    # trace_terminals => 1 prints every lexeme the scanner accepts or rejects.
    my $reader = Marpa::R2::Scanless::R->new(
        {   grammar         => $grammar,
            trace_terminals => 1,
        }
    );

    my $input = 'child::book';
    $reader->read( \$input );    # read() expects a reference to the input string

This script prints the following:

    Registering character U+0063 as symbol 10: [[\w]]
    Registering character U+0063 as symbol 3: [[c]]
    Registering character U+0068 as symbol 10: [[\w]]
    Registering character U+0068 as symbol 4: [[h]]
    Registering character U+0069 as symbol 10: [[\w]]
    Registering character U+0069 as symbol 5: [[i]]
    Registering character U+006c as symbol 10: [[\w]]
    Registering character U+006c as symbol 6: [[l]]
    Registering character U+0064 as symbol 10: [[\w]]
    Registering character U+0064 as symbol 7: [[d]]
    Registering character U+003a as symbol 1: [[:]]
    Rejected lexeme @0-5: Token; value="child"
    Accepted lexeme @0-5: Child; value="child"
    Registering character U+0062 as symbol 10: [[\w]]
    Error in SLIF G1 read: No lexeme found at position 6
    * String before error: child::
    * The error was at line 1, column 8, and at character 0x0062 'b', ...
    * here: book

I expect the input to be tokenized as [Child] [DoubleColon] [word]. As the terminal trace shows, only one colon character is read and processed. It looks as though the scanner tries to tokenize the start of the string as [word] [':'] [word] and fails halfway through. If I remove the | word ':' word alternative from the grammar, the error is no longer thrown.

I tried giving DoubleColon a priority (:lexeme ~ <DoubleColon> priority => 1), but that did not help. Can anyone tell me what to do so that this grammar parses the input string correctly? It still needs to be able to parse child::ns:book and similar inputs.

Solution
This appears to be a bug in the current release of Marpa::R2, version 2.058. My apologies, and thank you for writing up the problem so carefully.
I have a fix that passes the test suite, and I will release a new version soon.
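Since the answer attributes the failure to release 2.058, one way to find out whether a given installation could be affected is to compare its version number against that release. The following is only a minimal sketch: the helper name is_newer_than_reported is hypothetical, and it assumes the fix shipped in some release after 2.058, which the answer promises but does not name. It uses the core version module and loads Marpa::R2 only if it is installed.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: true if $installed is newer than 2.058, the release
# the answer identifies as buggy. Assumption: the fix landed in a later release.
sub is_newer_than_reported {
    my ($installed) = @_;
    return version->parse($installed) > version->parse('2.058') ? 1 : 0;
}

# Report on the local installation, if Marpa::R2 is available at all.
if ( eval { require Marpa::R2; 1 } ) {
    my $installed = Marpa::R2->VERSION;
    if ( is_newer_than_reported($installed) ) {
        print "Marpa::R2 $installed is newer than 2.058\n";
    }
    else {
        print "Marpa::R2 $installed is 2.058 or older; consider upgrading\n";
    }
}
```

A newer version number alone does not prove the bug is gone; rerunning the minimal example from the question with trace_terminals enabled remains the real check.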