
perl – Incorrect tokenization with Marpa

Published: 2020-12-16 06:28:01 | Category: Big Data | Source: compiled from the web
I have a fairly large Marpa grammar (used for parsing XPath), and I have run into a tokenization problem. I created a minimal failing example:

use strict;
use warnings;
use Marpa::R2;

my $grammar = Marpa::R2::Scanless::G->new(
    {
        source => \(<<'END_OF_SOURCE'),
            :default ::= action => ::array
            :start ::= Start

            Start  ::= Child DoubleColon Token

            DoubleColon ~ '::'
            Child ~ 'child'
            Token ~
                word
                | word ':' word
            word ~ [\w]+

END_OF_SOURCE
    }
);
my $reader = Marpa::R2::Scanless::R->new(
    {
        grammar         => $grammar,
        trace_terminals => 1,
    }
);

my $input = 'child::book';
$reader->read(\$input);

This script prints the following:

Registering character U+0063 as symbol 10: [[\w]]
Registering character U+0063 as symbol 3: [[c]]
Registering character U+0068 as symbol 10: [[\w]]
Registering character U+0068 as symbol 4: [[h]]
Registering character U+0069 as symbol 10: [[\w]]
Registering character U+0069 as symbol 5: [[i]]
Registering character U+006c as symbol 10: [[\w]]
Registering character U+006c as symbol 6: [[l]]
Registering character U+0064 as symbol 10: [[\w]]
Registering character U+0064 as symbol 7: [[d]]
Registering character U+003a as symbol 1: [[:]]
Rejected lexeme @0-5: Token; value="child"
Accepted lexeme @0-5: Child; value="child"
Registering character U+0062 as symbol 10: [[\w]]
Error in SLIF G1 read: No lexeme found at position 6
* String before error: child::
* The error was at line 1, column 8, and at character 0x0062 'b', ...
* here: book

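I expected the input to be tokenized as [Child] [DoubleColon] [word]. As the terminal trace shows, only one colon character is read and processed. It looks as if the lexer tries to tokenize the beginning of the string as [word] [':'] [word] and fails partway through. If I remove line 10 of the grammar (| word ':' word), the error is no longer thrown.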

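I tried giving DoubleColon a priority (:lexeme ~ <DoubleColon> priority => 1), but that did not help. Can someone tell me what I need to do to make this grammar parse the input string correctly? It still needs to be able to parse child::ns:book and similar inputs.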
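To make the expected result concrete, here is a minimal sketch in plain Perl, with no Marpa involved, that walks the single rule Start ::= Child DoubleColon Token and matches each lexeme at the current position. The regexes in %lexeme and the tokenize helper are hypothetical stand-ins written for this illustration; they are not how Marpa's SLIF lexer works internally, and they only show which token sequence the grammar is meant to produce.

```perl
use strict;
use warnings;

# Hypothetical regex equivalents of the lexemes in the grammar above.
my %lexeme = (
    Child       => qr/child/,
    DoubleColon => qr/::/,
    Token       => qr/\w+(?::\w+)?/,    # word | word ':' word
);

# Start ::= Child DoubleColon Token
my @sequence = qw(Child DoubleColon Token);

# Match each expected lexeme in turn, anchored at the current position.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    for my $name (@sequence) {
        my $re = $lexeme{$name};
        $input =~ /\G($re)/gc
            or die "No lexeme $name at position " . pos($input) . "\n";
        push @tokens, [ $name, $1 ];
    }
    die "Trailing input at position " . pos($input) . "\n"
        if pos($input) < length $input;
    return @tokens;
}

for my $input ('child::book', 'child::ns:book') {
    print join(' ', map { "[$_->[0] '$_->[1]']" } tokenize($input)), "\n";
}
```

Under these assumptions both inputs split into three tokens, with the whole '::' consumed as one DoubleColon and 'ns:book' kept together as a single Token, which is the behavior the Marpa grammar is expected to reproduce.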

Solution

This appears to be a bug in the current release of Marpa::R2, version 2.058. My apologies, and thank you for writing up the problem so carefully.

I have a fix that passes the test suite, and I will release a new version soon.

(Editor: 李大同)
