Regular expressions – How is regular expression matching implemented in Node / V8?

Published: 2020-12-14 05:38:05 | Category: Encyclopedia | Source: compiled from the web

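I came across an article which suggests that regular expression matching is commonly implemented with an algorithm that can perform poorly, rather than with the recommended Thompson NFA algorithm.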

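With that in mind, how is it implemented in Node or V8? Would it be possible to improve performance with a JS implementation of Thompson NFA, perhaps one that supports only a limited subset of features (for example, dropping lookahead or other "advanced" features)?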

Solution

As the Chrome development team mentioned in the announcement, the V8 engine uses the Irregexp regular expression engine.

Here are some quotes about how this engine is implemented:

A fundamental decision we made early in the design of Irregexp was
that we would be willing to spend extra time compiling a regular
expression if that would make running it faster. During compilation
Irregexp first converts a regexp into an intermediate automaton
representation. This is in many ways the “natural” and most accessible
representation and makes it much easier to analyze and optimize the
regexp. For instance, when compiling /Sun|Mon/ the automaton
representation lets us recognize that both alternatives have an ‘n’ as
their third character. We can quickly scan the input until we find an
‘n’ and then start to match the regexp two characters earlier.
Irregexp looks up to four characters ahead and matches up to four
characters at a time.

After optimization we generate native machine code which uses
backtracking to try different alternatives. Backtracking can be
time-consuming so we use optimizations to avoid as much of it as we
can. There are techniques to avoid backtracking altogether but the
nature of regexps in JavaScript makes it difficult to apply them in
our case, though it is something we may implement in the future.
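
The /Sun|Mon/ optimization described in the quote can be illustrated with a small sketch. This is not V8 code: the function name indexOfSunOrMon is hypothetical, and the example only mimics the idea of scanning for the shared third character 'n' and then checking the two characters before it.

```javascript
// Hypothetical helper illustrating the "scan for the common character" idea.
// It should return the same index as input.search(/Sun|Mon/).
function indexOfSunOrMon(input) {
  // A match needs two characters before the 'n', so start scanning at index 2.
  let n = input.indexOf('n', 2);
  while (n !== -1) {
    const start = n - 2;
    // Only at an 'n' do we look back and compare the full three characters.
    const candidate = input.slice(start, n + 1);
    if (candidate === 'Sun' || candidate === 'Mon') return start;
    n = input.indexOf('n', n + 1);
  }
  return -1;
}

console.log(indexOfSunOrMon('See you on Monday')); // 11
```

Positions that do not hold an 'n' are skipped without running the full comparison, which is the effect the quoted passage describes.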

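So V8 first converts a regular expression into an intermediate automaton representation and then compiles it to native machine code that uses backtracking; it does not use the Thompson NFA algorithm.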
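
To see why the quote calls backtracking time-consuming, here is a minimal hand-written sketch of a backtracking matcher for one specific pattern, ^(a+)+b$, that counts the steps it takes. The function name countBacktrackingSteps and the step accounting are assumptions made for illustration; Irregexp generates native code and applies optimizations that this sketch does not model.

```javascript
// Hypothetical step-counting backtracking matcher for the pattern ^(a+)+b$.
function countBacktrackingSteps(input) {
  let steps = 0;

  // Try to match one repetition of (a+) starting at index i, then either
  // another repetition or the final 'b'.
  function matchFrom(i) {
    steps++;
    let j = i;
    while (input[j] === 'a') j++; // longest possible run of 'a'
    // Greedy: try the longest run first, then give characters back one by one.
    for (let end = j; end > i; end--) {
      if (matchFrom(end)) return true; // another repetition of the group
      steps++;
      if (input[end] === 'b' && end + 1 === input.length) return true;
    }
    return false;
  }

  return { matched: matchFrom(0), steps };
}

// On inputs made only of 'a' the match fails, and each extra character
// roughly doubles the number of steps.
for (const n of [5, 10, 15]) {
  console.log(n, countBacktrackingSteps('a'.repeat(n)).steps);
}
```

The step count grows exponentially with the input length for this pattern, which is the worst case that automaton-based approaches such as Thompson NFA simulation avoid.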

As for performance, this article compares V8's regular expression performance with that of other libraries and languages.
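
On the question of a JS implementation of Thompson NFA with a limited feature set, the following is a minimal sketch under stated assumptions rather than a production engine. The names compile and matchesWhole are hypothetical. It supports only literals, '.', '*', '+', '?', '|' and grouping, matches the whole input, and has no escapes, character classes, captures, lookahead or backreferences. Whether such a matcher would beat V8's native code on typical patterns is not something this sketch demonstrates.

```javascript
// Sketch of a Thompson-style NFA matcher; all names here are hypothetical.
function compile(pattern) {
  const states = [];
  let pos = 0;
  // kind: 'eps' (follow every edge without consuming), 'char', 'any', 'match'
  const add = (kind, ch) => states.push({ kind, ch, next: [] }) - 1;
  const link = (from, to) => states[from].next.push(to);

  function parseAlt() { // seq ('|' seq)*
    let frag = parseSeq();
    while (pattern[pos] === '|') {
      pos++;
      const right = parseSeq();
      const start = add('eps'), end = add('eps');
      link(start, frag.start); link(start, right.start);
      link(frag.end, end); link(right.end, end);
      frag = { start, end };
    }
    return frag;
  }

  function parseSeq() { // zero or more repeated atoms in a row
    const start = add('eps');
    let end = start;
    while (pos < pattern.length && pattern[pos] !== '|' && pattern[pos] !== ')') {
      const f = parseRepeat();
      link(end, f.start);
      end = f.end;
    }
    return { start, end };
  }

  function parseRepeat() { // atom followed by any of '*', '+', '?'
    let f = parseAtom();
    while (pos < pattern.length && '*+?'.includes(pattern[pos])) {
      const op = pattern[pos++];
      const start = add('eps'), end = add('eps');
      link(start, f.start);
      link(f.end, end);
      if (op !== '+') link(start, end);     // zero occurrences allowed
      if (op !== '?') link(f.end, f.start); // loop back for repetition
      f = { start, end };
    }
    return f;
  }

  function parseAtom() {
    const c = pattern[pos++];
    if (c === '(') {
      const f = parseAlt();
      if (pattern[pos++] !== ')') throw new SyntaxError('missing )');
      return f;
    }
    if (c === undefined || '*+?)|'.includes(c)) throw new SyntaxError('unexpected token');
    const s = add(c === '.' ? 'any' : 'char', c);
    return { start: s, end: s };
  }

  const frag = parseAlt();
  if (pos !== pattern.length) throw new SyntaxError('unexpected )');
  link(frag.end, add('match'));
  return { states, start: frag.start };
}

// Add state i and everything reachable from it through epsilon edges.
function addState(nfa, set, i) {
  if (set.has(i)) return;
  set.add(i);
  if (nfa.states[i].kind === 'eps') {
    for (const n of nfa.states[i].next) addState(nfa, set, n);
  }
}

// Advance a SET of states one character at a time; no backtracking.
// Assumes single-code-unit characters in both pattern and input.
function matchesWhole(nfa, input) {
  let current = new Set();
  addState(nfa, current, nfa.start);
  for (const ch of input) {
    const next = new Set();
    for (const i of current) {
      const s = nfa.states[i];
      if (s.kind === 'any' || (s.kind === 'char' && s.ch === ch)) {
        for (const n of s.next) addState(nfa, next, n);
      }
    }
    current = next;
  }
  for (const i of current) if (nfa.states[i].kind === 'match') return true;
  return false;
}
```

For example, matchesWhole(compile('(a+)+b'), 'a'.repeat(5000)) returns false after a single pass over the input, because each character is processed once against a set of at most as many states as the compiled pattern has.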

(Editor: 李大同)
