
How do I match “nothing” in a bash regular expression?

Published: 2020-12-15 22:56:16 | Category: Security | Source: compiled from the web
I can't capture the number in strings of the form (t|b|bug_|task_|)1234 with a bash regular expression. The following does not work:

[[ $current_branch =~ ^(t|b|bug_|task_|)([0-9]+) ]]

But as soon as I change it to something like this:

[[ $current_branch =~ ^(t|b|bug_|task_)([0-9]+) ]]

it works, but of course it's wrong, because it doesn't cover the case where there is no prefix. I know that in this case I could do

[[ $current_branch =~ ^(t|b|bug_|task_)?([0-9]+) ]]

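and get the same result, but I'd like to know why the first example (the one with the empty alternative) doesn't work. The same regex seems to work fine in Ruby, for example.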
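For reference, here is a minimal sketch of the portable `?` form wrapped in a reusable helper, assuming bash 3.2 or later. The function name `extract_number` and the sample branch names are hypothetical. Keeping the pattern in a variable and leaving `$re` unquoted on the right of `=~` avoids the quoting differences between bash versions (a quoted right-hand side is matched literally since bash 3.2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric part of a branch name such as
# t1234, b1234, bug_1234, task_1234 or a bare 1234.
extract_number() {
  local branch=$1
  # The prefix group is made optional with '?' instead of an empty branch,
  # so no nonempty-branch rule of regex(7) is involved.
  local re='^(t|b|bug_|task_)?([0-9]+)'
  if [[ $branch =~ $re ]]; then
    # BASH_REMATCH[1] holds the prefix (possibly empty), [2] the digits.
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Example usage with made-up branch names.
for b in t1234 bug_42 task_7 1234 feature_9; do
  extract_number "$b" || echo "no match: $b"
done
```

With these sample names the loop prints 1234, 42, 7 and 1234, and reports `feature_9` as a non-match, because none of the prefixes applies and the string does not start with a digit.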

(This is GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin11), on OS X Lion.)

Answer

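I'm fairly sure the difference between the working and non-working versions of the regex comes down to different readings of regex(7). I'll quote the whole relevant section, because I think it goes to the heart of the problem: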

Regular expressions (“RE”s), as defined in POSIX.2, come in two forms: modern
REs (roughly those of egrep; POSIX.2 calls these “extended” REs) and obsolete
REs (roughly those of ed(1); POSIX.2 “basic” REs). Obsolete REs mostly exist
for backward compatibility in some old programs; they will be discussed at the
end. POSIX.2 leaves some aspects of RE syntax and semantics open; “(!)” marks
decisions on these aspects that may not be fully portable to other POSIX.2
implementations.

A (modern) RE is one(!) or more nonempty(!) branches, separated by ‘|’. It
matches anything that matches one of the branches.

A branch is one(!) or more pieces, concatenated. It matches a match for the
first, followed by a match for the second, etc.

A piece is an atom possibly followed by a single(!) ‘*’, ‘+’, ‘?’, or bound.
An atom followed by ‘*’ matches a sequence of 0 or more matches of the atom.
An atom followed by ‘+’ matches a sequence of 1 or more matches of the atom.
An atom followed by ‘?’ matches a sequence of 0 or 1 matches of the atom.

A bound is ‘{’ followed by an unsigned decimal integer, possibly followed by
‘,’ possibly followed by another unsigned decimal integer, always followed by
‘}’. The integers must lie between 0 and RE_DUP_MAX (255(!)) inclusive, and
if there are two of them, the first may not exceed the second. An atom
followed by a bound containing one integer i and no comma matches a sequence
of exactly i matches of the atom. An atom followed by a bound containing one
integer i and a comma matches a sequence of i or more matches of the atom. An
atom followed by a bound containing two integers i and j matches a sequence of
i through j (inclusive) matches of the atom.

An atom is a regular expression enclosed in “()” (matching a match for the
regular expression), an empty set of “()” (matching the null string)(!), a
bracket expression (see below), ‘.’ (matching any single character), ‘^’
(matching the null string at the beginning of a line), ‘$’ (matching the null
string at the end of a line), a ‘\’ followed by one of the characters
“^.[$()|*+?{\” (matching that character taken as an ordinary character), a ‘\’
followed by any other character(!) (matching that character taken as an
ordinary character, as if the ‘\’ had not been present(!)), or a single
character with no other significance (matching that character). A ‘{’
followed by a character other than a digit is an ordinary character, not the
beginning of a bound(!). It is illegal to end an RE with ‘\’.

OK, there's a lot to unpack here. First, note that the “(!)” marker means there is an open or non-portable issue.

The key point is in this paragraph:

A (modern) RE is one(!) or more nonempty(!) branches, separated by ‘|’.

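Your pattern has an empty branch. As the “(!)” indicates, an empty branch is an open or non-portable issue, and I think that is why it works on some systems but not on others. (I tested it on Cygwin with bash 4.1.10(4)-release, where it did not work, and then on Linux with bash 3.2.25(1)-release, where it did. The two systems have equivalent, but not identical, regex(7) man pages.)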
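To see how a particular system treats the empty branch, a small probe along these lines could help (a sketch; `probe_regex` is a hypothetical name). It relies on bash's documented behavior that `[[ ... =~ ... ]]` returns 2 when the regex is syntactically invalid, so a platform whose regcomp rejects empty branches should report that rather than an ordinary non-match. Which outcome you get for the empty-branch pattern depends on the platform's regex library, so no particular result is assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether $re matched $subject, did not match,
# or was rejected by the system regex compiler (bash returns 2 in that case).
probe_regex() {
  local re=$1 subject=$2
  local status=0
  [[ $subject =~ $re ]] || status=$?
  case $status in
    0) echo "match: ${BASH_REMATCH[*]}" ;;
    1) echo "no match" ;;
    *) echo "regex rejected by this platform's regcomp" ;;
  esac
}

# The portable form from the question.
probe_regex '^(t|b|bug_|task_)?([0-9]+)' 1234
# The empty-branch form; the result varies between systems.
probe_regex '^(t|b|bug_|task_|)([0-9]+)' 1234
```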

Assuming that a branch must be nonempty, a branch can consist of a single piece, and a piece can be just an atom.

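An atom can be “an empty set of “()” (matching the null string)(!)”. <sarcasm>Well, that's really helpful.</sarcasm> So POSIX specifies a regex for the empty string, namely (), but also attaches a “(!)” to it, meaning that this is an open issue, or not portable.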

Since what you're looking for is a branch that matches the empty string, try

[[ $current_branch =~ ^(t|b|bug_|task_|())([0-9]+) ]]

This uses the () regex to match the empty string. (It worked for me in my Cygwin bash 4.1.10(4)-release shell, where your original regex did not.)
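One detail worth noting if you adopt the () workaround: the empty group is itself a capture group, so it shifts the numbering in BASH_REMATCH. In the sketch below (the helper name `extract_number_empty_group` is hypothetical), group 1 is the outer alternation, group 2 is the empty (), and the digits land in `BASH_REMATCH[3]` rather than `[2]`. Whether the pattern compiles at all is still platform-dependent, as discussed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper using the empty-group workaround.
extract_number_empty_group() {
  local re='^(t|b|bug_|task_|())([0-9]+)'
  # Fails (status 1 or 2) on no match or if regcomp rejects the pattern.
  [[ $1 =~ $re ]] || return 1
  # Groups are numbered by opening parenthesis: 1 = outer alternation,
  # 2 = the empty (), 3 = the digits.
  printf '%s\n' "${BASH_REMATCH[3]}"
}

for b in bug_42 1234; do
  extract_number_empty_group "$b" || echo "no match (or pattern rejected): $b"
done
```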

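However, while this suggestion will (hopefully) work for you in your current setup, there is no guarantee that it is portable. Sorry to disappoint.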

(Editor: 李大同)
