
Regular Expressions

Published: 2020-12-13 21:55:17 | Category: Encyclopedia | Source: compiled from the web

A regular expression specifies a string pattern. Use one whenever you need to locate strings that match a particular pattern.

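grep, Perl, Tcl, Python, PHP, and awk all provide regular expression support, but their syntaxes differ. The syntax of java.util.regex is closest to Perl's. The java.util.regex package consists mainly of three classes: Pattern, Matcher, and PatternSyntaxException.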

---- A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.

---- A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.

---- A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.
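
The three classes can be seen working together in a minimal sketch like the one below. The class name RegexBasics and the helper methods findAll and isValid are hypothetical names chosen for illustration; only Pattern, Matcher, and PatternSyntaxException come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; not part of java.util.regex.
class RegexBasics {

    // Collects every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // no public constructor: use the static factory
        Matcher matcher = pattern.matcher(input);  // a Matcher is obtained from a Pattern
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Returns false when the regex has a syntax error.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {  // unchecked, so catching it is optional
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("[0-9]+", "a1b22c333")); // [1, 22, 333]
        System.out.println(isValid("(unclosed"));           // false
    }
}
```

Compiling the pattern once and reusing it for many inputs avoids repeated compilation, which is why Pattern and Matcher are kept as separate objects here.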

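Metacharacters
Java regular expressions support the following metacharacters: < ( [ { \ ^ - = $ ! | ] } ) ? * + . >. There are two ways to force a metacharacter to be treated as an ordinary character:
  • precede the metacharacter with a backslash, or
  • enclose it within \Q (which starts the quote) and \E (which ends it).
In a Java regular expression string, \\ means "I am inserting a regular-expression backslash, so the character that follows has special meaning." For example, to denote a single digit, the regular expression string is \\d. To insert a literal backslash, write \\\\. Newlines, tabs, and similar characters (covered collectively by the whitespace class \s) need only a single backslash: \n\t.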
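
The escaping rules above can be checked with a short sketch. The class EscapeDemo and its helper fullMatch are hypothetical names for illustration; Pattern.matches and Pattern.quote are standard java.util.regex methods, and Pattern.quote is shown as a library alternative to writing \Q and \E by hand.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of java.util.regex.
class EscapeDemo {

    // True when the regex matches the whole input.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The Java literal "\\d" is the two-character regex \d (one digit).
        System.out.println(fullMatch("\\d", "7"));                        // true
        // The Java literal "\\\\" is the regex \\ and matches one literal backslash.
        System.out.println(fullMatch("\\\\", "\\"));                      // true
        // Unescaped, + is a quantifier, so the text "1+1=2" does not match itself.
        System.out.println(fullMatch("1+1=2", "1+1=2"));                  // false
        // Escaping the metacharacter with a backslash makes it literal.
        System.out.println(fullMatch("1\\+1=2", "1+1=2"));                // true
        // \Q starts a quoted section and \E ends it.
        System.out.println(fullMatch("\\Q1+1=2\\E", "1+1=2"));            // true
        // Pattern.quote produces a literal pattern for the given text.
        System.out.println(fullMatch(Pattern.quote("1+1=2"), "1+1=2"));   // true
        // Newline and tab need only the single backslash of the Java string escape.
        System.out.println(fullMatch("\n\t", "\n\t"));                    // true
    }
}
```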

Character classes

The following article explains backreferences:
http://blog.csdn.net/aspirinvagrant/article/details/48949047

Zero-length matches in regular expressions?


Differences Among Greedy, Reluctant, and Possessive Quantifiers

There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: they start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

To illustrate, consider the input string xfooxxxxxxfoo.

 
Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match, from index 0 to index 4. Our test harness continues the process until the input string is exhausted. It finds another match from index 4 to index 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.
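
The three runs above can be reproduced without the interactive test harness. The sketch below is a stand-in for that harness rather than its actual source; the class QuantifierDemo and the helper findAll are hypothetical names, and each match is reported as the matched text followed by its start index (inclusive) and end index (exclusive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the tutorial's interactive test harness.
class QuantifierDemo {

    // Lists every match as: text [start,end)
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group() + " [" + matcher.start() + "," + matcher.end() + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: backs off from the end until the rightmost "foo" fits.
        System.out.println(findAll(".*foo", input));   // [xfooxxxxxxfoo [0,13)]
        // Reluctant: grows one character at a time, so it stops at the first "foo".
        System.out.println(findAll(".*?foo", input));  // [xfoo [0,4), xxxxxxfoo [4,13)]
        // Possessive: eats everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input));  // []
    }
}
```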

(Editor: 李大同)
