Problems with special characters when splitting with split
When splitting with split: String[] a = "aa|bb|cc".split("|"); output (on Java 8 and later):
[a, a, |, b, b, |, c, c]
First, look at how split is documented:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
The string "boo:and:foo", for example, yields the following results with these expressions:
Regex   Result
:       { "boo", "and", "foo" }
o       { "b", "", ":and:f" }
As you can see, the argument to split is a regular expression. Regular expressions contain some special characters to watch out for, each with its own meaning (the list below is quoted from the Praat manual): http://www.fon.hum.uva.nl/praat/manual/Regular_expressions_1__Special_characters.html
\ the backslash escape character.
The backslash gives special meaning to the character following it. For example, the combination "\n" stands for the newline, one of the control characters. The combination "\w" stands for a "word" character, one of the convenience escape sequences, while "\1" is one of the substitution special characters.
Example: The regex "aa\n" tries to match two consecutive "a"s at the end of a line, including the newline character itself.
Example: "a\+" matches "a+" and not a series of one or more "a"s.
^ the caret is the start of line anchor or the negate symbol.
Example: "^a" matches "a" at the start of a line.
Example: "[^0-9]" matches any non digit.
$ the dollar is the end of line anchor.
Example: "b$" matches a "b" at the end of a line.
Example: "^$" matches the empty line.
{ } the open and close curly brackets are used as range quantifiers.
Example: "a{2,3}" matches "aa" or "aaa".
[ ] the open and close square brackets define a character class to match a single character. The "^" as the first character following the "[" negates, and the match is for the characters not listed. The "-" denotes a range of characters. Inside a "[ ]" character class construction most special characters are interpreted as ordinary characters.
Example: "[d-f]" is the same as "[def]" and matches "d", "e" or "f".
Example: "[a-z]" matches any lowercase character in the alphabet.
Example: "[^0-9]" matches any character that is not a digit.
Example: A search for "[][()?<>$^.*?^]" in the string "[]()?<>$^.*?^" followed by a replace string "r" has the result "rrrrrrrrrrrrr". Here the search string is one character class and all the meta characters are interpreted as ordinary characters without the need to escape them.
( ) the open and close parentheses are used for grouping characters (or other regexes). The groups can be referenced in both the search and the substitution phase. There also exist some special constructs with parentheses.
Example: "(ab)\1" matches "abab".
. the dot matches any character except the newline.
Example: ".a" matches two consecutive characters where the last one is "a".
Example: ".*\.txt$" matches all strings that end in ".txt".
* the star is the match-zero-or-more quantifier.
Example: "^.*$" matches an entire line.
+ the plus is the match-one-or-more quantifier.
? the question mark is the match-zero-or-one quantifier. The question mark is also used in special constructs with parentheses and in changing match behaviour.
| the vertical pipe separates a series of alternatives.
Example: "(a|b|c)a" matches "aa" or "ba" or "ca".
< > the smaller and greater signs are anchors that specify a left or right word boundary.
- the minus indicates a range in a character class (when it is not at the first position after the "[" opening bracket or the last position before the "]" closing bracket).
Example: "[A-Z]" matches any uppercase character.
Example: "[A-Z-]" or "[-A-Z]" match any uppercase character or "-".
& the and is the "substitute complete match" symbol.
The < > and & entries describe Praat's regex flavor; in Java, for instance, the word boundary is written \b.
The solution to the problem above is therefore to escape the pipe when splitting: String[] a = "aa|bb|cc".split("\\|");
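As a minimal sketch of the fix, the following compares the unescaped call with two ways of treating the pipe as a literal: the escaped pattern "\\|" and Pattern.quote from java.util.regex. The class name SplitPipeDemo and its helper method names are hypothetical, chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
public class SplitPipeDemo {

    // "|" as a regex is an alternation of two empty patterns,
    // so it matches the empty string between every pair of characters.
    static String[] splitUnescaped(String s) {
        return s.split("|");
    }

    // "\\|" in Java source is the regex \| which matches a literal pipe.
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Pattern.quote wraps the text so the whole of it is treated as a literal.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String input = "aa|bb|cc";
        // On Java 8 and later prints [a, a, |, b, b, |, c, c]
        System.out.println(Arrays.toString(splitUnescaped(input)));
        // Prints [aa, bb, cc]
        System.out.println(Arrays.toString(splitEscaped(input)));
        // Prints [aa, bb, cc]
        System.out.println(Arrays.toString(splitQuoted(input)));
    }
}
```

Pattern.quote may be the more convenient choice when the delimiter comes from a variable, since it avoids working out by hand which characters need a backslash.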
Summary: when applying regex operations to strings, remember to escape special characters. (Editor: 李大同)