
leetcode-Regular Expression Matching

Published: 2020-12-13 23:09:11 | Category: Encyclopedia | Source: compiled from the web

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s,const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa","a*") → true
isMatch("aa",".*") → true
isMatch("ab",".*") → true
isMatch("aab","c*a*b") → true

Note: here a* means that a may be repeated zero or more times; a and * are not matched separately.

It seems that some readers are confused about why the regex pattern ".*" matches the string "ab". ".*" means repeat the preceding element 0 or more times. Here, the "preceding" element is the dot character in the pattern, which can match any character. Therefore, the regex pattern ".*" allows the dot to be repeated any number of times, which matches any string (even an empty string).

Think carefully about how you would match '*'. Please note that '*' in a regular expression is different from wildcard matching, as we match the previous character 0 or more times. But how many times? If you are stuck, recursion is your friend.
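To make the "preceding element" idea concrete, here is a minimal sketch that splits a pattern into elements, where each element is one character plus a flag saying whether a '*' follows it. The names `Element` and `splitPattern` are hypothetical and exist only for this illustration, and the sketch assumes a well-formed pattern (no leading '*' and no "**").

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one pattern element and whether '*' follows it.
struct Element {
    char ch;       // a literal character or '.'
    bool starred;  // true if the element may repeat zero or more times
};

// Split a well-formed pattern into elements; "c*a*b" becomes c*, a*, b.
std::vector<Element> splitPattern(const std::string& p) {
    std::vector<Element> out;
    for (std::size_t i = 0; i < p.size(); ++i) {
        bool starred = (i + 1 < p.size() && p[i + 1] == '*');
        out.push_back({p[i], starred});
        if (starred) ++i;  // skip the '*' that belongs to this element
    }
    return out;
}
```

Seen this way, ".*" is a single element (a starred dot), which is why it can absorb any string, including the empty one.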

    bool isMatch(const char *s, const char *p) {
        // Empty pattern matches only the empty string.
        if (*p == 0) return *s == 0;
        if (*(p + 1) != '*') {
            // No '*' follows: the current characters must match.
            if (*s != 0 && (*p == *s || *p == '.')) return isMatch(s + 1, p + 1);
            else return false;
        } else {
            // '*' follows: try the rest of the pattern after 0, 1, 2, ... repeats.
            while (*s != 0 && (*s == *p || *p == '.')) {
                if (isMatch(s, p + 2)) return true;
                s++;
            }
            return isMatch(s, p + 2);
        }
    }


    #include <cassert>

    // Variant of the same recursion, with assertions on its inputs.
    bool isMatch(const char *s, const char *p) {
        assert(s && p);
        if (*p == '\0') return *s == '\0';
        // Next char is not '*': must match the current character.
        if (*(p + 1) != '*') {
            assert(*p != '*');
            return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s + 1, p + 1);
        }
        // Next char is '*': try the rest of the pattern after each repetition.
        while ((*p == *s) || (*p == '.' && *s != '\0')) {
            if (isMatch(s, p + 2)) return true;
            s++;
        }
        return isMatch(s, p + 2);
    }


This problem is a tricky one. Due to the huge number of edge cases, many people would write lengthy code and have numerous bugs on their first try. Try your best to get your code correct first, then refactor mercilessly until it is as clean and concise as possible!


Figure: a sample diagram of a deterministic finite automaton (DFA). DFAs are useful for lexical analysis and pattern matching; UNIX's grep tool is one example. Please note that this post does not attempt to describe a solution using a DFA.

Solution:
This looks just like straightforward string matching, doesn't it? Couldn't we just match the pattern and the input string character by character? The question is, how do we match a '*'?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many times as we can. Does this work? Let us look at some examples.

s = "abbbc", p = "ab*c"
Assume we have matched the first 'a' in both s and p. When we see "b*" in p, we skip all b's in s. Since the last 'c' matches on both sides, they both match.

s = "ac", p = "ab*c"
After the first 'a', we see that there are no b's to skip for "b*". We match the last 'c' on both sides and conclude that they both match.

It seems that being greedy is good. But how about this case?

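s = "abbc", p = "ab*bbc"
When we see "b*" in p, the greedy approach skips all the b's in s, which leaves no b's for the "bb" that follows in p, even though the two should match.

One might be tempted to think of a quick workaround. How about counting the number of consecutive b's in s and checking it against the number of consecutive b's after "b*" in p? This seems to solve the above problem, but how about this case:

s = "abcbcd", p = "a.*c.*d"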

Here, ".*" in p means repeat '.' 0 or more times. Since '.' can match any character, it is not clear how many times '.' should be repeated. Should the 'c' in p match the first or the second 'c' in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.
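To see the greedy idea break on these inputs, here is a minimal sketch of a purely greedy matcher. The name `greedyMatch` is hypothetical, the use of std::string instead of the required `const char *` prototype is a convenience for this illustration, and the function is deliberately incomplete: it never gives back a character once "x*" has consumed it.

```cpp
#include <cstddef>
#include <string>

// Greedy sketch (deliberately incomplete): every "x*" consumes as many
// characters as it can and never gives any of them back.
bool greedyMatch(const std::string& s, const std::string& p) {
    std::size_t i = 0;  // position in s
    std::size_t j = 0;  // position in p
    while (j < p.size()) {
        bool starred = (j + 1 < p.size() && p[j + 1] == '*');
        if (starred) {
            // Consume the longest possible run, with no backtracking.
            while (i < s.size() && (p[j] == '.' || p[j] == s[i])) ++i;
            j += 2;
        } else {
            if (i == s.size() || (p[j] != '.' && p[j] != s[i])) return false;
            ++i;
            ++j;
        }
    }
    return i == s.size();
}
```

With this sketch, ("abbbc", "ab*c") and ("ac", "ab*c") are accepted, but ("abbc", "ab*bbc") and ("abcbcd", "a.*c.*d") are rejected even though both strings do match their patterns.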

We need some kind of backtracking mechanism such that when a match fails, we return to the last successful matching state and attempt to match more characters in s with '*'. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases:

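  1. If the next character of p is NOT '*', then it must match the current character of s. Continue pattern matching with the next character of both s and p.
  2. If the next character of p is '*', then try zero, one, or more repetitions of the current character of p, recursing on the rest of the pattern after the '*' each time, until no more characters of s can be matched.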

You would need to consider the base case carefully too. That would be left as an exercise to the reader. :)


Further Thoughts:
Some extra exercises to this problem:

  1. If you think carefully, you can construct cases in which the above code runs in exponential time. Could you think of some examples? How would you make the above code more efficient?
  2. Try to implement partial matching instead of full matching. In addition, add '^' and '$' to the rules. '^' matches the starting position within the string, while '$' matches the ending position of the string.
  3. Try to implement wildcard matching, where '*' means any sequence of zero or more characters.
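For the first exercise, one possible direction (a sketch, not the only answer) is to cache the result for each pair of positions so that no suffix pair is explored twice. The names `matchFrom` and `isMatchMemo` are hypothetical, and the sketch uses std::string rather than the `const char *` prototype above. A long run of a's followed by 'b', matched against a pattern such as "a*a*a*a*c", is the kind of input on which the plain recursion retries many ways of splitting the a's before failing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Memoized sketch: memo[i][j] caches whether s[i..] matches p[j..].
// -1 = not computed yet, 0 = no match, 1 = match.
static bool matchFrom(const std::string& s, const std::string& p,
                      std::size_t i, std::size_t j,
                      std::vector<std::vector<int>>& memo) {
    if (memo[i][j] != -1) return memo[i][j] == 1;
    bool result;
    if (j == p.size()) {
        result = (i == s.size());  // empty pattern matches only empty input
    } else {
        bool first = i < s.size() && (p[j] == '.' || p[j] == s[i]);
        if (j + 1 < p.size() && p[j + 1] == '*') {
            // Either skip "x*" entirely, or consume one character and
            // stay on the same "x*" element.
            result = matchFrom(s, p, i, j + 2, memo) ||
                     (first && matchFrom(s, p, i + 1, j, memo));
        } else {
            result = first && matchFrom(s, p, i + 1, j + 1, memo);
        }
    }
    memo[i][j] = result ? 1 : 0;
    return result;
}

bool isMatchMemo(const std::string& s, const std::string& p) {
    std::vector<std::vector<int>> memo(s.size() + 1,
                                       std::vector<int>(p.size() + 1, -1));
    return matchFrom(s, p, 0, 0, memo);
}
```

Because there are at most (|s| + 1) × (|p| + 1) position pairs and each is computed once, the work in this sketch is bounded by the product of the two lengths rather than growing exponentially.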

For the interested reader, real-world regular expression matching (such as in the grep tool) is usually implemented by applying formal language theory. To understand more about it, you may read this article.



ref:

http://discuss.leetcode.com/questions/175/regular-expression-matching

http://leetcode.com/2011/09/regular-expression-matching.html

(Editor: 李大同)
