加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

【自动机】简单的正则表达式匹配

发布时间:2020-12-14 00:41:42 所属栏目:百科 来源:网络整理
导读:题目来源 https://leetcode.com/problems/regular-expression-matching/ 内容:实现一个能支持”.” “*”的模式匹配程序,并判断是否整个都匹配上。例如: isMatch ( "aa" , "a" ) - false isMatch ( "aab" , "c*a*b" ) - true 做法比较传统,用自动机来做

题目来源
https://leetcode.com/problems/regular-expression-matching/

内容:实现一个能支持”.” “*”的模式匹配程序,并判断是否整个都匹配上。例如:

isMatch("aa","a") -> false
isMatch("aab","c*a*b") -> true

做法比较传统,用自动机来做。

自动机形如下方(其表达式为c*aa*bb*):

当状态为s1的时候,如果此处读到的数据是a的话,可以进入状态s2。

那么对于 表达式为”a*ab”怎么办?
它既要适配”ab”,“aab”,“aaab”…
其实可以用一个栈stackSavePoint保存这种(a*)既可以读当前的数据,也可以直接跳过到下一个这种类型的节点,保存当前数据的index,然后直接将当前数据交由下一状态处理。如果遇到当前状态并不能处理,则从stackSavePoint栈中弹出一节点,当前状态转到这一节点,并将当前数据的指针指向(index + 1),也就是回退N-1步,效率有点低,但还是前进了一步。

具体代码如下:

class Solution {
public:

// c*cab

    struct Node {
        bool isEverything;
        bool isSelfContain;
        char accept;
        struct Node* son;
    };

    struct Node* addNode(struct Node* parent,char ch) { // return current node after added.

        struct Node* son = parent->son;

        if (ch == '*') {
            parent->isSelfContain = true;
            return parent;
        }

        struct Node* newSon = (Node*)malloc(sizeof(struct Node));
        newSon->accept = ch;
        newSon->isEverything = '.' == ch;
        newSon->isSelfContain = false;
        newSon->son = NULL;
        parent->son = newSon;
        return newSon;
    }

    // to merge something like a*a*a*a*b => a*b or .*a*b*c*d => .*d
    struct Node* mergeSameNeighbour(struct Node* parent) {
        struct Node* header = parent;
        while (parent && parent->son) {
            if (parent->isSelfContain && parent->son->isSelfContain) {
                bool isEverything = parent->isEverything || parent->son->isEverything;
                if (isEverything || parent->accept == parent->son->accept) {
                    parent->isEverything = isEverything;
                    struct Node* tmp = parent->son;
                    parent->son = parent->son->son;
                    free(tmp);
                }
            }
            parent = parent->son;
        }
        return header;
    }

    bool isMatch(string s,string p) {

        if (p.empty()) {
            return s.empty();
        }

        Node* emptyHeader = (Node*) malloc(sizeof(struct Node));
        Node* current = emptyHeader;
        int index = 0;
        int len = p.length();
        while (index < len) {
            current = addNode(current,p.at(index++));
        }
        mergeSameNeighbour(emptyHeader->son);
        index = 0;
        len = s.length();
        stack<Node*> savePoint;
        stack<int> saveIndex;
        current = emptyHeader->son;
        while (index < len) {
            char ch = s[index];
            if (!current) {
                goto helpme;
            }
            // printf("char[%d]: %c,current: %cn",index,ch,current->accept);
            if (current->isSelfContain) {
                if (current->isEverything) {
                    // call me out when you cannot handle it
                    savePoint.push(current);
                    saveIndex.push(index);
                } else if (current->accept == ch) {
                    savePoint.push(current);
                    saveIndex.push(index);
                } 
                current = current->son;
                continue;
            }

            if (current->isEverything || current->accept == ch) {
                index++;
                current = current->son;
                continue;
            }
            helpme:
            if (savePoint.size()) {
                current = savePoint.top();
                savePoint.pop();
                int lastIndex = saveIndex.top();
                saveIndex.pop();
                index = lastIndex + 1;
            } else {
                return false;
            }
        }
        free(emptyHeader);
        /* in here,the test word run out,so we just find whethere the rest is NULL,or all are selfContain,which can be regarded as 0 */
        while(current) {
            if (current->isSelfContain) {
                current = current->son;
            } else {
                return false;
            }
        }
        return !current;
    }
};

这里有个问题, 如果需要支持[a-zA-Z.*-+]等的话,估计在Node.accept 中修改成keymap的形式,估计问题应该不是很大。

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读