Parsing a string into an array based on spaces or "double-quoted strings"
Published: 2020-12-16 10:31:10 | Category: Encyclopedia | Source: compiled from the web
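I'm trying to take a user input string and parse it into an array called char *entire_line[100], where each word goes into a different index of the array, but any part of the string enclosed in quotes should go into a single index.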
So, if I have

    char buffer[1024]={0,};
    fgets(buffer,1024,stdin);

Example input:

    word filename.txt "this is a string that should take up one index in the output array"

    tokenizer=strtok(buffer," "); //break up by spaces
    do{
        if(strchr(tokenizer,'"')){ //check if a word starts with a "
            is_string=YES;
            entire_line[i]=tokenizer; //if so, put that word into current index
            tokenizer=strtok(NULL,"\""); //should get rest of string until end "
            strcat(entire_line[i],tokenizer); //append the two together, I'll take care of the missing space once I figure out this issue
        }
        entire_line[i]=tokenizer;
        i++;
    }while((tokenizer=strtok(NULL," \n"))!=NULL);

This obviously isn't working, and it only comes close when the double-quoted string is at the end of the input string.

Solution
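The strtok function is a terrible way to tokenize in C, except for one (admittedly common) case: simple whitespace-separated words. (Even then, it's still not great, because it is neither re-entrant nor usable recursively, which is why we invented strsep for BSD.)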
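To illustrate the one case where strtok is adequate, and why it falls short once quotes are involved, here is a minimal sketch. The helper name split_words and the delimiter set " \t\n" are assumptions made for illustration; they do not come from the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split `line` in place on whitespace and store up to
 * `max_words` pointers in `words`. Returns the number of words stored.
 */
size_t split_words(char *line, char *words[], size_t max_words)
{
    size_t count = 0;
    char *tok;

    assert(line != NULL && words != NULL);

    /*
     * strtok remembers its position in hidden static state, so nothing
     * called inside this loop may start a second strtok scan.
     */
    for (tok = strtok(line, " \t\n");
         tok != NULL && count < max_words;
         tok = strtok(NULL, " \t\n")) {
        words[count++] = tok;
    }

    return count;
}
```

Because strtok only knows about delimiter characters, a quoted phrase such as "two words" comes back as two separate pieces, each still carrying its quote character. That is the limitation the question runs into.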
In this case, your best option is to build your own simple state machine:

    #include <ctype.h>

    char *p, *start_of_word;
    int c;
    enum states { DULL, IN_WORD, IN_STRING } state = DULL;

    for (p = buffer; *p != '\0'; p++) {
        c = (unsigned char) *p; /* convert to unsigned char for is* functions */
        switch (state) {
        case DULL: /* not in a word, not in a double quoted string */
            if (isspace(c)) {
                /* still not in a word, so ignore this char */
                continue;
            }
            /* not a space -- if it's a double quote we go to IN_STRING, else to IN_WORD */
            if (c == '"') {
                state = IN_STRING;
                start_of_word = p + 1; /* word starts at *next* char, not this one */
                continue;
            }
            state = IN_WORD;
            start_of_word = p; /* word starts here */
            continue;

        case IN_STRING:
            /* we're in a double quoted string, so keep going until we hit a close " */
            if (c == '"') {
                /* word goes from start_of_word to p-1 */
                /* ... do something with the word ... */
                state = DULL; /* back to "not in word, not in string" state */
            }
            continue; /* either still IN_STRING or we handled the end above */

        case IN_WORD:
            /* we're in a word, so keep going until we get to a space */
            if (isspace(c)) {
                /* word goes from start_of_word to p-1 */
                /* ... do something with the word ... */
                state = DULL; /* back to "not in word, not in string" state */
            }
            continue; /* either still IN_WORD or we handled the end above */
        }
    }

Note that this does not account for the possibility of a double quote inside a word, for example:

    "some text in quotes" plus four simple words p"lus something strange"

With the state machine above, "some text in quotes" becomes a single token (without the double quotes), but p"lus is also a single token (including the quote), something is a single token, and strange" is a token. Whether you want this, and how you want to handle it, is up to you. For more complex but thorough lexical tokenization, you may want to use a code-building tool such as flex.

Also, when the for loop exits, if the state is not DULL, you need to handle the final word (I left this out of the code above) and decide what to do if the state is IN_STRING (meaning there was no closing double quote).

(Editor: 李大同)
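The state machine, together with the end-of-loop handling that the answer leaves to the reader, might be packaged as follows. This is a sketch under stated assumptions, not the answer author's code: the function name split_line, the in-place termination of each token with '\0', and the choice to keep an unterminated quoted string as one final token are all illustrative decisions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum states { DULL, IN_WORD, IN_STRING };

/*
 * Hypothetical helper: split `line` in place into words and double-quoted
 * strings, storing up to `max_tokens` pointers in `tokens`.
 * Returns the number of tokens stored.
 */
size_t split_line(char *line, char *tokens[], size_t max_tokens)
{
    enum states state = DULL;
    char *start_of_word = NULL;
    size_t count = 0;
    char *p;

    assert(line != NULL && tokens != NULL);

    for (p = line; *p != '\0' && count < max_tokens; p++) {
        int c = (unsigned char)*p; /* unsigned char for is* functions */

        switch (state) {
        case DULL: /* not in a word, not in a quoted string */
            if (isspace(c))
                continue;
            if (c == '"') {
                state = IN_STRING;
                start_of_word = p + 1; /* skip the opening quote */
            } else {
                state = IN_WORD;
                start_of_word = p;
            }
            continue;

        case IN_STRING: /* token ends at the closing quote */
            if (c == '"') {
                *p = '\0';
                tokens[count++] = start_of_word;
                state = DULL;
            }
            continue;

        case IN_WORD: /* token ends at the next whitespace */
            if (isspace(c)) {
                *p = '\0';
                tokens[count++] = start_of_word;
                state = DULL;
            }
            continue;
        }
    }

    /*
     * End of input while still inside a token: keep what we have.
     * Assumption: an unterminated quoted string is accepted as-is
     * (including any trailing newline left by fgets).
     */
    if (state != DULL && count < max_tokens)
        tokens[count++] = start_of_word;

    return count;
}
```

With the question's variables, a call such as split_line(buffer, entire_line, 100) after fgets would fill entire_line with pointers into buffer, so buffer has to stay alive for as long as the tokens are in use. A caller that wants to reject an unterminated quote could have the function report the final state instead of silently keeping the partial token.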