庞果英雄会 (Pongo coding challenge): Parsing an XML String
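Problem

This problem comes from an idea by the technical team of 蓝港在线 (Linekong). Details:

XML (eXtensible Markup Language) is a markup language used to mark up electronic documents so that they have structure. It can be used to mark up data and define data types, and it is a meta-language that lets users define their own markup languages. It is widely used for data transmission and storage. Write a program that, without using any open-source library outside the language itself, parses the given XML, formats it, and prints it to the screen.

For example, given the following XML file:

<?xml version="1.0" ?>
<Books>
<Book>
<Name = "The C++ Programming Language" Author="Bjarne Stroustrup" />
</Book>
<Book>
<Name = "Effective C++" Author = "Scott Meyers" />
</Book>
</Books>

the corresponding output should be:

Books
	Book 1
		Name:The C++ Programming Language
		Author:Bjarne Stroustrup
	Book 2
		Name:Effective C++
		Author:Scott Meyers

Input: a simplified piece of XML given as a string, as below. Attribute names contain no quotes or equals signs, and no special characters such as < and > (see the answer notes below for the detailed rules).

string in = "<?xml version=\"1.0\" ?><Books><Book><Name = \"The C++ Programming Language\" Author=\"Bjarne Stroustrup\" /></Book><Book><Name = \"Effective C++\" Author = \"Scott Meyers\" /></Book></Books>";

Output: parsing the input XML string should produce:

string out = "Books\r\n\tBook 1\r\n\t\tName:The C++ Programming Language\r\n\t\tAuthor:Bjarne Stroustrup\r\n\tBook 2\r\n\t\tName:Effective C++\r\n\t\tAuthor:Scott Meyers";

Function prototypes:
C++: ParsingXML(string in);
Java: ParsingXML(String in);
C#: ParsingXML(string input)

Code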
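#include <iostream>
#include <stack>
#include <string>
using namespace std;

// Parser states: the nesting level of the tag currently being read
enum FSM { FSM_NULL, FSM_START, FSM_LevelOne, FSM_LevelTwo };

// Parse an attribute list such as  Name = "x" Author="y"  and append each
// attribute to out as  letter + name + ':' + value
void dealAbstract(string &out, string &in, const string &letter)
{
    int flag = 0; // 0: reading an attribute name, 1: reading its quoted value
    string::iterator iter = in.begin();
    while (iter < in.end())
    {
        if (*iter == ' ')
        {
            ++iter;
            continue;
        }
        switch (flag)
        {
        case 0: // attribute name: copy everything up to '=', dropping spaces
            out += letter;
            while (iter < in.end() && *iter != '=')
            {
                if (*iter != ' ')
                {
                    out.push_back(*iter);
                }
                ++iter;
            }
            out.push_back(':');
            flag = 1;
            break;
        case 1: // attribute value: skip to the opening quote, copy up to the closing one
            while (iter < in.end() && *iter++ != '"')
                ;
            while (iter < in.end() && *iter != '"')
            {
                out.push_back(*iter);
                ++iter;
            }
            if (iter < in.end())
            {
                ++iter; // skip the closing quote
            }
            flag = 0;
            break;
        }
    }
}

string ParsingXML(string in)
{
    string out;
    string tmp;           // text of the current tag, without '<' and '>'
    stack<FSM> markLevel; // levels of the currently open tags
    int index = 0;        // running number of second-level tags (Book 1, Book 2, ...)
    string letter;        // line break + indentation written before each attribute
    FSM fsm_flag = FSM_NULL;
    markLevel.push(fsm_flag);
    for (string::iterator iter = in.begin(); iter < in.end(); ++iter)
    {
        tmp.clear();
        // Fetch the tag data: skip to the next '<', then copy up to '>'
        while (iter < in.end() && *iter != '<')
        {
            ++iter;
        }
        if (iter == in.end())
        {
            break;
        }
        for (++iter; iter < in.end() && *iter != '>'; ++iter)
        {
            tmp.push_back(*iter);
        }
        if (iter == in.end() || tmp.empty())
        {
            break; // unterminated or empty tag: stop parsing
        }
        fsm_flag = markLevel.top(); // level of the current tag
        if (tmp[0] == '/') // closing tag
        {
            if (markLevel.size() > 1)
            {
                markLevel.pop();
            }
        }
        else if (tmp[tmp.length() - 1] == '/' || tmp[tmp.length() - 1] == '"') // tag with attributes
        {
            if (tmp[tmp.length() - 1] != '"')
            {
                tmp = tmp.substr(0, tmp.length() - 1); // drop the trailing '/'
            }
            dealAbstract(out, tmp, letter);
        }
        else // opening tag
        {
            switch (fsm_flag) // tag transition state machine
            {
            case FSM_NULL: // expect the <?xml ... ?> declaration
                if (tmp[0] == '?' && tmp[tmp.length() - 1] == '?')
                {
                    fsm_flag = FSM_START;
                    markLevel.push(fsm_flag);
                }
                break;
            case FSM_START: // first-level tag, e.g. Books
                fsm_flag = FSM_LevelOne;
                markLevel.push(fsm_flag);
                out += "\r\n" + tmp;
                letter = "\r\n\t";
                index = 0;
                break;
            case FSM_LevelOne: // second-level tag, e.g. Book
                fsm_flag = FSM_LevelTwo;
                letter = "\r\n\t";
                markLevel.push(fsm_flag);
                ++index;
                out += letter + tmp + ' ' + to_string(index);
                letter = "\r\n\t\t";
                break;
            default: // deeper nesting is not required by the problem
                break;
            }
        }
    }
    if (out.size() >= 2)
    {
        out = out.substr(2); // remove the leading "\r\n"
    }
    return out;
}

int main()
{
    // string in = R"(<?xml version="1.0" ?><Books><Class = "art"/><Book><Name = "The C++ Programming Language" Author="Bjarne Stroustrup" /></Book><Book><Name = "Effective C++" Author = "Scott Meyers" /></Book></Books><VideoS><video><Name = "1123213"/></video><video><Name = "23456"/></video></VideoS>)";
    string in = R"(<?xml version="1.0" ?><Books><Book><Name = "The C++ Programming Language" Author="Bjarne Stroustrup"></Book><Book><Name = "Effective C++" Author = "Scott Meyers"></Book></Books>)";
    string out = ParsingXML(in);
    cout << out << endl;
    return 0;
}

Results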
1. <?xml version="1.0" ?><Books><Class = "art"/><Book><Name = "The C++ Programming Language" Author="Bjarne Stroustrup" /></Book><Book><Name = "Effective C++" Author = "Scott Meyers" /></Book></Books><VideoS><video><Name = "1123213"/></video><video><Name = "23456"/></video></VideoS>
PS: The nasty part was this test case:
<?xml version="1.0" ?><Books><Book><Name = "The C++ Programming Language" Author="Bjarne Stroustrup"></Book><Book><Name = "Effective C++" Author = "Scott Meyers"></Book></Books>
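This test case does not fully follow the format required by the problem: its Name elements end with ">" instead of "/>". So my first version failed it, which was really unfair. The problem never said the parser had to be fault-tolerant, so naturally I hadn't handled malformed input!!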
The code above is the revised version, and it passes all the test cases.
(Editor: 李大同)