Parsing UTF-8 XML with TinyXML
Today I ran into a problem in my program: whenever the XML exchanged between components contained Chinese characters, TinyXML reported an error and refused to parse it. After some digging, the cause turned out to be that TinyXML needs the character set specified explicitly when it parses UTF-8 XML. Below is how to call the TinyXML functions for loading a file and for parsing a string directly.

1. The XML to parse

    <?xml version="1.0" encoding="utf-8"?>
    <Parament>
      <SchedTempl>
        <CommandType>2</CommandType>
        <SchedTemplID>1</SchedTemplID>
        <SchedType>0</SchedType>
        <SchedName>全天侯模板</SchedName>
        <MondaySched>222222222222222222222222222222222222222222222110</MondaySched>
        <TuesdaySched>222222222222222222222222222222222222222222222110</TuesdaySched>
        <WednesdaySched>222222222222222222222222222222222222222222222110</WednesdaySched>
        <ThursdaySched>222222222222222222222222222222222222222222222110</ThursdaySched>
        <FridaySched>222222222222222222222222222222222222222222222110</FridaySched>
        <SaturdaySched>222222222222222222222222222222222222222222222110</SaturdaySched>
        <SundaySched>222222222222222222222222222222222222222222222110</SundaySched>
      </SchedTempl>
      <SchedTempl>
        <CommandType>2</CommandType>
        <SchedTemplID>2</SchedTemplID>
        <SchedType>0</SchedType>
        <SchedName>工作日模板</SchedName>
        <MondaySched>000000000000000000222222222222222110000000000000</MondaySched>
        <TuesdaySched> 000000000000000000222222222222222110000000000000</TuesdaySched>
        <WednesdaySched> 000000000000000000222222222222222110000000000000</WednesdaySched>
        <ThursdaySched> 000000000000000000222222222222222110000000000000</ThursdaySched>
        <FridaySched> 000000000000000000222222222222222110000000000000</FridaySched>
        <SaturdaySched> 000000000000000000000000000000000000000000000000</SaturdaySched>
        <SundaySched> 000000000000000000000000000000000000000000000000</SundaySched>
      </SchedTempl>
    </Parament>

2. Reading an XML file with TinyXML

    TiXmlDocument *xmlfile = new TiXmlDocument(FilePath);
    xmlfile->LoadFile(TIXML_ENCODING_UTF8);  // returns false if loading or parsing fails

3. Parsing an XML string with TinyXML

    TiXmlDocument myDocument;
    // In TinyXML 2.x, Parse() takes (text, TiXmlParsingData*, encoding),
    // so pass 0 for the parsing data and the encoding as the third argument.
    myDocument.Parse(xmlParament, 0, TIXML_ENCODING_UTF8);
    // or: myDocument.Parse(xmlParament, 0, TIXML_ENCODING_LEGACY);
    if (!myDocument.Error()) {
        TiXmlElement* paramentEle = myDocument.FirstChildElement(PARAMENT_PARENT_NODE_NAME);
        if (paramentEle == NULL) {
            return FALSE;
        }
        // Walk every <SchedTempl> child of the root element.
        TiXmlElement* schedTemplEle = paramentEle->FirstChildElement("SchedTempl");
        while (schedTemplEle) {
            RecordScheduleTemplateInfo templInfo;
            // The TIXML_NODE_* macros are this project's own helpers, not part of TinyXML.
            TIXML_NODE_VALUE_FROM_PARENT(schedTemplEle, "SchedTemplID", &templInfo.schedTemplId);
            TIXML_NODE_VALUE_FROM_PARENT(schedTemplEle, "SchedType", &templInfo.schedType);
            templInfo.schedName      = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "SchedName", "");
            templInfo.mondaySched    = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "MondaySched", "");
            templInfo.tuesdaySched   = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "TuesdaySched", "");
            templInfo.wednesdaySched = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "WednesdaySched", "");
            templInfo.thursdaySched  = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "ThursdaySched", "");
            templInfo.fridaySched    = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "FridaySched", "");
            templInfo.saturdaySched  = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "SaturdaySched", "");
            templInfo.sundaySched    = TIXML_NODE_TEXT_FROM_PARENT(schedTemplEle, "SundaySched", "");
            this->schedTemplInfoList.push_back(templInfo);
            schedTemplEle = schedTemplEle->NextSiblingElement("SchedTempl");
        }
    }

Notes (quoted from a translation of the official documentation)

Normally, TinyXML tries to detect the correct encoding and use it. However, you can force a particular encoding by setting the value of TIXML_DEFAULT_ENCODING in the header file.
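Since the parse failure came from bytes that did not match the expected encoding, it can help to check the input before handing it to the parser. The following is a minimal sketch using only the standard library. The function names `hasUtf8Bom` and `isValidUtf8` are hypothetical helpers, not TinyXML APIs, and this is not how TinyXML itself detects encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the buffer starts with the UTF-8
// byte order mark EF BB BF.
bool hasUtf8Bom(const std::string& data) {
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}

// Hypothetical helper: true if every byte sequence in the buffer is
// well-formed UTF-8 (no stray continuation bytes, truncated sequences,
// overlong forms, surrogates, or code points above U+10FFFF).
bool isValidUtf8(const std::string& data) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0;       // continuation bytes expected
        unsigned long minValue = 0;  // smallest code point for this length
        unsigned long cp = 0;        // decoded code point
        if (lead < 0x80)                { extra = 0; minValue = 0;       cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; minValue = 0x80;    cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; minValue = 0x800;   cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; minValue = 0x10000; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i - 1 < extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(data[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minValue) return false;                 // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}
```

If `isValidUtf8` returns false for a message that declares `encoding="utf-8"`, the sender is probably emitting a legacy code page, and the text would need converting before it is parsed as UTF-8.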
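As for "high-ASCII" languages, TinyXML can handle all languages at the same time, as long as the XML is encoded in UTF-8. Somewhat ironically, older programmers and operating systems tend to use the "default" or "traditional" code page. Many applications can output UTF-8, but old or stubborn applications still output text in the default code page.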
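To illustrate what converting from a legacy code page involves, here is a minimal sketch for the simplest case, ISO-8859-1 (Latin-1), where each byte maps directly to the Unicode code point of the same value. The function name `latin1ToUtf8` is a hypothetical helper, not a TinyXML API. Multi-byte code pages such as GBK cannot be converted arithmetically like this; they need a table-driven converter such as iconv.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are ASCII and are copied unchanged; bytes 0x80..0xFF
// map to code points U+0080..U+00FF, which take two bytes in UTF-8.
std::string latin1ToUtf8(const std::string& input) {
    std::string out;
    out.reserve(input.size() * 2);  // worst case: every byte expands to two
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
        }
    }
    return out;
}
```

The converted string could then be passed to the parser with TIXML_ENCODING_UTF8, assuming the original text really was Latin-1.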
http://skew.org/xml/tutorial gives a good introduction to converting between encodings. (Editor: 李大同)