php – How to extract the latest articles from DBLP
I need to extract the latest articles from DBLP.
A description of all elements and fields can be found at: http://dblp.uni-trier.de/xml/dblp.dtd
The help file is at: http://dblp.uni-trier.de/xml/docu/dblpxml.pdf

So there is an API: you make a GET request by year and you get back a JSON document. I want a JSON document containing today's articles, but I don't know how to make a GET request that uses the mdate attribute.

This is the structure of an article:
<pre>
<article key="journals/cacm/Szalay08" mdate="2008-11-03">
  <author>Alexander S. Szalay</author>
  <title>Jim Gray, astronomer.</title>
  <pages>58-65</pages>
  <year>2008</year>
  <volume>51</volume>
  <journal>Commun. ACM</journal>
  <number>11</number>
  <ee>http://doi.acm.org/10.1145/1400214.1400231</ee>
  <url>db/journals/cacm/cacm51.html#Szalay08</url>
</article>
</pre>

I tried http://dblp.uni-trier.de/rec/bibtex/journals/acta/BayerM72 and got:
<pre>
<?xml version="1.0"?>
<dblp>
  <article key="journals/acta/BayerM72" mdate="2003-11-25">
    <author>Rudolf Bayer</author>
    <author>Edward M. McCreight</author>
    <title>Organization and Maintenance of Large Ordered Indices</title>
    ...
  </article>
</dblp>
</pre>

I need to extract all of the latest articles using the mdate field. Here is a document about the various requests: http://dblp.uni-trier.de/xml/docu/dblpxmlreq.pdf

PHP code:
<pre>
<?php
$url = 'http://dblp.uni-trier.de/rec/bibtex/';
$key = 'journals/acta/BayerM72';
$content = file_get_contents($url . $key);
echo $content;
?>
</pre>

Solution
For parsing XML, PHP offers XML Parser, XMLReader and SimpleXML. XML Parser and XMLReader are meant for large files; SimpleXML is meant for small files (< 1 MB).
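XMLReader is named above as an option for large files. As a minimal sketch of how it could stream over `<article>` elements and collect their `key` and `mdate` attributes: the helper name `readArticleDates` and the inline sample data are assumptions made for illustration, and a real run would open a file or URL with `$reader->open()` instead of a string.

```php
<?php
// Hypothetical helper (not part of DBLP or PHP): walks the XML node by node
// and records the key and mdate attributes of every <article> element.
function readArticleDates(string $xml): array
{
    $reader = new XMLReader();
    // For a file or URL, $reader->open($path) would be used instead.
    $reader->XML($xml);

    $articles = [];
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'article') {
            $articles[] = [
                'key'   => $reader->getAttribute('key'),
                'mdate' => $reader->getAttribute('mdate'),
            ];
        }
    }
    $reader->close();

    return $articles;
}

// Sample data modelled on the two records quoted in the question.
$sample = <<<XML
<dblp>
  <article key="journals/cacm/Szalay08" mdate="2008-11-03"><title>Jim Gray, astronomer.</title></article>
  <article key="journals/acta/BayerM72" mdate="2003-11-25"><title>Organization and Maintenance of Large Ordered Indices</title></article>
</dblp>
XML;

print_r(readArticleDates($sample));
```

Because XMLReader only holds the current node in memory, this approach should scale to documents far larger than SimpleXML can comfortably load.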
<pre>
<?php
// Called for every opening tag; remembers the tag name and starts a new article record.
function startElement($parser, $tag, $attrs) {
    global $articles, $isArticle, $i, $globTag;
    $globTag = $tag;
    if ($tag == 'article') {
        $isArticle = true;
        $articles[$i] = [];
        if (isset($attrs['mdate'])) {
            // add date from attribute in article
            $articles[$i]['mdate'] = $attrs['mdate'];
        }
    }
}

// Called for every closing tag; moves on to the next record after </article>.
function endElement($parser, $tag) {
    global $isArticle, $i, $globTag;
    $globTag = '';
    if ($tag == 'article') {
        $isArticle = false;
        ++$i;
    }
}

// Called for character data; stores the text under the current tag name.
// The array union keeps the first value per tag, so only the first <author> is kept.
function getElement($parser, $data) {
    global $articles, $isArticle, $i, $globTag;
    if ($isArticle && $globTag !== '' && $globTag !== 'article' && trim($data) !== '') {
        $articles[$i] = $articles[$i] + [$globTag => $data];
    }
}

$articles = [];
$i = 0;
$isArticle = false;
$globTag = '';

$url = 'http://dblp.uni-trier.de/rec/bibtex/';
$key = 'journals/acta/BayerM72';
$url .= $key;

$parser = xml_parser_create();
xml_set_element_handler($parser, 'startElement', 'endElement');
xml_set_character_data_handler($parser, 'getElement');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

$file = fopen($url, 'rb');
if ($file === false) {
    die("Could not open $url");
}

// Feed the document to the parser in chunks so large files never sit in memory at once.
$chunkSize = 8192;
while (!feof($file)) {
    $data = fread($file, $chunkSize);
    if ($data === false) {
        break;
    }
    if (!xml_parse($parser, $data, feof($file))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
xml_parser_free($parser);
fclose($file);

print_r($articles);
?>
</pre>

The code above is the XML Parser example.

<pre>
<?php
$url = 'http://dblp.uni-trier.de/rec/bibtex/';
$key = 'journals/acta/BayerM72';
$content = file_get_contents($url . $key);
if ($content === false) {
    die("Could not fetch $url$key");
}
$xml = new SimpleXMLElement($content);

/* Search for <dblp><article> */
$result = $xml->xpath('/dblp/article');

// $result is an array of SimpleXMLElement objects
var_dump($result);
?>
</pre>

The code above is the SimpleXML example. The result is an array of SimpleXMLElement objects. See the manual for how to read element attributes such as mdate: SimpleXMLElement->attributes().
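To connect this back to the original question, here is a minimal sketch of filtering articles by their mdate attribute once the XML has been loaded with SimpleXML. The helper name `articlesModifiedSince` and the sample data are assumptions made for illustration. The filtering happens on the client side, since nothing in the question confirms that the server accepts an mdate parameter.

```php
<?php
// Hypothetical helper: returns the keys of all <article> elements whose
// mdate attribute is on or after $since (both in YYYY-MM-DD form).
function articlesModifiedSince(string $xml, string $since): array
{
    $doc = new SimpleXMLElement($xml);

    $keys = [];
    foreach ($doc->xpath('/dblp/article') as $article) {
        // Attributes are read with array syntax and cast to string.
        $mdate = (string) $article['mdate'];
        // ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if ($mdate !== '' && strcmp($mdate, $since) >= 0) {
            $keys[] = (string) $article['key'];
        }
    }

    return $keys;
}

// Sample data modelled on the two records quoted in the question.
$sampleXml = <<<XML
<dblp>
  <article key="journals/cacm/Szalay08" mdate="2008-11-03"><title>Jim Gray, astronomer.</title></article>
  <article key="journals/acta/BayerM72" mdate="2003-11-25"><title>Organization and Maintenance of Large Ordered Indices</title></article>
</dblp>
XML;

print_r(articlesModifiedSince($sampleXml, '2008-01-01'));
```

With the sample data, only journals/cacm/Szalay08 passes the filter. To select today's articles, `date('Y-m-d')` could be passed as `$since`, and `json_encode()` could turn the result into a JSON document.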