将sdmx-xml文件读入R中的数据帧
我想知道是否有人设法将SDMX-
XML文件读入数据帧.我想读的文件是
https://www.ecb.europa.eu/stats/sdmx/icpf/1/data/pension_funds.xml(1mb).
我将文件保存为“pensions_funds.xml”到pwd并尝试使用XML包来读取它: fileName <- system.file("pensions","pensions_funds.xml",package="XML") parsed<-xmlTreeParse("pension_funds.xml",getDTD=F) r<-xmlRoot(parsed) tmp = xmlSApply(r,function(x) xmlSApply(x,xmlValue)) 以上几行基本上遵循http://www.omegahat.org/RSXML/gettingStarted.html中的示例 第一件事就是找到正确的节点然后从那里开始,但是我怀疑我可能是傻瓜的事,因为我对数据格式知之甚少,而且我不确定XML包可以用于SDMX-XML文件.聪明的人似乎试图这样做 EDIT3 url <- "http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml" sdmxHandler <- function() { ## data.frame which stores results data <- data.frame(stringsAsFactors=FALSE) ## counter to store current row i <- 1 ## temp value to store current REF_AREA ## temp value to store current REF_AREA refArea <- NA bsItem <- NA bsCountSector <- NA ## handler subroutine for Obs tag Obs <- function(name,attr) { ## found an Obs tag and now fill data.frame data[i,"refArea"] <<- refArea data[i,"timePeriod"] <<- as.numeric(attr["TIME_PERIOD"]) data[i,"obsValue"] <<- as.numeric(attr["OBS_VALUE"]) data[i,"bsItem"] <<- bsItem data[i,"bsCountSector"] <<- bsCountSector i <<- i + 1 } ## handler subroutine for Series tag Series <- function(name,attr) { refArea <<- attr["REF_AREA"] bsItem <<- as.character(attr["BS_ITEM"]) bsCountSector <<- as.numeric(attr["BS_ITEM"]) } return(list(getData=function() {return(data)},Obs=Obs,Series=Series)) } ## run parser df <- xmlEventParse(file(url),handlers=sdmxHandler())$getData() Specification mandate value for attribute OBS_VALUE attributes construct error Couldn't find end of Start Tag Obs line 15108 Premature end of data in tag Series line 15041 Premature end of data in tag DataSet line 91 Premature end of data in tag CompactData line 2 Error: 1: Specification mandate value for attribute OBS_VALUE 2: attributes construct error 3: Couldn't find end of Start Tag Obs line 15108 4: Premature end of data in tag Series line 15041 5: Premature end of data in tag DataSet line 91 6: Premature end of data in tag CompactData line 2 In addition: There were 50 or more warnings (use warnings() to see the first 50) EDIT2: url <- http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml (10mb文件,原始链接已更正),唯一的修改是添加两行: data[i,"bsItem"] <<- as.character(attr["BS_ITEM"]) data[i,"bsCountSector"] <<- as.numeric(attr["BS_COUNT_SECTOR"]) (这些是识别此较大数据集中的行所需的附加id变量).
另外:有50个或更多警告(使用警告()查看前50个) 数据的基本格式看起来非常相似,所以我认为这可能有用. 10mb文件的基本格式如下: <Series FREQ="M" REF_AREA="AT" ADJUSTMENT="N" BS_REP_SECTOR="A" BS_ITEM="A20" MATURITY_ORIG="A" DATA_TYPE="1" COUNT_AREA="U2" BS_COUNT_SECTOR="0000" CURRENCY_TRANS="Z01" BS_SUFFIX="E" TIME_FORMAT="P1M" COLLECTION="E"> <Obs TIME_PERIOD="1997-09" OBS_VALUE="275.3" OBS_STATUS="A" OBS_CONF="F"/> <Obs TIME_PERIOD="1997-10" OBS_VALUE="275.9" OBS_STATUS="A" OBS_CONF="F"/> <Obs TIME_PERIOD="1997-11" OBS_VALUE="276.6" OBS_STATUS="A" OBS_CONF="F"/> EDIT1: 所需的数据格式: Ref_area time_period obs_value At 2006 118 At 2007 119 … Be 2006 101 … 这是数据的第一位. </Header> DataSet xsi:schemaLocation="https://www.ecb.europa.eu/vocabulary/stats/icpf/1 https://www.ecb.europa.eu/stats/sdmx/icpf/1/structure/2011-08-11/sdmx-compact.xsd" xmlns="https://www.ecb.europa.eu/vocabulary/stats/icpf/1"> <Group DECIMALS="0" TITLE_COMPL="Austria,reporting institutional sector Insurance corporations and pension funds - Closing balance sheet - All financial assets and liabilities - counterpart area World (all entities),counterpart institutional sector Total economy including Rest of the World (all sectors) - Credit (resources/liabilities) - Non-consolidated,Current prices - Euro,Neither seasonally nor working day adjusted - ESA95 TP table Not applicable" UNIT_MULT="9" UNIT="EUR" ESA95TP_SUFFIX="Z" ESA95TP_DENOM="E" ESA95TP_CONS="N" ESA95TP_DC_AL="2" ESA95TP_CPSECTOR="S" ESA95TP_CPAREA="A1" ESA95TP_SECTOR="S125" ESA95TP_ASSET="F" ESA95TP_TRANS="LE" ESA95TP_PRICE="V" ADJUSTMENT="N" REF_AREA="AT"/><Series ESA95TP_SUFFIX="Z" ESA95TP_DENOM="E" ESA95TP_CONS="N" ESA95TP_DC_AL="2" ESA95TP_CPSECTOR="S" ESA95TP_CPAREA="A1" ESA95TP_SECTOR="S125" ESA95TP_ASSET="F" ESA95TP_TRANS="LE" ESA95TP_PRICE="V" ADJUSTMENT="N" REF_AREA="AT" COLLECTION="E" TIME_FORMAT="P1Y" FREQ="A"><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="112" TIME_PERIOD="2008"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="119" TIME_PERIOD="2009"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="125" TIME_PERIOD="2010"/><Obs OBS_CONF="F" OBS_STATUS="E" OBS_VALUE="127" TIME_PERIOD="2011"/></Series><Group D
RSDMX似乎处于早期开发状态.恕我直言,还没有包.但是您可以使用XML包轻松地自己实现它.我建议使用xmlEventParse(有关详细信息,请参阅?xmlEventParse):
编辑:根据outstanding_amounts.xml的更改要求调整示例 library("XML") #url <- "http://www.ecb.europa.eu/stats/sdmx/icpf/1/data/pension_funds.xml" url <- "http://www.ecb.europa.eu/stats/sdmx/bsi/1/data/outstanding_amounts.xml" ## download xml file to avoid download errors disturbing xmlEventParse tmp <- tempfile() download.file(url,tmp) sdmxHandler <- function() { ## data.frame which stores results data <- data.frame(stringsAsFactors=FALSE) ## counter to store current row i <- 1 ## temp value to store current REF_AREA,BS_ITEM and BS_COUNT_SECTOR refArea <- NA bsItem <- NA bsCountSector <- NA ## handler subroutine for Obs tag Obs <- function(name,"bsCountSector"] <<- bsCountSector data[i,"timePeriod"] <<- as.Date(paste(attr["TIME_PERIOD"],"-01",sep=""),format="%Y-%m-%d") data[i,"obsValue"] <<- as.double(attr["OBS_VALUE"]) ## update current row i <<- i + 1 } ## handler subroutine for Series tag Series <- function(name,attr) { refArea <<- attr["REF_AREA"] bsItem <<- attr["BS_ITEM"] bsCountSector <<- as.numeric(attr["BS_COUNT_SECTOR"]) } return(list(getData=function() {return(data)},Series=Series)) } ## run parser df <- xmlEventParse(tmp,handlers=sdmxHandler())$getData() head(df) # refArea bsItem bsCountSector timePeriod obsValue #1 DE A20 2210 12053 39.6 #2 DE A20 2210 12084 46.1 #3 DE A20 2210 12112 50.2 #4 DE A20 2210 12143 52.0 #5 DE A20 2210 12173 52.3 #6 DE A20 2210 12204 47.3 (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |