python - lxml: getting all leaf nodes?

Published: 2020-12-20 11:42:11 | Category: Python | Source: compiled from the web


Given an XML file, is there a way to use lxml to get the names and attributes of all leaf nodes?

This is the XML file in question:

<?xml version="1.0" encoding="UTF-8"?>
<clinical_study>
  <!-- This xml conforms to an XML Schema at:
    http://clinicaltrials.gov/ct2/html/images/info/public.xsd
 and an XML DTD at:
    http://clinicaltrials.gov/ct2/html/images/info/public.dtd -->
  <id_info>
    <org_study_id>3370-2(-4)</org_study_id>
    <nct_id>NCT00753818</nct_id>
    <nct_alias>NCT00222157</nct_alias>
  </id_info>
  <brief_title>Developmental Effects of Infant Formula Supplemented With LCPUFA</brief_title>
  <sponsors>
    <lead_sponsor>
      <agency>Mead Johnson Nutrition</agency>
      <agency_class>Industry</agency_class>
    </lead_sponsor>
  </sponsors>
  <source>Mead Johnson Nutrition</source>
  <oversight_info>
    <authority>United States: Institutional Review Board</authority>
  </oversight_info>
  <brief_summary>
    <textblock>
      The purpose of this study is to compare the effects on visual development,growth,cognitive
      development,tolerance,and blood chemistry parameters in term infants fed one of four study
      formulas containing various levels of DHA and ARA.
    </textblock>
  </brief_summary>
  <overall_status>Completed</overall_status>
  <phase>N/A</phase>
  <study_type>Interventional</study_type>
  <study_design>N/A</study_design>
  <primary_outcome>
    <measure>visual development</measure>
  </primary_outcome>
  <secondary_outcome>
    <measure>Cognitive development</measure>
  </secondary_outcome>
  <number_of_arms>4</number_of_arms>
  <condition>Cognitive Development</condition>
  <condition>Growth</condition>
  <arm_group>
    <arm_group_label>1</arm_group_label>
    <arm_group_type>Experimental</arm_group_type>
  </arm_group>
  <arm_group>
    <arm_group_label>2</arm_group_label>
    <arm_group_type>Experimental</arm_group_type>
  </arm_group>
  <arm_group>
    <arm_group_label>3</arm_group_label>
    <arm_group_type>Experimental</arm_group_type>
  </arm_group>
  <arm_group>
    <arm_group_label>4</arm_group_label>
    <arm_group_type>Other</arm_group_type>
    <description>Control</description>
  </arm_group>
  <intervention>
    <intervention_type>Other</intervention_type>
    <intervention_name>DHA and ARA</intervention_name>
    <description>various levels of DHA and ARA</description>
    <arm_group_label>1</arm_group_label>
    <arm_group_label>2</arm_group_label>
    <arm_group_label>3</arm_group_label>
  </intervention>
  <intervention>
    <intervention_type>Other</intervention_type>
    <intervention_name>Control</intervention_name>
    <arm_group_label>4</arm_group_label>
  </intervention>
</clinical_study>

What I want is a dictionary that looks like this:

{
    'id_info_org_study_id': '3370-2(-4)',
    'id_info_nct_id': 'NCT00753818',
    'id_info_nct_alias': 'NCT00222157',
    'brief_title': 'Developmental Effects...'
}

Is this possible with lxml, or with any other Python library?

Update:

I ended up doing this:

response = requests.get(url)
tree = lxml.etree.fromstring(response.content)
mydict = self._recurse_over_nodes(tree,None,{})

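def _recurse_over_nodes(self, tree, parent_key, data):
    for branch in tree:
        # lxml also yields comments and processing instructions here;
        # their .tag is not a string, so skip them.
        if not isinstance(branch.tag, str):
            continue
        key = branch.tag
        if parent_key:
            key = '%s_%s' % (parent_key, key)
        if len(branch):
            # The element has children: recurse with the prefixed key.
            data = self._recurse_over_nodes(branch, key, data)
        elif key in data:
            # Repeated leaf: append to the existing comma-separated value.
            data[key] = data[key] + ',%s' % branch.text
        else:
            data[key] = branch.text
    return data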
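
The function above records only each leaf's text, while the question also asks about attributes. A minimal sketch of one way to collect both, assuming a `(key, text, attributes)` tuple per leaf is an acceptable shape: `leaf_records` is a hypothetical helper name, and the `unit` attribute in the fragment is invented for illustration, since the sample file has no attributes. The import falls back to the standard library's `xml.etree.ElementTree` when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # the standard-library parser exposes the same calls used here
    import xml.etree.ElementTree as etree


def leaf_records(element, path=''):
    """Yield (key, text, attributes) for every leaf element below `element`."""
    for child in element:
        # Skip comments and processing instructions; their .tag is not a string.
        if not isinstance(child.tag, str):
            continue
        key = path + '_' + child.tag if path else child.tag
        if len(child):
            # Not a leaf: descend, carrying the accumulated key as the prefix.
            yield from leaf_records(child, key)
        else:
            yield key, child.text, dict(child.attrib)


# Hypothetical fragment; the `unit` attribute is not in the original file.
doc = etree.fromstring(
    '<clinical_study>'
    '<id_info><nct_id>NCT00753818</nct_id></id_info>'
    '<number_of_arms unit="arms">4</number_of_arms>'
    '</clinical_study>'
)
records = list(leaf_records(doc))
```

Each record keeps the attributes as a plain `dict`, so a leaf without attributes simply carries an empty one.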

Solution

Assuming you have already called getroot(), something as simple as the following can build a dictionary with what you expect:

import lxml.etree

tree = lxml.etree.parse('sample_ctgov.xml')
root = tree.getroot()

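d = {}
for node in root:
    # Skip comments and processing instructions; their .tag is not a string.
    if not isinstance(node.tag, str):
        continue
    if len(node):
        for child in node:
            if isinstance(child.tag, str):
                # Build each key from the parent tag so prefixes do not accumulate.
                d[node.tag + '_' + child.tag] = child.text
    else:
        d[node.tag] = node.text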

That should do the trick. It is not optimized and does not search all child nodes recursively, but it gives you a place to start.
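
The loop above only looks one level below the root, so deeper leaves such as `sponsors/lead_sponsor/agency` are missed. Below is a sketch of a recursive variant that reaches leaves at any depth. It rests on two assumptions that are not in the original: repeated leaves (for example several `<condition>` elements) are collected into lists, and surrounding whitespace is stripped from the text. `flatten_leaves` is a hypothetical name, and the XML is a shortened fragment of the file from the question.

```python
from collections import defaultdict

try:
    from lxml import etree
except ImportError:  # the standard-library parser exposes the same calls used here
    import xml.etree.ElementTree as etree


def flatten_leaves(element, prefix=None, result=None):
    """Map 'parent_child' style keys to a list of leaf texts."""
    if result is None:
        result = defaultdict(list)
    for child in element:
        # Skip comments and processing instructions; their .tag is not a string.
        if not isinstance(child.tag, str):
            continue
        key = child.tag if prefix is None else prefix + '_' + child.tag
        if len(child):
            # The element has children: recurse with the prefixed key.
            flatten_leaves(child, key, result)
        else:
            # Leaf element: record its text (empty string if there is none).
            result[key].append((child.text or '').strip())
    return result


SAMPLE = b"""<clinical_study>
  <!-- comment nodes are skipped -->
  <id_info>
    <nct_id>NCT00753818</nct_id>
  </id_info>
  <sponsors>
    <lead_sponsor>
      <agency>Mead Johnson Nutrition</agency>
    </lead_sponsor>
  </sponsors>
  <condition>Cognitive Development</condition>
  <condition>Growth</condition>
</clinical_study>"""

leaves = flatten_leaves(etree.fromstring(SAMPLE))
print(dict(leaves))
```

Keeping lists avoids ambiguity when a leaf's text itself contains a comma. Joining each list with `','.join(...)` afterwards gives comma-separated strings if those are preferred.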

(Editor: 李大同)
