PHP error when parsing an XML file

Published: 2020-12-13 22:46:12 | Category: PHP tutorials | Source: collected from the web


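I am trying to use SimpleXML to fetch data from http://rates.fxcm.com/RatesXML
With simplexml_load_file() I sometimes get errors, because this site always has odd strings/numbers before and after the XML document.
Example: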

2000<?xml version="1.0" encoding="UTF-8"?>
<Rates>
    <Rate Symbol="EURUSD">
    <Bid>1.27595</Bid>
    <Ask>1.2762</Ask>
    <High>1.27748</High>
    <Low>1.27385</Low>
    <Direction>-1</Direction>
    <Last>23:29:11</Last>
</Rate>
</Rates>
0

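I then decided to fetch the data with file_get_contents(), strip the leading and trailing strings with substr(), and parse the result as a string with simplexml_load_string(). However, the random strings sometimes also appear between nodes, like this: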

<Rate Symbol="EURTRY">
    <Bid>2.29443</Bid>
    <Ask>2.29562</Ask>
    <High>2.29841</High>
    <Low>2.28999</Low>

137b

 <Direction>1</Direction>
    <Last>23:29:11</Last>
</Rate>
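
The substr() trimming described above could look roughly like the following sketch. The helper name trim_outer_junk() is hypothetical and the sample strings are shortened versions of the excerpts above. It illustrates why trimming alone is not enough: stray text outside the root element is removed, but stray text between nodes stays in the string.

```php
<?php
// Hypothetical helper: keep only the text from the first "<" to the last ">".
function trim_outer_junk(string $raw): string
{
    $start = strpos($raw, '<');
    $end   = strrpos($raw, '>');
    if ($start === false || $end === false || $end < $start) {
        return ''; // nothing that looks like markup
    }
    return substr($raw, $start, $end - $start + 1);
}

libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

// Case 1: stray text only before and after the document.
$outer = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
       . "<Rates><Rate Symbol=\"EURUSD\"><Bid>1.27595</Bid></Rate></Rates>\n0\n";
$rates = simplexml_load_string(trim_outer_junk($outer));
echo (string) $rates->Rate->Bid, "\n"; // prints 1.27595

// Case 2: stray text between two nodes survives the trimming.
$inner = "<Rate Symbol=\"EURTRY\"><Low>2.28999</Low>\n\n137b\n\n"
       . "<Direction>1</Direction></Rate>";
var_dump(strpos(trim_outer_junk($inner), '137b') !== false); // bool(true)
```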

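My question: is there any way to handle all of these random strings with a regular expression function, no matter where they appear? (I think that would be a better idea than contacting the site and asking them to serve proper XML.)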

Solution

I believe preprocessing XML with regular expressions might be just as bad as parsing it with them.

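But here is a preg_replace() call that removes all non-whitespace characters from the beginning of the string, from the end of the string, and after closing or self-closing tags: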

$string = preg_replace(
    '~
    (?|           # start of alternation where capturing group count starts from
                  # 1 for each alternative
      ^[^<]*      # match non-< characters at the beginning of the string
    |             # OR
      [^>]*$      # match non-> characters at the end of the string
    |             # OR
      (           # start of capturing group $1: closing tag
        </[^>]++> # match a closing tag; note the possessive quantifier (++); it
                  # suppresses backtracking, which is a convenient optimization;
                  # the following bit is mutually exclusive anyway (this will be
                  # used throughout the regex)
        \s++      # and the following whitespace
      )           # end of $1
      [^<\s]*+    # match non-<, non-whitespace characters (the "bad" ones)
      (?:         # start subgroup to repeat for more whitespace/non-whitespace
                  # sequences
        \s++      # match whitespace
        [^<\s]++  # match at least one "bad" character
      )*          # repeat
                  # note that this kind of pattern keeps all whitespace
                  # before the first and after the last "bad" character
    |             # OR
      (           # start of capturing group $1: self-closing tag
        <[^>/]+/> # match a self-closing tag
        \s++      # and the following whitespace
      )
      [^<\s]*+(?:\s++[^<\s]++)*
                  # same as before
    )             # end of alternation
    ~x',
    '$1',
    $input
);

The replacement then simply writes back the closing or self-closing tag (if there was one).
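
One way to check the replacement is to run it on input shaped like the question's sample and then hand the result to simplexml_load_string(). The sketch below uses a compact, comment-free copy of the pattern with the same four alternatives; the input string is a shortened version of the question's excerpts, and the variable names are only illustrative.

```php
<?php
// Compact copy of the pattern: leading junk | trailing junk |
// closing tag + junk | self-closing tag + junk.
$pattern = '~(?|^[^<]*|[^>]*$'
         . '|(</[^>]++>\s++)[^<\s]*+(?:\s++[^<\s]++)*'
         . '|(<[^>/]+/>\s++)[^<\s]*+(?:\s++[^<\s]++)*)~';

// Sample input with stray text before, inside and after the document.
$input = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
       . "<Rates>\n<Rate Symbol=\"EURTRY\">\n<Low>2.28999</Low>\n\n137b\n\n"
       . "<Direction>1</Direction>\n</Rate>\n</Rates>\n0\n";

// $1 is the (self-)closing tag plus its whitespace, or empty for the
// first two alternatives, so only the stray text is dropped.
$clean = preg_replace($pattern, '$1', $input);

libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
$rates = simplexml_load_string($clean);

if ($rates === false) {
    echo "still not parseable\n";
} else {
    echo $rates->Rate['Symbol'], ' low: ', $rates->Rate->Low, "\n";
}
```

Note that the pattern only looks at text that follows a closing or self-closing tag, so stray text in other positions (for example in the middle of a value) would not be removed by it.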

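One reason this approach is not safe is that closing or self-closing tags may occur inside comments or attribute strings. But I can hardly suggest that you use an XML parser instead, since your XML parser could not parse this XML either.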
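
The comment case can be reproduced with a small made-up document. The sketch below again uses a compact copy of the pattern, and the input is hypothetical. The closing tag inside the comment is treated like a real one, so the text after it, including the comment terminator, is removed and a document that was well-formed stops being parseable.

```php
<?php
// Compact copy of the pattern (same four alternatives as the annotated version).
$pattern = '~(?|^[^<]*|[^>]*$'
         . '|(</[^>]++>\s++)[^<\s]*+(?:\s++[^<\s]++)*'
         . '|(<[^>/]+/>\s++)[^<\s]*+(?:\s++[^<\s]++)*)~';

// Hypothetical, well-formed input: a closing tag appears inside a comment.
$original = '<r><!-- </old> note --><v>1</v></r>';
$cleaned  = preg_replace($pattern, '$1', $original);

libxml_use_internal_errors(true); // collect parse errors instead of printing warnings

var_dump(simplexml_load_string($original) !== false); // bool(true): parses as is
echo $cleaned, "\n";                                  // the "note -->" part is gone
var_dump(simplexml_load_string($cleaned) !== false);  // bool(false): comment never ends
```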

(Editor: 李大同)
