How do I get HTML attributes from nested tags using Mechanize in Python?

Published: 2020-12-20 13:25:36 | Category: Python | Source: compiled from the web

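Hi all. I'm having trouble getting the links out of nested HTML while using Mechanize in Python. Here is my current code (I've tried everything; this is just the latest copy, and it doesn't work) (and forgive my variable names (stuff, thing)):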

soup = BeautifulSoup(resultsPage)

if not soup.find(attrs={'class' : 'paging'}):
    print "Only one producted listed!"
else:   
    stuff = soup.find('div',attrs={'class' : 'paging'}).ul.li
    for thing in stuff:
        print thing

Here is the HTML I'm looking at:

<div class="paging">
<ul>
    <li><
    </li>
    <li class='on'>
        1-10
    </li>
    <li  class=''>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl01_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=2">11-20</a>
    </li>
    <li  class=''>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl02_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=3">21-30</a>
    </li>
    <li  class=''>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl03_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=4">31-40</a>
    </li>
    <li  class=''>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl04_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=5">41-50</a>
    </li>
    <li  class=''>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_rptPageNavigators_ctl05_hlPage" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=6">51-60</a>
    </li>
    <li>
        <a id="ctl00_SPWebPartManager1_g_83a79912_01d8_4726_8a95_2953baaad0ec_ctl01_ucProductInfoPageNavigatorGroupTop_lnkNext" href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=7">>></a>
    </li>
</ul>
</div>

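I need to determine whether any of the <li> tags contain a hyperlink; if they do, I need to store the links so I can click them later. In case you're curious, this is the page the code comes from: http://www.kraftrecipes.com/Products/ProductInfoSearchResults.aspx?CatalogType=1&BrandId=22&SearchText=Jell-O&PageNo=1

I'm trying to scrape a food website for product information, and I need to be able to page through the search results.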

I have another quick question. Is it bad practice to chain tags and searches together like this?

ingredients = soup.find(attrs={'class' : "TitleAndDescription"}).div.find(text=re.compile("Ingredients")).next

I'm just learning Python, but this looks pretty kludge-y to me, and I'd like to know what you all think. Here is a sample of the HTML I'm scraping:

<table>
    <tr>
        <td>
            <div id="contHeader" class="TitleAndDescription">
                <h1>JELL-O - GELATIN DESSERT - RASPBERRY</h1>
                <div class="textArea">
                    <strong>Ingredients:</strong> SUGAR,GELATIN,ADIPIC ACID (FOR TARTNESS),CONTAINS LESS THAN 2% OF ARTIFICIAL FLAVOR,DISODIUM PHOSPHATE AND SODIUM CITRATE (CONTROL ACIDITY),FUMARIC ACID (FOR TARTNESS),RED 40.<br/>
                    <strong>Size:</strong> 6 OZ<br/><strong>Upc:</strong> 4300020052<br/>
                    <br/>
                    <!--<br/>-->
                    <br/>
                </div>
            </div>
            ...
        </td>
        ...
    </tr>
    ...
</table>

Sorry for the wall of text. Let me know if you need more information.

Thanks.

Solution

Python's "HTMLParser" module may be one way to solve this (in Python 3 the same parser lives in html.parser). See http://docs.python.org/library/htmlparser.html for more details.
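As a rough illustration of that suggestion, here is a minimal sketch that collects the href of every <a> nested in an <li> inside the <div class="paging"> block. It assumes Python 3, where the module is imported as html.parser. The names PagingLinkParser and paging_links are made up for this example, and the sample markup is a shortened version of the HTML in the question, so treat it as one possible approach rather than the definitive one.

```python
from html.parser import HTMLParser


class PagingLinkParser(HTMLParser):
    """Hypothetical helper: gather hrefs of <a> tags found in <li> tags
    inside a <div> whose class list contains "paging"."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # how many <div> levels deep we are inside the paging div
        self.li_depth = 0   # how many <li> levels deep we are inside that div
        self.links = []     # collected href values, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter the paging div, or track a nested div once already inside it.
            if self.div_depth > 0 or "paging" in classes:
                self.div_depth += 1
        elif self.div_depth > 0 and tag == "li":
            self.li_depth += 1
        elif self.div_depth > 0 and self.li_depth > 0 and tag == "a":
            href = attrs.get("href")
            if href:
                # html.parser has already unescaped entities such as &amp; here.
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
            if self.div_depth == 0:
                self.li_depth = 0  # left the paging div; reset <li> tracking
        elif tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


def paging_links(html):
    """Return the list of paging hrefs found in html (empty if there are none)."""
    parser = PagingLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Shortened sample based on the markup in the question (not the full page).
sample = """
<div class="paging">
<ul>
    <li class='on'>1-10</li>
    <li  class=''>
        <a href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=2">11-20</a>
    </li>
    <li>
        <a href="http://www.kraftrecipes.com/products/pages/productinfosearchresults.aspx?catalogtype=1&amp;brandid=22&amp;searchtext=jell-o&amp;pageno=7">&gt;&gt;</a>
    </li>
</ul>
</div>
<a href="http://example.com/outside">not a paging link</a>
"""

links = paging_links(sample)
if links:
    for link in links:
        print(link)
else:
    print("Only one page of products listed!")
```

An empty list plays the role of the "only one page" check in the question's code, and the stored URLs could later be opened one at a time with the Mechanize browser object.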

(Editor: 李大同)
