Beautiful Soup, a Handy Tool for Python Web Scraping: A Worked Example
Published: 2020-12-17 16:59:09 | Category: Python | Source: compiled from the web
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import re

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# html_doc is already a str, so from_encoding is not needed
# (Beautiful Soup ignores it for Unicode input and emits a warning).
soup = BeautifulSoup(html_doc, 'html.parser')

print("Get all links")
links = soup.find_all('a')
for link in links:
    # link.name        the tag name of the node
    # link['href']     the node's href attribute
    # link.get_text()  the node's text
    print(link.name, link['href'], link.get_text())

print("Get only the link to lacie")
link_node = soup.find('a', href='http://example.com/lacie')
print(link_node.name, link_node['href'], link_node.get_text())

print("Regex match for the link containing tillie")
link_node1 = soup.find('a', href=re.compile(r'tillie'))
print(link_node1.name, link_node1['href'], link_node1.get_text())

print("Get the text of the p paragraph")
p_node = soup.find('p', class_="title")
print(p_node.name, p_node.get_text())

Output:

Get all links
a http://example.com/elsie Elsie
a http://example.com/lacie Lacie
a http://example.com/tillie Tillie
Get only the link to lacie
a http://example.com/lacie Lacie
Regex match for the link containing tillie
a http://example.com/tillie Tillie
Get the text of the p paragraph
p The Dormouse's story

(Editor: 李大同)
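The lookups above index link['href'] directly, which raises KeyError if an <a> tag has no href attribute. Real pages often contain such anchors, so a more defensive variant may be useful. The following is only a minimal sketch: the function name extract_links, its pattern parameter, and the SAMPLE_HTML snippet are hypothetical names introduced for illustration and are not part of the original example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the third anchor deliberately has no href.
SAMPLE_HTML = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>
"""


def extract_links(html, pattern=None):
    """Return (href, text) pairs for every <a> tag that has an href.

    If pattern is given, keep only links whose href matches it
    (re.search semantics, so a partial match is enough).
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for a in soup.find_all("a"):
        href = a.get("href")  # .get() returns None instead of raising KeyError
        if href is None:
            continue
        if pattern is not None and not re.search(pattern, href):
            continue
        results.append((href, a.get_text()))
    return results


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Calling extract_links(SAMPLE_HTML, r"lacie") narrows the result to the single matching link, which mirrors the re.compile lookup in the example while skipping anchors that carry no href.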