
Navigating a website, scraping, and posting in Python

Published: 2020-12-16 21:28:28 | Category: Python | Source: compiled from the web
There are already many good resources on Stack Overflow, but I am still having trouble. I have consulted these sources:

> how to submit query to .aspx page in python
> Submitting a post request to an aspx page
> Scrapping aspx webpage with Python using BeautifulSoup
> http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet

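I am trying to access http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx and select a parish. I believe that selecting one forces a POST, after which I can select a year; that posts again and opens up further choices. I have written my script several different ways following the sources above, but I have not managed to submit the page in a way that lets me enter a year.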

My current code:

import urllib
from bs4 import BeautifulSoup
import mechanize

headers = [
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Origin', 'http://www.indiapost.gov.in'),
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML,like Gecko)  Chrome/24.0.1312.57 Safari/537.17'),
    ('Content-Type', 'application/x-www-form-urlencoded'),
    ('Referer', 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'),
    ('Accept-Encoding', 'gzip,deflate,sdch'),
    ('Accept-Language', 'en-US,en;q=0.8'),
]

br = mechanize.Browser()
br.addheaders = headers

url = 'http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx'

response = br.open(url)
# first HTTP request without form data; read the body once and reuse it
html = response.read()
soup = BeautifulSoup(html, 'html.parser')
# parse and retrieve two vital form values
viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})

# collected here, but never actually submitted anywhere below
formData = (
    ('__EVENTVALIDATION', eventvalidation[0]['value']),
    ('__VIEWSTATE', viewstate[0]['value']),
    ('__VIEWSTATEENCRYPTED', ''),
)

# save the fetched page for inspection
try:
    with open(r'C:\GIS\tmp.htm', 'wb') as fout:
        fout.write(html)
except IOError:
    print('Could not open output file\n')
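
For reference, the script above collects `__VIEWSTATE` and `__EVENTVALIDATION` but never sends them back. On ASP.NET Web Forms pages, a dropdown that posts back on change normally does so through `__doPostBack`, which submits the form with `__EVENTTARGET` set to the name of the control that fired. Below is a minimal sketch (Python 3, standard library only) of assembling such a request body. The sample HTML is a hypothetical stand-in for the real page, `build_postback` is an illustrative helper rather than part of any library, and the dropdown name is taken from the solution further down. Whether this particular site accepts such a request, and whether the option value really is the visible text `TERREBONNE PARISH`, are assumptions that have not been verified.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, target, extra_fields):
    """Return a urlencoded body imitating a __doPostBack fired by `target`."""
    parser = HiddenInputParser()
    parser.feed(html)
    payload = dict(parser.fields)          # __VIEWSTATE, __EVENTVALIDATION, ...
    payload["__EVENTTARGET"] = target      # control that triggered the postback
    payload.setdefault("__EVENTARGUMENT", "")
    payload.update(extra_fields)           # visible form values, e.g. the dropdown
    return urlencode(payload)


# Hypothetical page fragment standing in for the real response
SAMPLE_HTML = """
<form method="post" action="TaxRolls.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ctl00$ContentPlaceHolderMain$ddParish"></select>
</form>
"""

body = build_postback(
    SAMPLE_HTML,
    target="ctl00$ContentPlaceHolderMain$ddParish",
    # Assumed option value; the real page may use a code instead of the label
    extra_fields={"ctl00$ContentPlaceHolderMain$ddParish": "TERREBONNE PARISH"},
)
print(body)
```

The resulting string would be sent as the POST data to the same URL, and the hidden fields would have to be re-read from each response before the next postback, since they change from page to page.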

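I have also tried this in the shell; what I entered, together with what I received (trimmed to reduce the bulk), can be found at http://pastebin.com/KAW5VtXp

In any case, when I try to change the value in the Parish dropdown and post, I am taken to a webmaster login page.

Am I approaching this the right way? Any thoughts would be extremely helpful.

Thanks!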

Solution

I ended up using Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# find_element(By.X, ...) replaces the find_element_by_* helpers,
# which were removed in Selenium 4
driver = webdriver.Firefox()
driver.get("http://www.latax.state.la.us/Menu_ParishTaxRolls/TaxRolls.aspx")

# Choose the parish
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$ddParish")
elem.send_keys("TERREBONNE PARISH")
elem.send_keys(Keys.RETURN)

# Choose the year
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$ddYear")
elem.send_keys("2013")
elem.send_keys(Keys.RETURN)

# Select the search-field radio button
elem = driver.find_element(By.ID, "ctl00_ContentPlaceHolderMain_rbSearchField_1")
elem.click()

# Enter the APN and run the search
APN = 'APN # here'
elem = driver.find_element(By.NAME, "ctl00$ContentPlaceHolderMain$txtSearch")
elem.send_keys(APN)
elem.send_keys(Keys.RETURN)

# Access the PDF
elem = driver.find_element(By.LINK_TEXT, 'Generate Report')
elem.click()
elements = driver.find_elements(By.TAG_NAME, 'a')
elements[1].click()

