
Crawl AJAX dynamic web page using Python 2.x and 3.x

Published: 2020-12-16 01:33:36 | Category: Encyclopedia | Source: compiled from the web

AJAX is short for Asynchronous JavaScript and XML. It uses the JavaScript XMLHttpRequest object to exchange data between the client's browser and the server without refreshing the page.

To crawl content created by AJAX, it is sometimes easy to identify the URL requested by the AJAX call directly. Take IE 11 as an example. First, press F12 to open the developer tools. Select the "Network" tab, click the button that triggers the XMLHttpRequest, then look at the URL column to find the URLs requested by the AJAX call.
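Once the URL shows up in the Network tab, it can usually be requested directly from Python. The following is a minimal Python 3 sketch, assuming the endpoint answers a plain GET request with JSON. The example.com URL and the `page` parameter are placeholders, not values from a real site, and `build_get_url` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_get_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return base_url + '?' + urllib.parse.urlencode(params)


def fetch_json(url, timeout=10):
    """Send a GET request to url and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Placeholder endpoint; replace it with the URL copied from the Network tab.
ajax_url = build_get_url('http://example.com/api/article_list', {'page': 2})
# result = fetch_json(ajax_url)  # uncomment to send the request
```

Keeping the URL construction separate from the network call makes it easy to print and compare the URL against the one shown in the developer tools before sending anything.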



However, sometimes we cannot identify the URL requested by the XMLHttpRequest directly. In this case, we have to build the request manually.

1. Identify the request that uses the POST method.



2. Double-click that request and copy the value of the "User-Agent" header.



3. Select the "Request body" tab and copy the values.



4. The Python code:

Python 2.x

import urllib2
import urllib
import json

url = 'http://www.huxiu.com/v2_action/article_list'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)'
# Form fields copied from the "Request body" tab
data = {'huxiu_hash_code': '63b69ec3342ee8c7e6ec4cab561482c9', 'page': 2, 'last_dateline': 1466664240}
data = urllib.urlencode(data)

# Supplying data makes urllib2 send a POST; pass the copied User-Agent as a header
request = urllib2.Request(url=url, data=data, headers={'User-Agent': user_agent})
response = urllib2.urlopen(request)

# Parse the JSON response
result = json.loads(response.read())
print result


Python 3.x

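import urllib.parse
import urllib.request
import json

url = 'http://www.huxiu.com/v2_action/article_list'
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)'
# Form fields copied from the "Request body" tab
data = {'huxiu_hash_code': '63b69ec3342ee8c7e6ec4cab561482c9', 'last_dateline': 1466664240}
# The POST body must be bytes in Python 3
data = urllib.parse.urlencode(data).encode('utf-8')

# Supplying data makes urllib send a POST; pass the copied User-Agent as a header
request = urllib.request.Request(url, data=data, headers={'User-Agent': user_agent})
response = urllib.request.urlopen(request)

# Parse the JSON response
result = json.loads(response.read().decode('utf-8'))
print(response.status)
print(result)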
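The three values collected in steps 1 to 3 (POST method, User-Agent, request body) can be checked before anything is sent. The sketch below is a Python 3 illustration only: `build_post_request` and `send` are hypothetical helper names, and the error handling assumes that a failed request should simply return None.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_post_request(url, form, user_agent):
    """Build a POST request from the form fields and the User-Agent string."""
    body = urllib.parse.urlencode(form).encode('utf-8')
    return urllib.request.Request(url, data=body,
                                  headers={'User-Agent': user_agent})


def send(request, timeout=10):
    """Send the request and decode the JSON body, or return None on failure."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode('utf-8'))
    except (urllib.error.URLError, ValueError) as exc:
        # URLError covers network and HTTP errors; ValueError covers bad JSON.
        print('request failed:', exc)
        return None


request = build_post_request(
    'http://www.huxiu.com/v2_action/article_list',
    {'huxiu_hash_code': '63b69ec3342ee8c7e6ec4cab561482c9',
     'last_dateline': 1466664240},
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0)')

# Compare these with what the developer tools show before sending.
print(request.get_method())              # method urllib will use
print(request.get_header('User-agent'))  # urllib stores header names capitalized
print(request.data)                      # encoded request body
# result = send(request)  # uncomment to send the request
```

If the printed method, header, and body match what the browser sent, any remaining difference in the response is more likely to come from cookies or other headers the browser added.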

(Editor: 李大同)
