Python 3 Web Crawling: Downloading Images from Web Pages

Reposted from sinat_34022298, 2017/07/14. Tags: python3 / crawler / python / data / images / web pages

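As data volumes keep growing, the ability to collect data matters more and more, and a Python crawler is a concise, efficient tool for the job.
Python 2 split URL fetching across the urllib and urllib2 modules. Python 3 merged and reorganized them into a single urllib package, so only urllib is needed, normally imported through one of its submodules such as urllib.request.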

The main library used here for crawling is urllib. According to the documentation, the urllib package contains four modules:

urllib.request : for opening and reading URLs
urllib.error : containing the exceptions raised by urllib.request
urllib.parse : for parsing URLs
urllib.robotparser : for parsing robots.txt files
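
As a small illustration of one of these modules, the sketch below uses urllib.parse to split a URL into its components and to resolve a relative link. It runs without network access. The URL is the sample image address used later in this article; the relative path '/g/200/300' is only a made-up example.

```python
from urllib.parse import urlparse, urljoin

# The sample image URL used later in this article.
url = 'http://placekitten.com/g/500/600'

# Split the URL into its components.
parts = urlparse(url)
print(parts.scheme)  # http
print(parts.netloc)  # placekitten.com
print(parts.path)    # /g/500/600

# Resolve a relative link (a hypothetical path) against the base URL.
absolute = urljoin(url, '/g/200/300')
print(absolute)      # http://placekitten.com/g/200/300
```

A crawler typically needs urljoin when a page refers to its images by relative paths.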

(Link: Python 3 urllib documentation)

Downloading an image

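The first step is to read the image data from the web. According to the documentation, urlopen, the function that opens a URL, accepts two kinds of argument: a URL given as a string, or a Request object. There are therefore two ways to open a page.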

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Open the URL url, which can be either a string or a Request object.

Below is a simple program that downloads a single image whose URL is already known:

Method 1: pass urlopen a string, namely the image URL

import urllib.request

url = 'http://placekitten.com/g/500/600'
response = urllib.request.urlopen(url)
img = response.read()

# Write the image as binary data to a file called 'name' in the current working directory
with open('name','wb') as f:
    f.write(img)
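
Method 1 assumes the request always succeeds. The sketch below is one possible way to wrap the same steps in a function with a timeout and basic error handling through urllib.error. The function name download_image, its parameters, and the 10-second default are illustrative choices, not part of the original program.

```python
import urllib.error
import urllib.request


def download_image(url, path, timeout=10):
    """Fetch url and write the body to path.

    Returns the number of bytes written, or None if the request failed.
    """
    try:
        # The response object is a context manager and is closed on exit.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP error statuses
        # such as 404 are handled here as well.
        print('download failed:', exc)
        return None
    with open(path, 'wb') as f:
        f.write(data)
    return len(data)


# Example call (needs network access; the file name is arbitrary):
# download_image('http://placekitten.com/g/500/600', 'cat.jpg')
```

Returning None on failure keeps the caller simple; a larger crawler might prefer to let the exception propagate and retry.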

Method 2: pass urlopen a Request object

req = urllib.request.Request(url)
response = urllib.request.urlopen(req)
img = response.read()

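This first creates a Request object, req, and then passes it to urlopen().
The advantage of building a Request object is that data and headers can be attached to it, for example to send a POST request or to present the client as a browser.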
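
A minimal sketch of both uses follows. The User-Agent string, the form field 'q', and the Referer value are arbitrary examples chosen for illustration, and nothing is sent over the network until the request is passed to urlopen.

```python
import urllib.parse
import urllib.request

url = 'http://placekitten.com/g/500/600'

# A browser-like User-Agent; the exact string is an arbitrary example.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)'}
req = urllib.request.Request(url, headers=headers)
print(req.get_method())       # GET, because no data is attached

# Attaching data turns the request into a POST.
# The form field below is hypothetical.
form = urllib.parse.urlencode({'q': 'kitten'}).encode('ascii')
post_req = urllib.request.Request(url, data=form, headers=headers)
print(post_req.get_method())  # POST

# Headers can also be added after the object is created.
post_req.add_header('Referer', 'http://placekitten.com/')

# The request is only sent when it is opened:
# response = urllib.request.urlopen(req)
```

Note that data must be bytes, which is why the urlencoded string is encoded before being passed to Request.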
