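Mastering Python HTTP Requests: Advanced Guide 2025

In this comprehensive guide, you will learn:

What Requests is, how to install it, and why it is the most popular Python HTTP client library.
How to use it with the different HTTP methods.
What it offers for handling server responses.
Which request customizations it supports.
The advanced scenarios covered by the Python requests library.

Let's get started!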

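Introduction to the Requests Library

Learn what Requests is, how to install it, when to use it, and what it offers.

Definition

Requests is an elegant and simple HTTP library for Python. It provides an intuitive API for making HTTP requests and handling responses in a concise, human-friendly way. With over 50k GitHub stars and millions of downloads per day, Requests is the most popular HTTP client in Python.

Key features of the library include a comprehensive API covering all HTTP methods, response handling, request customization, authentication, SSL certificate management, and more. In addition, the Python Requests module supports HTTP/1.1 out of the box.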

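Installation

The simplest and recommended way to install Requests is through pip. The pip package for the library is named requests, so you can install the HTTP client with the following command:

pip install requests

To use requests in your Python script, import it with the line below:

import requests

Great! The Requests package is now installed and ready to use.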

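Use Cases

The main use cases of the Python requests library include:

Sending HTTP requests to web servers: retrieve data from a web server by sending GET requests.
Consuming APIs: send requests to API endpoints and process their responses to interact with web services and access their data.
Web scraping: fetch the HTML document associated with a web page, then parse it with a library such as BeautifulSoup to extract specific information. To learn more, see our Python web scraping guide.
Testing web applications: simulate HTTP requests and verify the responses, automating the testing process and making sure web services work as expected.
Downloading files: retrieve files such as images, documents, or other media from a web server by sending an HTTP GET request to the corresponding URL.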

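Methods

Take a look at the methods exposed by the requests library:

Method	Description
`requests.request()`	Sends a custom HTTP request with the specified method to the given URL
`requests.get()`	Sends a `GET` request to the specified URL
`requests.post()`	Sends a `POST` request to the specified URL
`requests.put()`	Sends a `PUT` request to the specified URL
`requests.patch()`	Sends a `PATCH` request to the specified URL
`requests.delete()`	Sends a `DELETE` request to the specified URL
`requests.head()`	Sends a `HEAD` request to the specified URL

As you can see, these cover the most commonly used HTTP methods. For more details on how to use them, see the official API documentation.

Now it is time to see them in action!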

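HTTP Methods

See the Python requests library in action with HTTP methods such as GET, POST, PUT, PATCH, DELETE, and HEAD.

GET

In HTTP, the GET method is used to request a specific resource from a server. Here is how to make an HTTP GET request with requests.get():

import requests

# send a GET request to the specified URL
response = requests.get('https://api.example.com/data')

Equivalently, you can achieve the same result with requests.request(), as follows:

import requests

response = requests.request('GET', 'https://api.example.com/data')

In this case, you have to specify the HTTP method explicitly.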

POST

The HTTP POST method is used to submit data to the server for further processing. Here is how to make a POST request with requests.post():

import requests

# data to be sent in the POST request
product = {
    'name': 'Limitor 500',
    'description': 'The Limitor 500 is a high-performance electronic device designed to regulate power consumption in industrial settings. It offers advanced features such as real-time monitoring, adjustable settings, and remote access for efficient energy management.',
    'price': 199.99,
    'manufacturer': 'TechCorp Inc.',
    'category': 'Electronics',
    'availability': 'In Stock'
}

# send a POST request to the specified URL
response = requests.post('https://api.example.com/product', data=product)

Compared with a GET request, this time you also need to specify the data to send to the server through the data option. When data is a dictionary, requests form-encodes it and places it in the body of the HTTP request.

For a JSON body, pass the data object to the json option instead of data:

response = requests.post('https://api.example.com/product', json=product)
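
To see how the data and json options differ, the sketch below builds both requests with Request.prepare(), which assembles a request without sending it. The shortened product dictionary is only for illustration.

```python
import json
import requests

product = {'name': 'Limitor 500', 'price': 199.99}
url = 'https://api.example.com/product'

# prepare() builds the request without sending it, so no network access is needed
form_request = requests.Request('POST', url, data=product).prepare()
json_request = requests.Request('POST', url, json=product).prepare()

print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # name=Limitor+500&price=199.99

print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))         # {'name': 'Limitor 500', 'price': 199.99}
```

With data, the dictionary becomes a form-encoded string. With json, it is serialized to JSON and the Content-Type header is set accordingly.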

Equivalently, you can perform the same request with requests.request(), as follows:

import requests

product = {
    'name': 'Limitor 500',
    'description': 'The Limitor 500 is a high-performance electronic device designed to regulate power consumption in industrial settings. It offers advanced features such as real-time monitoring, adjustable settings, and remote access for efficient energy management.',
    'price': 199.99,
    'manufacturer': 'TechCorp Inc.',
    'category': 'Electronics',
    'availability': 'In Stock'
}

response = requests.request('POST', 'https://api.example.com/product', data=product)

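PUT

The PUT method is used to update or replace a resource on the server. Sending a PUT request with the Python requests module is straightforward and follows the same pattern as a POST request. The difference is that the method to use is requests.put(). Likewise, in requests.request(), the HTTP method string is 'PUT'.

PATCH

The PATCH method is used to apply partial modifications to an online resource. As with PUT, sending a PATCH request in the Python requests library works like a POST request. The difference is that the method to use is requests.patch(), and the HTTP method string in requests.request() is 'PATCH'.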
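
As a minimal sketch of the two methods side by side, the snippet below prepares a PUT and a PATCH request without sending them. The URL and the product fields are hypothetical, and using the json option here is an assumption about what the target API expects.

```python
import json
import requests

url = 'https://api.example.com/products/75'  # hypothetical resource URL

# PUT: send the full representation that should replace the resource
full_product = {'name': 'Limitor 500', 'price': 179.99, 'availability': 'In Stock'}
put_request = requests.Request('PUT', url, json=full_product).prepare()

# PATCH: send only the fields that change
price_change = {'price': 179.99}
patch_request = requests.Request('PATCH', url, json=price_change).prepare()

for prepared in (put_request, patch_request):
    print(prepared.method, prepared.url, json.loads(prepared.body))

# To actually send them, call the matching helper instead:
#   requests.put(url, json=full_product)
#   requests.patch(url, json=price_change)
```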

DELETE

The DELETE method is used to delete the resource identified by a given URI. Here is how to make an HTTP DELETE request with the delete() method in requests:

import requests

# send a DELETE request for the product with id = 75
response = requests.delete('https://api.example.com/products/75')

Equivalently, you can perform a DELETE request with requests.request():

import requests

response = requests.request('DELETE', 'https://api.example.com/products/75')

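HEAD

The HEAD method is similar to GET, but it requests only the response headers, without the actual body content. The server's response to a HEAD request is therefore the same as its response to a GET request, minus the body.

Use requests.head() to make an HTTP HEAD request in Python:

import requests

# send a HEAD request to the specified URL
response = requests.head('https://api.example.com/resource')

Equivalently, you can perform a HEAD request with requests.request():

import requests

response = requests.request('HEAD', 'https://api.example.com/resource')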

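Parsing the Requests Response Object

Now that you know how to make HTTP requests with requests, it is time to see how to handle the response object.

The Response Object

After making an HTTP request, requests receives the server's response and maps it to a special Response object.

Take a look at the Python requests example below:

import requests

response = requests.get('http://lumtest.com/myip.json')
print(response)

This prints:

<Response [200]>

response is a Response object that exposes several useful methods and attributes. Explore the most important ones next!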

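Warning: requests does not always return a response. When an error occurs (for example, an invalid URL), it raises an exception derived from RequestException. Guard against it with the logic below:

import requests

try:
    response = requests.get('http://lumtest.com/myip.json')
    # handle the response
except requests.exceptions.RequestException as e:
    print('An error occurred during the request:', e)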
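
The invalid-URL case can be reproduced without any network access. In the sketch below, safe_get() is a hypothetical helper name. The URL deliberately lacks its scheme, so requests raises MissingSchema, one of the RequestException subclasses, while preparing the request.

```python
import requests

def safe_get(url):
    """Return the Response, or None when requests raises an exception."""
    try:
        return requests.get(url)
    except requests.exceptions.RequestException as e:
        print(f'{type(e).__name__}: {e}')
        return None

# The scheme ("http://") is missing, so requests rejects the URL
# with MissingSchema before any network traffic happens.
result = safe_get('lumtest.com/myip.json')
print(result)  # None
```

Catching the RequestException base class also covers connection errors and timeouts, since their exception classes derive from it.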

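Status Codes

In HTTP, response status codes are standardized values returned by the server to indicate the success, failure, or other outcome of a request. They matter because they give immediate feedback on whether a request succeeded and, if it did not, on what went wrong.

They are particularly useful in error handling, as they let the client identify and handle different types of errors appropriately. For example, 4xx status codes indicate client errors (such as an invalid request), while 5xx status codes indicate server errors.

When dealing with responses in the requests library, checking the status code is usually the first step. After making a request, you should always check the response's status code to determine whether the request succeeded. Access it through the status_code attribute of the response object:

response.status_code # 200

Depending on the status code received, use conditional statements to handle the different scenarios: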

import requests

response = requests.get('http://lumtest.com/myip.json')

# check if the request was successful (status code 200)
if response.status_code == 200:
    print('Successful request!')
    # handle the response...
elif response.status_code == 404:
    print('Resource not found!')
else:
    print(f'Request failed with status code: {response.status_code}')

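In most cases, you only need to tell a successful request from an error response. requests simplifies this with a custom __bool__() overload. Specifically, you can use a Response object directly in a conditional expression. It evaluates to True when the status code is below 400, and to False for 4xx and 5xx error codes.

In other words, you can check whether a request succeeded with the following logic:

if response:
    print('Successful request!')
    # handle the response...
else:
    print(f'Request failed with status code: {response.status_code}')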
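
The truthiness rule is easy to check offline. The sketch below builds Response objects by hand purely for illustration (make_response() is a hypothetical helper, and real responses come from calls such as requests.get()). It also shows the ok attribute and raise_for_status(), two related parts of the Response API not covered above.

```python
import requests

def make_response(status_code):
    # Hand-built Response, for illustration only;
    # real ones come from requests.get() and friends.
    response = requests.Response()
    response.status_code = status_code
    return response

for code in (200, 301, 404, 500):
    response = make_response(code)
    print(code, bool(response), response.ok)

# raise_for_status() turns 4xx and 5xx responses into an HTTPError
try:
    make_response(404).raise_for_status()
except requests.exceptions.HTTPError as e:
    print('HTTPError:', e)
```

The loop prints True for 200 and 301, and False for 404 and 500. Calling raise_for_status() is handy when you would rather handle failed requests in the same except block as network errors.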

Response Headers

Access the headers of the server response through the headers attribute:

import requests

response = requests.get('http://lumtest.com/myip.json')

response_headers = response.headers
print(response_headers)

This prints:

{'Server': 'nginx', 'Date': 'Thu, 09 May 2024 12:51:08 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '279', 'Connection': 'keep-alive', 'Cache-Control': 'no-store', 'Access-Control-Allow-Origin': '*'}

As you can see, response.headers returns a dictionary-like object, which means you can access header values by key. For example, to read the Content-Type header of the response:

response_headers['Content-Type'] # 'application/json; charset=utf-8'

Since the HTTP specification defines header names as case-insensitive, requests lets you access them without worrying about their casing:

response_headers['content-type'] # 'application/json; charset=utf-8'
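
The object behind response.headers is a CaseInsensitiveDict from requests.structures. The sketch below builds one by hand, with two header values copied from the output above, to show the lookups that work on it.

```python
from requests.structures import CaseInsensitiveDict

# Same type that response.headers uses, built by hand here
headers = CaseInsensitiveDict({
    'Content-Type': 'application/json; charset=utf-8',
    'Content-Length': '279',
})

print(headers['content-type'])             # application/json; charset=utf-8
print(headers.get('CONTENT-LENGTH'))       # 279
print('x-request-id' in headers)           # False
print(headers.get('X-Request-Id', 'n/a'))  # n/a

# Header values are strings, so convert before doing arithmetic
content_length = int(headers['Content-Length'])
```

Using get() with a default avoids a KeyError for headers the server did not send.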

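Response Content

requests provides different attributes and methods to access the payload of a response:

response.content: returns the response content as bytes.
response.text: returns the response content as a Unicode string.
response.json(): parses the JSON-encoded response content and returns the resulting Python object, a dictionary in the example below.

Take a look at the following example: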

import requests

response = requests.get('http://lumtest.com/myip.json')

# access the response as bytes
response_bytes = response.content
print(type(response_bytes))
print(response_bytes)
print()

# retrieve the response as text
response_text = response.text
print(type(response_text))
print(response_text)
print()

# parse the JSON response into a Python dictionary
response_json = response.json()
print(type(response_json))
print(response_json)
print()

http://lumtest.com/myip.json is a special endpoint that returns information about the caller's IP. The snippet above produces:

<class 'bytes'>
b'{"ip":"45.85.135.110","country":"US","asn":{"asnum":62240,"org_name":"Clouvider Limited"},"geo":{"city":"Ashburn","region":"VA","region_name":"Virginia","postal_code":"20149","latitude":39.0469,"longitude":-77.4903,"tz":"America/New_York","lum_city":"ashburn","lum_region":"va"}}'

<class 'str'>
{"ip":"45.85.135.110","country":"US","asn":{"asnum":62240,"org_name":"Clouvider Limited"},"geo":{"city":"Ashburn","region":"VA","region_name":"Virginia","postal_code":"20149","latitude":39.0469,"longitude":-77.4903,"tz":"America/New_York","lum_city":"ashburn","lum_region":"va"}}

<class 'dict'>
{'ip': '45.85.135.110', 'country': 'US', 'asn': {'asnum': 62240, 'org_name': 'Clouvider Limited'}, 'geo': {'city': 'Ashburn', 'region': 'VA', 'region_name': 'Virginia', 'postal_code': '20149', 'latitude': 39.0469, 'longitude': -77.4903, 'tz': 'America/New_York', 'lum_city': 'ashburn', 'lum_region': 'va'}}

Note the three different response formats. Because the result of response.json() is a dictionary here, it is especially useful, as it simplifies data access:

response_json['country'] # 'US'

For more information, see our guide on how to parse JSON in Python.
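
Not every response body is valid JSON, and not every JSON body is a dictionary. The sketch below shows one defensive way to read a field. The helper names json_or_none() and country_of() are hypothetical, and the 'country' key comes from the lumtest.com output shown earlier.

```python
import requests

def json_or_none(response):
    """Return the parsed JSON body, or None if the body is not valid JSON."""
    try:
        return response.json()
    except ValueError:
        # the JSON decoding errors raised by response.json() derive from ValueError
        return None

def country_of(response):
    """Read the 'country' field, tolerating non-JSON and non-dict bodies."""
    payload = json_or_none(response)
    if isinstance(payload, dict):
        return payload.get('country')
    return None

# Example call (performs a real request):
#   country_of(requests.get('http://lumtest.com/myip.json'))
```

For the response shown above, country_of() would return 'US'. For an HTML error page or a JSON array, it returns None instead of raising.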

Response Cookies

Although HTTP cookies are defined through headers, the Response object provides a special cookies attribute for dealing with them. It returns a RequestsCookieJar object, a subclass of http.cookiejar.CookieJar, containing the cookies the server sent back.

Take a look at the example below, which shows how to access the cookies of a response object in the Python requests library:

import requests

# define the login credentials
credentials = {
    'username': 'example_user',
    'password': 'example_password'
}

# send a POST request to the login endpoint
response = requests.post('https://www.example.com/login', data=credentials)

# access the cookies set by the server
cookies = response.cookies

# print the cookies received from the server
for cookie in cookies:
    print(cookie.name, ':', cookie.value)

The snippet above may produce something like:

session_id : be400765483cf840dfbbd39
user_id : 7164
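
Besides iteration, the cookie jar supports direct lookups and conversion to a plain dictionary. The sketch below fills a RequestsCookieJar by hand to stand in for response.cookies. The domain and path values are assumptions made for illustration.

```python
import requests

# Hand-built jar standing in for response.cookies (illustration only)
jar = requests.cookies.RequestsCookieJar()
jar.set('session_id', 'be400765483cf840dfbbd39', domain='www.example.com', path='/')
jar.set('user_id', '7164', domain='www.example.com', path='/')

print(jar.get('session_id'))      # be400765483cf840dfbbd39
print(jar.get('missing', 'n/a'))  # n/a
print(jar.get_dict())             # {'session_id': 'be400765483cf840dfbbd39', 'user_id': '7164'}

# each entry also carries attributes such as domain, path, and expires
for cookie in jar:
    print(cookie.name, cookie.domain, cookie.path, cookie.expires)
```

Note that the expiry time is an attribute of each cookie (cookie.expires), not a separate entry in the jar.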

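Request Customization with the Python Requests Library

HTTP requests often involve special filtering parameters and custom headers. Let's see how to specify them in requests.

Query String Parameters

Query parameters, also known as URL parameters, are extra parameters appended to the end of the URL of an HTTP request. They give the server additional information about the request, typically about how to filter data and customize the response.

Consider this URL:

https://api.example.com/data?key1=value1&key2=value2

In this example, ?key1=value1&key2=value2 is the query string, while key1 and key2 are query parameters.

A query string starts with ? and consists of key-value pairs, each joined by an equals sign (=) and separated from the next by &. Building this query string programmatically in Python code is not always easy, especially when dealing with optional parameters. That is why requests provides the params option:

import requests

# define query parameters as a dictionary
params = {
    'page': 1,
    'limit': 10,
    'category': 'electronics'
}

# send a GET request to the following URL:
# 'https://api.example.com/products?page=1&limit=10&category=electronics'
response = requests.get('https://api.example.com/products', params=params)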

Equivalently, you can pass the parameters to requests as a list of tuples:

import requests

# define query parameters as a list of tuples
params = [
    ('page', '1'),
    ('limit', '10'),
    ('category', 'electronics')
]

response = requests.get('https://api.example.com/products', params=params)

Or as a bytes string:

import requests

# define query parameters as a bytes string
params = b'page=1&limit=10&category=electronics'

response = requests.get('https://api.example.com/products', params=params)
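
To check what URL the params option produces, and how it treats optional values, you can prepare a request without sending it. In the sketch below, the sort and tag parameters are hypothetical additions to the earlier example.

```python
import requests

params = {
    'page': 1,
    'limit': 10,
    'category': 'electronics',
    'sort': None,            # optional filter left unset: skipped
    'tag': ['new', 'sale'],  # list value: the key is repeated
}

prepared = requests.Request('GET', 'https://api.example.com/products', params=params).prepare()
print(prepared.url)
# https://api.example.com/products?page=1&limit=10&category=electronics&tag=new&tag=sale
```

Keys whose value is None are left out of the query string, which is what makes optional parameters easy to handle with a dictionary.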

Request Headers

To customize the headers of an HTTP request in requests, pass them as a dictionary to the headers option. For example, you can set a custom User-Agent string in requests:

import requests

# define custom headers
custom_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    # other headers...
}

# send a GET request with custom headers
response = requests.get('https://api.example.com/data', headers=custom_headers)
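
To see why overriding User-Agent matters, it helps to look at what requests sends by default. The sketch below uses a Session only to preview the merged headers without sending anything. The 'my-app/1.0' value is a hypothetical User-Agent.

```python
import requests

# what requests sends when no User-Agent is given, e.g. "python-requests/<version>"
print(requests.utils.default_user_agent())

session = requests.Session()
request = requests.Request(
    'GET',
    'https://api.example.com/data',
    headers={'User-Agent': 'my-app/1.0', 'Accept': 'application/json'},  # hypothetical values
)

# prepare_request() merges the custom headers with the session defaults
# without sending anything
prepared = session.prepare_request(request)
for name, value in prepared.headers.items():
    print(f'{name}: {value}')
```

Custom headers replace defaults with the same name, while the remaining defaults, such as Accept-Encoding, are kept.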

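Request Cookies

Although HTTP cookies are sent to the server through headers, requests provides a dedicated cookies option for customizing them. Use it as in the following example:

import requests

# define custom cookies
custom_cookies = {
    'session_id': 'be400765483cf840dfbbd39',
    'user_id': '7164'
}

# send a GET request with custom cookies
response = requests.get('https://www.example.com', cookies=custom_cookies)

Note that cookies accepts a dictionary or an http.cookiejar.CookieJar object.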

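Other Configurations

requests offers a rich API with many advanced techniques available. Explore some of the most relevant ones!

Proxy Setup

Integrating a proxy server in requests lets you route your HTTP requests through it. This is a powerful mechanism for hiding your IP address, getting around rate limiters, or accessing geo-restricted content.

You can integrate a proxy server with the Python requests library through the proxies option: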

import requests

# define the proxy settings (<password> and <proxy_host> are placeholders)
proxy = {
    'http': 'http://username:<password>@<proxy_host>:8080',
    'https': 'https://username:<password>@<proxy_host>:8080'
}

# make a request using the proxy
response = requests.get('https://www.example.com', proxies=proxy)

For a complete tutorial, see our guide to using proxies with Python Requests.
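
Proxy credentials that contain characters such as @ or : must be percent-encoded before they go into the proxy URL. The sketch below shows one way to do that. build_proxies() is a hypothetical helper, the host and credentials are made up, and it assumes the proxy itself is reached over plain HTTP.

```python
from urllib.parse import quote

def build_proxies(host, port, username, password):
    """Build a proxies dict, percent-encoding the credentials."""
    # safe='' also encodes characters such as '@', ':' and '/'
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f'http://{credentials}@{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}

# hypothetical proxy host and credentials
proxies = build_proxies('proxy.example.com', 8080, 'username', 'p@ss:word')
print(proxies['https'])  # http://username:p%40ss%3Aword@proxy.example.com:8080
```

The resulting dictionary can be passed as is to the proxies option, for example requests.get('https://www.example.com', proxies=proxies).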

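Basic Authentication

HTTP authentication, better known as "Basic Authentication", is a simple authentication mechanism built into the HTTP protocol. It involves sending the username and password, Base64-encoded, in the Authorization header.

Although you could implement it by setting the Authorization header manually, requests exposes a dedicated auth option. This option accepts a tuple containing the username and password. Use it to handle basic authentication in the requests library:

import requests

# define the username and password for basic authentication
username = 'sample_username'
password = 'sample_password'

# send a GET request with basic authentication
response = requests.get('https://api.example.com/private/users', auth=(username, password))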
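
To make the mechanism concrete, the sketch below prepares a request with the auth tuple and compares the resulting Authorization header with one built by hand. Nothing is sent over the network.

```python
import base64
import requests

username = 'sample_username'
password = 'sample_password'

# prepare() applies the auth tuple without sending the request
prepared = requests.Request(
    'GET', 'https://api.example.com/private/users', auth=(username, password)
).prepare()

# the same header value, built by hand
token = base64.b64encode(f'{username}:{password}'.encode()).decode()
manual_header = f'Basic {token}'

print(prepared.headers['Authorization'])
print(prepared.headers['Authorization'] == manual_header)  # True
```

Base64 is an encoding, not encryption, so the credentials are only protected in transit when the request goes over HTTPS.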

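SSL Certificate Verification

SSL certificate verification is essential to securing the communication between client and server. At the same time, there are situations where you trust the target server and do not need to enforce verification.

In particular, you may run into SSL certificate errors when routing HTTP traffic through a proxy server. In that case, you may need to disable SSL certificate verification. In requests, you can do so through the verify option:

import requests

# send a GET request to a website with SSL certificate verification disabled
response = requests.get('https://api.example.com/data', verify=False)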

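Timeouts

By default, requests waits indefinitely for the server to respond. This behavior can become a problem if the server is overloaded or the network is slow.

To avoid slowing down your application while waiting for a response that may never arrive, requests has a timeout option. It accepts an integer or a float representing the number of seconds to wait for a response:

import requests

# timeout after 2 seconds
response = requests.get("https://api.example.com/data", timeout=2)

Alternatively, timeout accepts a tuple with two elements: the connect timeout and the read timeout, as shown below:

import requests

# timeout after 2.5 seconds for connections and 4 seconds for reading response
response = requests.get("https://api.example.com/data", timeout=(2.5, 4))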

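If the request establishes a connection within the connect timeout and receives data within the read timeout, the response is returned as usual. Otherwise, if the request times out, a Timeout exception is raised:

import requests
from requests.exceptions import Timeout

try:
    response = requests.get("https://api.example.com/data", timeout=(2.5, 4))
except Timeout:
    print("The request timed out")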
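
Timeout has two more specific subclasses, ConnectTimeout and ReadTimeout, which match the two elements of the timeout tuple. The sketch below tells them apart. describe_timeout() and fetch() are hypothetical helper names, and fetch() is only defined here, not called.

```python
import requests
from requests.exceptions import ConnectTimeout, ReadTimeout, Timeout

def describe_timeout(exc):
    """Return a short label for a timeout-related exception."""
    if isinstance(exc, ConnectTimeout):
        return 'connect timeout'
    if isinstance(exc, ReadTimeout):
        return 'read timeout'
    return 'timeout'

def fetch(url, timeout=(2.5, 4)):
    """Return the Response, or None if the request timed out."""
    try:
        return requests.get(url, timeout=timeout)
    except Timeout as exc:
        print(f'Request to {url} failed: {describe_timeout(exc)}')
        return None
```

A connect timeout means no connection was established in time, so retrying is usually harmless. A read timeout means the server accepted the request but did not answer in time, so it may still have processed it.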