find() and find_all() are the core methods for web scraping with BeautifulSoup, letting you extract data from HTML. find() returns the first element that matches your criteria: for example, find("div") returns the first div tag on the page, or None if there is no match. find_all() returns every matching element as a list, which makes it the right choice when you need to extract multiple elements, such as all div tags. Before you start scraping with BeautifulSoup, make sure Requests and BeautifulSoup are installed.
Installing dependencies
pip install requests
pip install beautifulsoup4
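One consequence of these return types is worth internalizing before scraping real pages: a missed find() hands you None, and calling .text on None raises an AttributeError, while a missed find_all() simply gives back an empty list. A minimal sketch (the HTML fragment here is made up purely for illustration):

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment, used only to illustrate the return types
html = "<div class='quote'>Hello</div>"
soup = BeautifulSoup(html, "html.parser")

print(soup.find("span"))      # no <span> exists -> prints None
print(soup.find_all("span"))  # prints [] (an empty list)

# Guard against None before touching .text, or the scraper will crash
tag = soup.find("span")
if tag is not None:
    print(tag.text)
```

Looping over a find_all() result is always safe, since iterating an empty list simply does nothing.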
find()
Let's start with find(). In the examples below, we'll use Quotes to Scrape and the Fake Store API to locate elements on a page. Both sites are built for teaching and scraper demos, and their content rarely changes, which makes them ideal for practice.
Finding by class
To find an element by its class, use the class_ keyword. You might wonder why it's class_ rather than class: class is a reserved keyword in Python, used to define classes, and class_ avoids the conflict.
The example below finds the first div whose class is quote:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
first_quote = soup.find("div", class_="quote")
print(first_quote.text)
Here's the output:
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
by Albert Einstein
(about)
Tags:
change
deep-thoughts
thinking
world
Finding by ID
When scraping, you'll often need to locate an element by its id. In the example below, we use the id argument to find the page's menu, which has the id menu.
import requests
from bs4 import BeautifulSoup
response = requests.get("https://fakestoreapi.com")
soup = BeautifulSoup(response.text, "html.parser")
ul = soup.find("ul", id="menu")
print(ul.text)
Here's the menu content we extracted and printed to the terminal:
Home
Docs
GitHub
Buy me a coffee
Finding by text
We can also search by an element's text content using the string argument. The example below finds the link on the page whose text is Login:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
login_button = soup.find("a", string="Login")
print(login_button.text)
As you can see, Login is printed to the console:
Login
通过属性查找
我们也可以使用其他属性来进行更严格的筛选。这一次,我们依然查找页面上的第一个名言,但会寻找 span
,其 itemprop
为 text
。这样就能只获取名言本身,而不包含作者和标签等额外信息:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
first_clean_quote = soup.find("span", attrs={"itemprop": "text"})
print(first_clean_quote.text)
Here's the clean quote we get back:
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
Finding with multiple conditions
You may have noticed that the attrs argument takes a dict rather than a single value. That lets us pass several conditions for more precise filtering. Here, we find the first author on the page by matching both the class and itemprop attributes:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
first_author = soup.find("small", attrs={"class": "author", "itemprop": "author"})
print(first_author.text)
After running it, you should see Albert Einstein as the output:
Albert Einstein
find_all()
Now let's walk through the same examples with find_all(). We'll keep using Quotes to Scrape and the Fake Store API. The key difference: find() returns a single element, while find_all() returns a list of matching elements from the page.
Finding by class
To find elements by their class attribute, use the class_ argument. The code below uses find_all() to extract every element on the page whose class is quote:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
quotes = soup.find_all("div", class_="quote")
for quote in quotes:
    print("-------------")
    print(quote.text)
Here's the output when we extract and print every quote on the first page:
-------------
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
by Albert Einstein
(about)
Tags:
change
deep-thoughts
thinking
world
-------------
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
by J.K. Rowling
(about)
Tags:
abilities
choices
-------------
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
by Albert Einstein
(about)
Tags:
inspirational
life
live
miracle
miracles
-------------
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
by Jane Austen
(about)
Tags:
aliteracy
books
classic
humor
-------------
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
by Marilyn Monroe
(about)
Tags:
be-yourself
inspirational
-------------
“Try not to become a man of success. Rather become a man of value.”
by Albert Einstein
(about)
Tags:
adulthood
success
value
-------------
“It is better to be hated for what you are than to be loved for what you are not.”
by André Gide
(about)
Tags:
life
love
-------------
“I have not failed. I've just found 10,000 ways that won't work.”
by Thomas A. Edison
(about)
Tags:
edison
failure
inspirational
paraphrased
-------------
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
by Eleanor Roosevelt
(about)
Tags:
misattributed-eleanor-roosevelt
-------------
“A day without sunshine is like, you know, night.”
by Steve Martin
(about)
Tags:
humor
obvious
simile
通过 ID 查找
正如在 find()
中所示,id
也是一种常用的查找方式。通过 id
来提取元素的方法与之前相同。我们在下面的代码中会查找所有 ul
,它们的 id
为 menu
。实际上页面中只会有一个,所以结果中也只会找到一个:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://fakestoreapi.com")
soup = BeautifulSoup(response.text, "html.parser")
uls = soup.find_all("ul", id="menu")
for ul in uls:
    print("-------------")
    print(ul.text)
Since the page has only one menu, the output is identical to the find() result:
-------------
Home
Docs
GitHub
Buy me a coffee
Finding by text
Next we find page elements by their text content, again using the string argument. In this example, we find all a elements containing the text Login. We say "all," but the page actually has only one:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
login_buttons = soup.find_all("a", string="Login")
for button in login_buttons:
    print("-------------")
    print(button)
Here's the output:
-------------
<a href="/login">Login</a>
Finding by attribute
In real-world scraping, you'll often need to filter data on other attributes. Remember how messy the output of our first example was? In the code below, we use itemprop to extract only the quote text:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
clean_quotes = soup.find_all("span", attrs={"itemprop": "text"})
for quote in clean_quotes:
    print("-------------")
    print(quote.text)
As you can see, the output is much cleaner:
-------------
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
-------------
“It is our choices, Harry, that show what we truly are, far more than our abilities.”
-------------
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
-------------
“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”
-------------
“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”
-------------
“Try not to become a man of success. Rather become a man of value.”
-------------
“It is better to be hated for what you are than to be loved for what you are not.”
-------------
“I have not failed. I've just found 10,000 ways that won't work.”
-------------
“A woman is like a tea bag; you never know how strong it is until it's in hot water.”
-------------
“A day without sunshine is like, you know, night.”
使用多重条件查找
这一次,我们会在 attrs
参数中传入更复杂的筛选条件。在下面的示例中,我们查找所有 small
元素,其 class
为 author
且 itemprop
为 author
。通过给 attrs
传入多个键值对实现:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
authors = soup.find_all("small", attrs={"class": "author", "itemprop": "author"})
for author in authors:
    print("-------------")
    print(author.text)
Here's the list of authors printed to the console:
-------------
Albert Einstein
-------------
J.K. Rowling
-------------
Albert Einstein
-------------
Jane Austen
-------------
Marilyn Monroe
-------------
Albert Einstein
-------------
André Gide
-------------
Thomas A. Edison
-------------
Eleanor Roosevelt
-------------
Steve Martin
Advanced techniques
Here are a few more advanced techniques. The examples use find_all(), but everything works with find() as well. The key question is whether you want a single element or the whole list.
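If it helps to see that relationship directly: find() behaves like find_all() with its limit argument set to 1, except that it returns the element itself rather than a one-item list. A quick sketch on an invented three-paragraph fragment:

```python
from bs4 import BeautifulSoup

# Invented fragment with three <p> tags, just for demonstration
soup = BeautifulSoup("<p>a</p><p>b</p><p>c</p>", "html.parser")

first = soup.find("p")                 # the element itself
limited = soup.find_all("p", limit=1)  # a list containing at most one element
everything = soup.find_all("p")        # the full list

print(first.text)                # a
print([p.text for p in limited]) # ['a']
print(len(everything))           # 3
```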
Regular expressions (regex)
Regular expressions are powerful for string matching. In the example below, we combine one with the string argument to find every text string containing einstein, ignoring case:
import requests
import re
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
pattern = re.compile(r"einstein", re.IGNORECASE)
tags = soup.find_all(string=pattern)
print(f"Total Einstein quotes: {len(tags)}")
The page yields three Einstein-related matches in total:
Total Einstein quotes: 3
自定义函数
现在我们来写一个自定义函数,专门返回所有与 Einstein 相关的名言。下面的例子中,我们在正则表达式的基础上做了进一步扩展。我们先通过 parent
一层层向上查找名言所在的卡片,接着再找出卡片中的所有 span
,其中第一个 span
存放的就是名言内容,最后打印到控制台:
import requests
import re
from bs4 import BeautifulSoup
def find_einstein_quotes(http_response):
    soup = BeautifulSoup(http_response.text, "html.parser")
    # find all strings matching "einstein"
    pattern = re.compile(r"einstein", re.IGNORECASE)
    tags = soup.find_all(string=pattern)
    for tag in tags:
        # follow the parents until we have the quote card
        full_card = tag.parent.parent.parent
        # find the spans inside the card
        spans = full_card.find_all("span")
        # print the first span, it contains the actual quote
        print(spans[0].text)

if __name__ == "__main__":
    response = requests.get("https://quotes.toscrape.com")
    find_einstein_quotes(response)
Here's our output:
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”
“Try not to become a man of success. Rather become a man of value.”
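The example above does its filtering with a regex and some parent-hopping, but find_all() can also take a callable directly: it calls your function once per tag and keeps the tags for which it returns True. A small sketch against an invented fragment that loosely mimics the quote page's markup:

```python
from bs4 import BeautifulSoup

# Invented fragment loosely mimicking the quote page's markup
html = """
<span itemprop="text">quote one</span>
<span class="tag">not a quote</span>
<small class="author">Albert Einstein</small>
"""
soup = BeautifulSoup(html, "html.parser")

def is_quote_span(tag):
    # keep only <span> tags whose itemprop attribute is "text"
    return tag.name == "span" and tag.get("itemprop") == "text"

quote_spans = soup.find_all(is_quote_span)
for span in quote_spans:
    print(span.text)
```

This keeps all the matching logic in one named, reusable predicate instead of scattering it across attrs dicts.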
Bonus: finding with CSS selectors
BeautifulSoup's select method works much like find_all(), but it's more flexible: it accepts a CSS selector, so anything you can express as a selector, you can use to find elements. In the code below, we find all the authors using multiple attributes, this time written as a single CSS selector:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://quotes.toscrape.com")
soup = BeautifulSoup(response.text, "html.parser")
authors = soup.select("small[class='author'][itemprop='author']")
for author in authors:
    print("-------------")
    print(author.text)
Here's the output:
-------------
Albert Einstein
-------------
J.K. Rowling
-------------
Albert Einstein
-------------
Jane Austen
-------------
Marilyn Monroe
-------------
Albert Einstein
-------------
André Gide
-------------
Thomas A. Edison
-------------
Eleanor Roosevelt
-------------
Steve Martin
Conclusion
By now you have a solid grasp of how find() and find_all() work in BeautifulSoup. You don't need to memorize every method, but remember that BeautifulSoup offers many ways to search, flexible enough to extract data from almost any page. In production, if you need fast scraping with a high success rate, consider our Residential Proxies or our Scraping Browser, which comes with built-in proxy management and CAPTCHA solving.
Sign up now and start your free trial to find the product that best fits your needs.