基于Gemini 2.0和LangGraph實(shí)現(xiàn)自主多工具AI代理 原創(chuàng)
本文旨在探討基于Gemini 2.0和LangGraph構(gòu)建一個自主型多工具AI代理的關(guān)鍵過程,并提供示例程序的完整代碼。
簡介
當(dāng)今,LLM應(yīng)用非常出色——它們可以記住大量信息、回答一般知識問題、編寫代碼、生成故事,甚至還可以糾正語法錯誤,等等。然而,它們并非沒有局限性。它們會產(chǎn)生幻覺,知識斷點(diǎn)可能從幾個月到幾年不等,并且僅限于生成文本,無法與現(xiàn)實(shí)世界互動。顯然,這限制了它們在需要實(shí)時數(shù)據(jù)、來源引用或文本生成以外功能的任務(wù)中的實(shí)用性。這正是代理和工具試圖解決的主要問題:它們通過為LLM增加附加功能來彌補(bǔ)這一差距。這些改進(jìn)使LLM能夠訪問最新的信息、與API交互、搜索,甚至能夠影響例如調(diào)節(jié)智能家居的溫度參數(shù)等的物理世界。
在本文中,我們將構(gòu)建一個簡單的LLM代理,該代理配備了四個工具,可用于回答用戶的問題。具體地講,此代理將實(shí)現(xiàn)以下功能:
- 可以使用最新的可驗(yàn)證信息回答一般知識問題。
- 可以使用四種類型的工具:DuckDuckGo搜索、獲取網(wǎng)頁內(nèi)容、維基百科搜索、獲取維基百科頁面內(nèi)容。
- 允許LLM推理用戶的輸入、消息歷史記錄、上一個工具調(diào)用及其結(jié)果,以決定是否接下來使用某個工具——如果使用的話,要使用哪個參數(shù)。
- 代理允許在每個時間步驟中同時使用多個工具。
示例程序代理組件整體架構(gòu)
實(shí)現(xiàn)過程中,我們將使用LangGraph作為代理實(shí)現(xiàn),并使用Gemini 2.0作為LLM。但是,你將能夠以最少的代碼修改把它切換到大多數(shù)你的其他類型的LLM應(yīng)用程序。本示例項(xiàng)目的完整代碼位于地址:??https://github.com/CVxTz/document_ai_agents/blob/master/document_ai_agents/document_multi_tool_agent.py???
工具
首先,讓我們構(gòu)建其中一個工具,例如Web搜索工具:
from duckduckgo_search import DDGS
from pydantic import BaseModel
class PageSummary(BaseModel):
page_title: str
page_summary: str
page_url: str
class SearchResponse(BaseModel):
page_summaries: list[PageSummary]
def search_duck_duck_go(search_query: str) -> SearchResponse:
"""
通過 duckduckgo 頁面進(jìn)行搜索。
:參數(shù)search_query: 發(fā)送到 DuckDuckGo 搜索的查詢。
每次只搜索一個項(xiàng)目,即使這意味著多次調(diào)用該工具。
:不返回任何值:
"""
max_results = 10
with DDGS() as dd:
results_generator = dd.text(
search_query,
max_results=max_results,
backend="api",
)
return SearchResponse(
page_summaries=[
PageSummary(
page_title=x["title"], page_summary=x["body"], page_url=x["href"]
)
for x in results_generator
]
)
在這里,我們的“工具”是一個簡單的Python函數(shù),它使用duckduckgo_search庫來獲取與搜索查詢相關(guān)的搜索結(jié)果。LLM將根據(jù)用戶的消息決定選擇什么查詢。此函數(shù)的輸出如下所示:
{'page_summaries': [{'page_summary': 'Stevia is a plant-based sweetener that '
'is 200 to 400 times sweeter than sugar '
'and has no calories or carbohydrates. '
'Learn about its health benefits, side '
'effects, and how to use it in cooking '
'and baking.',
'page_title': 'Stevia: Health Benefits and Risks - WebMD',
'page_url': 'https://www.webmd.com/food-recipes/what-is-stevia'},
{'page_summary': 'Stevia is a herb that can be used as a '
'zero-calorie, zero-carb sugar '
'substitute. Learn about its history, '
'safety, potential health benefits and '
'drawbacks from Cleveland Clinic experts.',
'page_title': 'Stevia: What Is It and Is It Healthy? - '
'Cleveland Clinic Health Essentials',
'page_url': 'https://health.clevelandclinic.org/what-is-stevia'},
{'page_summary': 'Stevia is a sugar substitute extracted '
'from the leaves of Stevia rebaudiana, a '
'plant native to Paraguay and Brazil. '
...
接下來,讓我們繼續(xù)看一下get_wikipedia_page函數(shù):
def get_wikipedia_page(page_title: str, max_text_size: int = 16_000):
"""
獲取維基百科頁面的全部內(nèi)容
:參數(shù)page_title: 首先調(diào)用工具“search_wikipedia”以確保該頁面存在。
:參數(shù)max_text_size: 默認(rèn)值為16000
:return:
"""
page = wikipedia.page(title=page_title, auto_suggest=False)
full_content = strip_tags(page.html())
full_page = FullPage(
page_title=page.title,
page_url=page.url,
content=full_content[:max_text_size],
)
return full_page
此函數(shù)獲取頁面的完整html,并將其刪除的html標(biāo)簽返回給調(diào)用者。輸出如下所示:
{'content': 'Sweetener and sugar substitute\n'
'This article is about the sweetener. For other uses, see Stevia '
'(disambiguation).\n'
'\n'
'Stevia (/?sti?vi?, ?st?vi?/)[1][2] is a sweet sugar substitute '
'that is about 50 to 300\xa0times sweeter than sugar.[3] It is '
'extracted from the leaves of Stevia rebaudiana, a plant native to
...
...
'^ Samuel, Priscilla; Ayoob, Keith T; Magnuson, Bernadene A; '
'W?lwer-Rieck, Ursula; Jeppesen, Per Bendix; Rogers, Peter J; '
'Rowland, Ian; Mathews, Rebecca (1 July 2018). "Stevia Leaf to '
'Stevia Sweetener: Exploring Its Science, Benefits, and Future '
'Potential". The Journal of Nutrition. 148 (7): 1186S – 1205S. '
'doi:10.1093/jn/nxy102. '
'hdl:1983/618532e2-6caa-4fbe-bd6e-6eb0cb608981. ISSN\xa00022-3166. '
'PMID\xa029982648.\n'
'\n'
'^ a b c d "Has stevia been approved by FDA to be used as a '
'sweetener?". US Foo',
'page_title': 'Stevia',
'page_url': 'https://en.wikipedia.org/wiki/Stevia'}
總的來說,我們定義了四個這樣的函數(shù):
- search_wikipedia(search_query: str)
- get_wikipedia_page(page_title: str, max_text_size: int = 16_000):
- search_duck_duck_go(search_query: str)
- get_page_content(page_title: str, page_url: str)
所有這些函數(shù)都傳遞給Gemini客戶端,如下所示:
model = genai.GenerativeModel(
"gemini-2.0-flash-exp",
tools=[
get_wikipedia_page,
search_wikipedia,
search_duck_duck_go,
get_page_content,
]
)
客戶端將從函數(shù)定義中推斷出調(diào)用參數(shù)及其類型。它還將傳遞此生成的模型中的函數(shù)的文檔字符串,因此我們需要向LLM解釋該工具如何在此文檔字符串中工作以獲得最佳結(jié)果。
LLM
我們將使用Gemini客戶端與LLM(特別是Gemini 2.0)進(jìn)行交互。首先,你需要設(shè)置一個API密鑰——可以從Google AI Studio獲取。使用此客戶端,我們將提示LLM生成響應(yīng)或創(chuàng)建函數(shù)調(diào)用(或兩者兼而有之)。然后,這些函數(shù)調(diào)用將應(yīng)用于我們上面定義的工具。
當(dāng)使用提示調(diào)用時,客戶端可能會使用常規(guī)文本或包含函數(shù)調(diào)用的內(nèi)容進(jìn)行響應(yīng),例如:
{'function_call':
{'name': 'search_wikipedia', 'args': {'search_query': 'Trey Parker'}}
}
這種類型的響應(yīng)包含工具的名稱及其參數(shù)。這允許LLM選擇要訪問哪些外部資源以及如何訪問。
代理
以下是我們代理的完整實(shí)現(xiàn)(僅70行代碼):
class ToolCallAgent:
def __init__(self, tools: list[Callable], model_name="gemini-2.0-flash-exp"):
self.model_name = model_name
self.model = genai.GenerativeModel(
self.model_name,
tools=tools,
system_instruction="You are a helpful agent that has access to different tools. Use them to answer the "
"user's query if needed. Only use information from external sources that you can cite. "
"You can use multiple tools before giving the final answer. "
"If the tool response does not give an adequate response you can use the tools again with different inputs."
"Only respond when you can cite the source from one of your tools."
"Only answer I don't know after you have exhausted all ways to use the tools to search for that information.",
)
self.tools = tools
self.tool_mapping = {tool.__name__: tool for tool in self.tools}
self.graph = None
self.build_agent()
def call_llm(self, state: AgentState):
response = self.model.generate_content(
state.messages,
request_options=RequestOptions(
retry=retry.Retry(initial=10, multiplier=2, maximum=60, timeout=300)
),
)
return {
"messages": [
type(response.candidates[0].content).to_dict(
response.candidates[0].content
)
]
}
def use_tool(self, state: AgentState):
assert any("function_call" in part for part in state.messages[-1]["parts"])
tool_result_parts = []
for part in state.messages[-1]["parts"]:
if "function_call" in part:
name = part["function_call"]["name"]
func = self.tool_mapping[name]
result = func(**part["function_call"]["args"])
tool_result_parts.append(
{
"function_response": {
"name": name,
"response": result.model_dump(mode="json"),
}
}
)
return {"messages": [{"role": "tool", "parts": tool_result_parts}]}
@staticmethod
def should_we_stop(state: AgentState) -> str:
logger.debug(
f"Entering should_we_stop function. Current message: {state.messages[-1]}"
) # Added log
if any("function_call" in part for part in state.messages[-1]["parts"]):
logger.debug(f"Calling tools: {state.messages[-1]['parts']}")
return "use_tool"
else:
logger.debug("Ending agent invocation")
return END
def build_agent(self):
builder = StateGraph(AgentState)
builder.add_node("call_llm", self.call_llm)
builder.add_node("use_tool", self.use_tool)
builder.add_edge(START, "call_llm")
builder.add_conditional_edges("call_llm", self.should_we_stop)
builder.add_edge("use_tool", "call_llm")
self.graph = builder.compile()
首先,在上述代碼中,我們定義LLM客戶端,其中包含一組工具和一個簡單的提示。
其次,我們定義第一個節(jié)點(diǎn)“call_llm”,使用消息歷史記錄調(diào)用LLM API。
第三,我們定義一個工具調(diào)用節(jié)點(diǎn),執(zhí)行LLM請求的所有函數(shù)調(diào)用。
最后,我們定義代理的狀態(tài)轉(zhuǎn)換圖:
示例項(xiàng)目所定義代理的狀態(tài)圖
代理將繼續(xù)運(yùn)行,直到LLM的新響應(yīng)不包含任何工具調(diào)用。這會在狀態(tài)圖中創(chuàng)建一個循環(huán),并允許代理的執(zhí)行優(yōu)化搜索查詢和工具使用,直到找到用戶詢問的內(nèi)容。
示例運(yùn)行
現(xiàn)在,讓我們對這個代理進(jìn)行測試運(yùn)行。我們將從一個簡單的查詢開始。
英文提問:
Is puffer fish poisonous ? if so, explain why and list some other poisonous (not venomous) fish. Don’t cite Wikipedia only.
譯文:
河豚有毒嗎?如果是,請解釋原因并列出一些其他有毒(無毒)的魚。不要只引用維基百科。
以下是與代理交流的過程(我為了簡潔而刪減了大部分文本):
User(用戶):
英文提問:
Is pufferfish poisonous? If so, explain why and list some other poisonous (not venomous) fish. Don’t cite Wikipedia only.
譯文:
河豚有毒嗎?如果是,請解釋原因并列出一些其他有毒(無毒)的魚。不要只引用維基百科。
模型(Model):
function_call:
name:search_duck_duck_go
args:
search_query:are puffer fish poisonous
工具(Tool):
function_response:
name:search_duck_duck_go
response:
page_summaries:
page_title: How The Poisonous Pufferfish Became A Delicacy (And The Deadly Mishaps …
page_summary: Pufferfish, or fugu, is a popular and expe….
page_url: https://www.tastingtable.com/1616127/poisonous-pufferfish-explained/
page_title: Are Puffer Fish Poisonous To Touch? Here’s What You Need To Know
page_summary: Puffer fish contain a deadly toxin called tetrodotoxin that can cause paralysis and death if ingested or touched. …
page_url: https://safeharborfishing.com/are-puffer-fish-poisonous-to-touch-heres-what-you-need-to-know/
….
部分譯文:
page_title:有毒的河豚如何成為美味佳肴(以及致命的事故……
page_summary:河豚,或稱河豚,是一種流行且昂貴的……
page_url:https://www.tastingtable.com/1616127/poisonous-pufferfish-explained/
page_title:觸摸河豚有毒嗎?以下是你需要了解的內(nèi)容
page_summary:河豚含有一種致命的毒素,稱為河豚毒素,如果攝入或接觸會導(dǎo)致癱瘓和死亡。……
page_url: https://safeharborfishing.com/are-puffer-fish-poisonous-to-touch-heres-what-you-need-to-know/
….
模型(Model)響應(yīng):
Yes, pufferfish are poisonous. They contain a deadly neurotoxin called tetrodotoxin, which can cause paralysis and death if ingested or touched. This toxin is not destroyed by cooking or freezing. The pufferfish’s diet, which includes invertebrates and algae like mussels and shellfish, contributes to the presence of tetrodotoxin.
Here are some other poisonous (not venomous) fish:
Northern Stargazer: These fish have poisonous spines behind their eyes.
Toadfish: Certain species of toadfish have poisonous spines.
Rabbitfish: These fish have spines containing toxins.
…
譯文:
是的,河豚有毒。它們含有一種致命的神經(jīng)毒素,稱為河豚毒素,如果攝入或接觸會導(dǎo)致癱瘓和死亡。這種毒素不會被烹飪或冷凍破壞。河豚的食物包括無脊椎動物和藻類,如貽貝和貝類,這導(dǎo)致了河豚毒素的存在。
以下是其他一些有毒(無毒)的魚:
北方觀星魚:這些魚的眼睛后面有毒刺。
蟾魚:某些種類的蟾魚有有毒的刺。
兔魚:這些魚的刺含有毒素。
…
這次交流涉及一輪工具調(diào)用,包括使用DuckDuckGo進(jìn)行網(wǎng)絡(luò)搜索,這足以讓LLM獲得所需的所有信息。
現(xiàn)在,讓我們嘗試一個更復(fù)雜的場景,其中代理需要進(jìn)行多次工具調(diào)用:
英文提問:
What is the number and season of the south park episode where they get time traveling immigrants? Who was the director of that episode? Where and when was he born ? Give me his wikipedia page link.
譯文:
南方公園第幾集和第幾季有時間旅行移民?那一集的導(dǎo)演是誰?他出生在哪里和什么時候?給我有關(guān)它的維基百科頁面鏈接。
如果沒有工具支持,模型會回答:
英文提問:
The South Park episode you\’re referring to is **Season 18, Episode 7 titled “Grounded Vindaloop”**. It does not feature time-traveling immigrants, but rather focuses…
譯文:
你提到的南方公園劇集是**第18季,第7集,標(biāo)題為“Grounded Vindaloop”**。它沒有時間旅行移民,而是關(guān)注……
顯然,這個答案是錯誤的。
下面,借助于工具支持,有如下結(jié)果:
用戶(User):
What is the number and season of the South Park episode where they get time-traveling immigrants? Who was the director of that episode? Where and when was he born? Give me his Wikipedia page link.
譯文:
南方公園第幾季出現(xiàn)了穿越時空的移民?那一集的導(dǎo)演是誰?他何時何地出生?給我它的維基百科頁面鏈接。
模型(Model)響應(yīng):
function_call:
name: search_wikipedia
args:
search_query: south park time traveling immigrants episode
工具(Tool):
function_response:
name: search_wikipedia
response:
page_summaries:
page_title: Goobacks
page_summary: “Goobacks” is the seventh episode of the eighth season of the animated television series South Park, and the 118th overall episode of the series. In production order, …
page_url: https://en.wikipedia.org/wiki/Goobacks
page_title: List of time travel works of fiction
page_summary: Time travel is a common plot element in fiction. Works where it plays a prominent role are listed below. For stories of time travel in antiquity, see the history of the time travel concept.
page_url: https://en.wikipedia.org/wiki/List_of_time_travel_works_of_fiction
…
模型(Model)響應(yīng):
The South Park episode you’re referring to is called “Goobacks.” It is the seventh episode of the eighth season.
譯文:
你提到的《南方公園》劇集名為“Goobacks”。它是第八季的第七集。
模型(Model)響應(yīng):
function_call:
name: get_wikipedia_page
args:
page_title: Goobacks
工具(Tool):
function_response:
name: get_wikipedia_page
response:
page_title: Goobacks
page_url: https://en.wikipedia.org/wiki/Goobacks
content: 7th episode of the 8th season of South Park. “Goobacks” is the seventh episode of the eighth season of the animated television series South Park, and the 118th overall episode of the series. In production order, it is the season’s sixth episode. It originally aired on Comedy Central in the United States on April 28, 2004. The episode was directed by Trey Parker.
部分譯文:
內(nèi)容:《南方公園》第八季第七集?!禛oobacks》是動畫電視連續(xù)劇《南方公園》第八季第七集,也是該系列的第118集。按制作順序,它是該季的第六集。它最初于2004年4月28日在美國喜劇中心播出。該集由Trey Parker執(zhí)導(dǎo)。
模型(Model)響應(yīng):
The episode “Goobacks” was directed by Trey Parker.
譯文:
“Goobacks”一集的導(dǎo)演是特雷·帕克 (Trey Parker)。
模型(Model)響應(yīng):
function_call:
name: search_wikipedia
args:
search_query: Trey Parker
工具(Tool):
function_response:
name: search_wikipedia
response:
page_summaries:
page_title: Trey Parker
page_summary: Randolph Severn “Trey” Parker III (born October 19, 1969) is an American actor, animator, writer, producer, director, and musician…
page_url: https://en.wikipedia.org/wiki/Trey_Parker
…
模型(Model)響應(yīng):
Trey Parker was born on October 19, 1969.
譯文:
崔·帕克出生于1969年10月19日。
模型(Model)響應(yīng):
function_call:
name: get_wikipedia_page
args:
page_title: Trey Parker
工具(Tool):
function_response:
name: get_wikipedia_page
response:
page_title: Trey Parker
page_url: https://en.wikipedia.org/wiki/Trey_Parker
content: Randolph Severn “Trey” Parker III (born October 19, 1969) is an American actor, animator, writer, producer, director, and musician. He was born in Conifer, Colorado, U.S.
部分譯文:
內(nèi)容:蘭道夫·塞文·“特雷”·帕克三世(1969年10月19日出生)是美國演員、動畫師、作家、制片人、導(dǎo)演和音樂家。他出生于美國科羅拉多州康尼弗。
模型(Model)響應(yīng):
Trey Parker was born in Conifer, Colorado, U.S. on October 19, 1969.
Here is his Wikipedia page link: https://en.wikipedia.org/wiki/Trey_Parker
譯文:
特雷·帕克于1969年10月19日出生于美國科羅拉多州康尼弗。
這是他的維基百科頁面鏈接:https://en.wikipedia.org/wiki/Trey_Parker
該代理能夠迭代地進(jìn)行工具調(diào)用(總共四次調(diào)用),以回答我們這個復(fù)雜的問題。但是,其中明確的答案不在第一個結(jié)果中,而是需要多個推理步驟和搜索查詢。
結(jié)論
我們在本文中開發(fā)了一個自主型的LLM代理,它能夠搜索網(wǎng)絡(luò)、瀏覽維基百科,并使用Gemini2.0作為其核心LLM推理檢索到的信息。該代理不僅能夠獲取相關(guān)數(shù)據(jù),還實(shí)現(xiàn)了根據(jù)初始結(jié)果優(yōu)化其搜索查詢,以找到用戶請求的確切信息。
上述實(shí)現(xiàn)為創(chuàng)建你自己的自主代理奠定了良好的基礎(chǔ)。通過將自定義工具定義為Python函數(shù)并將其集成到代理中,你可以輕松地根據(jù)你的特定需求進(jìn)行調(diào)整。另外,開發(fā)本示例代理過程中得益于LangGraph框架,其實(shí)現(xiàn)的簡單性使得整個定制過程變得更為簡單。
現(xiàn)在,你可以開始構(gòu)建適用于你自己的使用場景和應(yīng)用程序的強(qiáng)大、合適的代理應(yīng)用了。
本文完整源代碼下載地址:??https://github.com/CVxTz/document_ai_agents/blob/master/document_ai_agents/document_multi_tool_agent.py???
譯者介紹
朱先忠,51CTO社區(qū)編輯,51CTO專家博客、講師,濰坊一所高校計算機(jī)教師,自由編程界老兵一枚。
原文標(biāo)題:??Building Autonomous Multi-Tool Agents with Gemini 2.0 and LangGraph??,作者:Youness Mansar
