All the AI Agent Tools You Need Are Here

Author: 程序員半支煙 2024-07-02 11:16:21

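This article covers the tool specification for AI Agents and a few commonly used tools. An AI Agent only becomes powerful once it can use tools.

Only when an LLM (large language model) learns to use tools can we build practical AI Agents and bring out the LLM's real strength. In this article, we give an AI Agent more tools to work with, such as external search, CSV analysis, text-to-image generation, and code execution.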

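I. Why Tools Are Necessary

An LLM that cannot use tools is like a person with a sharp mind but a body that cannot move: it can think, but it cannot get anything done. Humans set themselves apart from other animals by learning to use tools, so giving an LLM the ability to use tools is essential.

We want the LLM to help carry out all kinds of tasks. A Tool is an external capability that the LLM can call while executing a task. For example, when it needs to look up outside information it can call a retrieval tool, and when it needs to run a piece of code it can call a custom function.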

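II. LangChain's Tool Specification

All tools have to follow a common specification so that the LLM can call any of them. To that end, LangChain abstracts a Tool layer: any function that follows this specification is a Tool object and can be called by the LLM.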

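1. The Tool specification

The Tool specification is simple. A tool only needs three attributes: name, description, and function.

name: the name of the tool.
description: a description of what the tool does. This text is later added to the Prompt, and the LLM decides whether to call the tool based on it.
function: the function the tool actually runs (passed as func in the Tool class used in this article's code).

As long as a tool follows this specification, it can be defined in several different ways, as the code later in this article demonstrates.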
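To make the three attributes concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: SimpleTool, render_tool_descriptions, and the word_count tool are hypothetical names made up for illustration. It only shows how a name, a description, and a function travel together, and how the descriptions could be turned into prompt text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a function."""
    name: str
    description: str
    func: Callable[[str], str]


def render_tool_descriptions(tools: list[SimpleTool]) -> str:
    """Build the text block that would be placed into the prompt (assumed format)."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


# A toy tool for illustration only.
word_count = SimpleTool(
    name="word_count",
    description="Counts the words in the input text.",
    func=lambda text: str(len(text.split())),
)

print(render_tool_descriptions([word_count]))
print(word_count.func("tools make agents useful"))
```

The LLM never sees the function itself, only the rendered name and description, which is why a precise description matters so much.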

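2. How an Agent uses tools

To let an AI Agent use tools, you define an Agent and an AgentExecutor. The AgentExecutor maintains a Map from Tool.name to Tool.

Given the Prompt (which contains the Tool descriptions) and the user's question, the LLM decides whether a tool is needed. Once a tool has been chosen, the Tool name and the call arguments are used to look up the Tool instance in the Map, and that instance's function is then called.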
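The lookup step can be sketched roughly as follows. This is a simplified illustration, not the AgentExecutor source: the two toy tools, tool_map, and run_action are hypothetical, and the decision is assumed to be a dict with "action" and "action_input" keys, the shape a structured chat prompt typically asks the model to produce.

```python
from typing import Callable


# Hypothetical tools: plain functions keyed by tool name.
def echo(text: str) -> str:
    return f"echo: {text}"


def shout(text: str) -> str:
    return text.upper()


# The name -> tool map that the executor keeps.
tool_map: dict[str, Callable[[str], str]] = {"echo": echo, "shout": shout}


def run_action(action: dict) -> str:
    """Look up the tool chosen by the model and call it with its input."""
    name = action["action"]
    if name == "Final Answer":
        # No tool needed: the model answered directly.
        return action["action_input"]
    tool = tool_map.get(name)
    if tool is None:
        # Report the problem back instead of crashing, so the model can retry.
        return f"Unknown tool: {name}"
    return tool(action["action_input"])


print(run_action({"action": "shout", "action_input": "hello"}))
```

In a real agent loop, the returned string would be fed back to the LLM as an observation before it decides on the next step.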

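III. How to Use Various Tools

A custom Tool only needs to follow the specification above. The following sections use a few common tools as examples.

Some of these tools use toolkits. Toolkits are tool bundles provided by LangChain that aim to lower the cost of using tools. They offer a rich and still growing set of tools, and most of the tools you need can be found there.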

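1. External search

This tool performs external search. This article uses serpapi, which integrates several search engines, including Google and Baidu, and exposes them through an API, which is very convenient.

Official site: https://serpapi.com/. You can sign up for an account yourself, and it comes with some free quota. The external search tool is defined as follows: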

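from langchain.tools import tool
from langchain_community.utilities import SerpAPIWrapper

# 1. Define the search tool with the @tool decorator
@tool
def search(query: str) -> str:
    """Use this tool only when you need real-time information or facts you do not know. Pass in the content to search for."""
    serp = SerpAPIWrapper()  # reads SERPAPI_API_KEY from the environment
    result = serp.run(query)
    return result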
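What the decorator contributes is easier to see in a stripped-down imitation. The sketch here is not LangChain's @tool: FunctionTool and simple_tool are hypothetical names, and the search function is a stub with no network call. It assumes only the behavior visible above, namely that the function name serves as the tool name and the docstring serves as the description.

```python
import inspect
from typing import Callable


class FunctionTool:
    """Hypothetical wrapper that takes its name and description from a function."""

    def __init__(self, func: Callable[[str], str]) -> None:
        if not func.__doc__:
            # Without a docstring there is nothing to show the LLM.
            raise ValueError("A docstring is required; it becomes the description.")
        self.name = func.__name__
        self.description = inspect.cleandoc(func.__doc__)
        self.func = func

    def run(self, tool_input: str) -> str:
        return self.func(tool_input)


def simple_tool(func: Callable[[str], str]) -> FunctionTool:
    """Decorator form of the wrapper."""
    return FunctionTool(func)


@simple_tool
def search(query: str) -> str:
    """Use only for real-time information or unknown facts."""
    return f"(stub) results for: {query}"  # stub: no real search happens here


print(search.name)
print(search.description)
print(search.run("weather in Nanjing"))
```

This is why the docstring in the search tool reads like an instruction to the model rather than a note to developers.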

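2. Text-to-image

The text-to-image tool uses the DallEAPIWrapper class provided by the LangChain community package. This article uses OpenAI's image generation model Dall-E-3. The code is as follows: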

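from langchain.agents import Tool
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper

# 2. Define the image generation tool with the Tool class
dalle_image_generator = Tool(
    name="OpenAI Dall-E-3 image generator",
    func=DallEAPIWrapper(model="dall-e-3").run,
    description="A wrapper around the OpenAI DALL-E API. Use this tool when you need to generate an image from a text description. Pass in the description of the image.",
)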

Here, DallEAPIWrapper(model="dall-e-3").run is just a function; under the hood it calls OpenAI's API.

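3. Code executor

The code executor tool can run code, or generate code from a natural-language description. It mainly relies on PythonREPLTool and the toolkits, both provided by LangChain (imported from the langchain_experimental package in the complete code).

For example, create_python_agent simplifies creating a Python interpreter tool. The code is as follows: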

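from langchain_experimental.agents.agent_toolkits import create_python_agent
from langchain_experimental.tools import PythonREPLTool

# 3. Define the Python code execution tool with a toolkit
# (model is the ChatOpenAI instance defined in the complete code)
python_agent_executor = create_python_agent(
    llm=model,
    tool=PythonREPLTool(),
    verbose=True,
    agent_executor_kwargs={"handle_parsing_errors": True},
)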

4. CSV analysis

The CSV tool is used to analyze CSV files. Again, we use a function from the toolkits, create_csv_agent, to create the tool quickly. The code is as follows:

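from langchain_experimental.agents.agent_toolkits import create_csv_agent

# 4. Define the CSV analysis tool with a toolkit
# (model is the ChatOpenAI instance defined in the complete code)
csv_agent_executor = create_csv_agent(
    llm=model,
    path="course_price.csv",
    verbose=True,
    agent_executor_kwargs={"handle_parsing_errors": True},
    allow_dangerous_code=True,  # explicit opt-in: the agent runs generated Python code
)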
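In the complete code, both of these executors are wrapped in a Tool whose func is the executor's invoke method. Assuming invoke returns a dict with an "output" key (FakeExecutor here is a hypothetical stand-in that calls no LLM, and as_text_tool is an invented helper), a small adapter like this sketch could hand the outer agent a plain string instead of the whole dict:

```python
from typing import Callable


class FakeExecutor:
    """Hypothetical stand-in for an agent executor; no LLM is called."""

    def invoke(self, inputs):
        # Accept either a bare string or {"input": ...}, and return a dict.
        question = inputs["input"] if isinstance(inputs, dict) else inputs
        return {"input": question, "output": f"answer to: {question}"}


def as_text_tool(invoke: Callable[[dict], dict]) -> Callable[[str], str]:
    """Adapt an invoke-style callable into a str -> str tool function."""

    def run(question: str) -> str:
        result = invoke({"input": question})
        # Fall back to the whole result if there is no "output" key.
        return str(result.get("output", result))

    return run


csv_tool_func = as_text_tool(FakeExecutor().invoke)
print(csv_tool_func("How many rows are there?"))
```

Passing invoke directly, as the article does, is shorter; the adapter only keeps the observation text the outer agent sees a bit cleaner.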

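5. Complete code

The sections above introduced common tools for an AI Agent. Once the tools are defined, put them into a tool list, then define the Agent and the AgentExecutor, and you are done. A few dozen lines of code are enough to let the LLM use all of these tools.

The complete code is as follows: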

import os
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.agents import create_structured_chat_agent, AgentExecutor, Tool
from langchain.tools import BaseTool, StructuredTool, tool
from langchain_experimental.agents.agent_toolkits import (
    create_python_agent,
    create_csv_agent,
)
from langchain_community.utilities import SerpAPIWrapper
from langchain_experimental.tools import PythonREPLTool
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper

# Install the SerpAPI client first (pip install google-search-results), and register an account at https://serpapi.com/

# Register SERPAPI_API_KEY and the OpenAI-related keys as environment variables
os.environ["SERPAPI_API_KEY"] = (
    "9dd2b2ee429ed996c75c1daf7412df16336axxxxxxxxxxxxxxx"
)
os.environ["OPENAI_API_KEY"] = "sk-a3rrW46OOxLBv9hdfQPBKFZtY7xxxxxxxxxxxxxxxx"
os.environ["OPENAI_API_BASE"] = "https://api.302.ai/v1"

model = ChatOpenAI(model_name="gpt-3.5-turbo")


# Prompt template based on the ReAct mechanism
prompt = hub.pull("hwchase17/structured-chat-agent")



# Different ways to define tools

# 1. Define the search tool with the @tool decorator
@tool
def search(query: str) -> str:
    """Use this tool only when you need real-time information or facts you do not know. Pass in the content to search for."""
    serp = SerpAPIWrapper()
    result = serp.run(query)
    return result


# 2. Define the image generation tool with the Tool class
dalle_image_generator = Tool(
    name="OpenAI Dall-E-3 image generator",
    func=DallEAPIWrapper(model="dall-e-3").run,
    description="A wrapper around the OpenAI DALL-E API. Use this tool when you need to generate an image from a text description. Pass in the description of the image.",
)

# 3. Define the Python code execution tool with a toolkit
python_agent_executor = create_python_agent(
    llm=model,
    tool=PythonREPLTool(),
    verbose=True,
    agent_executor_kwargs={"handle_parsing_errors": True},
)

# 4. Define the CSV analysis tool with a toolkit
csv_agent_executor = create_csv_agent(
    llm=model,
    path="course_price.csv",
    verbose=True,
    agent_executor_kwargs={"handle_parsing_errors": True},
    allow_dangerous_code=True,
)

# Define the tool list
tool_list = [
    search,
    dalle_image_generator,
    Tool(
        name="Python code tool",
        description="""
        Use this tool when you need the help of a Python interpreter.
        For example, when you need to run Python code,
        or when you want to generate code from a natural-language description: let it generate the Python code and return the result of running it.
        """,
        func=python_agent_executor.invoke,
    ),
    Tool(
        name="CSV analysis tool",
        description="""
        Use this tool when you need to answer questions about the course_price.csv file.
        It takes the full question as input and returns the answer after computing it with the Pandas library.
        """,
        func=csv_agent_executor.invoke,
    ),
]


# Hand the tools to the Agent
agent = create_structured_chat_agent(
    llm=model,
    tools=tool_list,
    prompt=prompt
)

# Define the AgentExecutor
agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent,
    tools=tool_list,
    verbose=True,  # print the tool selection process and the ReAct reasoning in detail
    handle_parsing_errors=True
)



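# Does not use any tool
agent_executor.invoke({"input": "Who are you?"})

# Uses the search tool
# agent_executor.invoke({"input": "What is the temperature in Nanjing today in degrees Celsius? Is it raining outside right now?"})

# Uses the Python code tool
# agent_executor.invoke(
#     {
#         "input": """
#         Run the python code inside the ``` marks for me.
#
#         ```python
#
#             def add(a,b):
#                 return a+b
#
#             print("hello world : ", add(100,200))
#         ```
#         """
#     }
# )

# Uses the image generation tool
# agent_executor.invoke(
#     {
#         "input": "Generate an image for me with the following description: a very busy Chinese high school student is preparing for China's college entrance exam. It is late at night, his mother sits beside him reading a book to keep him company, and blurry neon lights glow outside the window."
#     }
# )

# Uses the CSV analysis tool
# agent_executor.invoke({"input": "In the course_price dataset, which cities are there in total? Answer in Chinese."})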

Let's look at the whole ReAct process when the tools are used.