Building Multi-Agent RAG with LlamaIndex
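Retrieval-augmented generation (RAG) has become a powerful technique for enhancing the capabilities of large language models (LLMs). RAG retrieves relevant information from knowledge sources and incorporates it into the prompt. This gives the LLM useful context for producing factually grounded output.
However, existing single-agent RAG systems face several problems: inefficient retrieval, high latency, and suboptimal prompts. These problems limit RAG performance in real-world applications. A multi-agent architecture provides an ideal framework for overcoming these challenges and unlocking the full potential of RAG. By dividing responsibilities, a multi-agent system allows for specialized roles, parallel execution, and optimized collaboration.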
Single-Agent RAG
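Current RAG systems use a single agent to handle the entire workflow: query analysis, passage retrieval, ranking, summarization, and prompt augmentation.
This monolithic approach offers a simple, all-in-one solution, but relying on one agent for every task creates bottlenecks. The agent wastes time retrieving irrelevant passages from a large corpus. It summarizes long contexts poorly. Its prompts also fail to integrate the original question with the retrieved information in an optimal way.
These inefficiencies severely limit the scalability and speed of RAG in real-time applications.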
Multi-Agent RAG
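A multi-agent architecture can overcome the limitations of a single agent. Splitting RAG into modular roles that execute concurrently enables:
- Retrieval: a dedicated retrieval agent focuses on efficient passage retrieval using optimized search techniques, which minimizes latency.
- Search: with retrieval factored out into its own role, search can be parallelized across retrieval agents to reduce wait time.
- Ranking: a separate ranking agent scores the retrieved passages on richness, specificity, and other relevance signals. This filters for the most relevant passages.
- Summarization: long contexts are condensed into concise snippets that contain only the most important facts.
- Prompt optimization: the integration of the original prompt and the retrieved information is adjusted dynamically.
- Flexible architecture: agents can be swapped in or added to customize the system; for example, a visualization agent can provide insight into the workflow.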
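The division of labor above can be sketched in plain Python. This is a toy illustration under stated assumptions, not LlamaIndex code. The class names (`RetrieverAgent`, `RankerAgent`, `SummarizerAgent`, `PromptAgent`) and the two-shard corpus are hypothetical. Word-overlap scoring stands in for embeddings, and word truncation stands in for LLM summarization. The sketch shows retrieval agents querying separate shards in parallel, then ranking, condensing, and prompt assembly as separate roles.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical corpus split into two shards, one per retriever agent.
SHARDS = {
    "shard_a": ["Boston has a ballet company.", "Boston hosts First Night."],
    "shard_b": ["Tokyo is the capital of Japan.", "Boston has art museums."],
}


def tokenize(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


@dataclass
class RetrieverAgent:
    """Retrieval role: returns passages from its shard that share a word with the query."""
    passages: list[str]

    def run(self, query: str) -> list[str]:
        q = tokenize(query)
        return [p for p in self.passages if q & tokenize(p)]


class RankerAgent:
    """Ranking role: orders passages by word overlap with the query and keeps the top k."""
    def run(self, query: str, passages: list[str], k: int = 2) -> list[str]:
        q = tokenize(query)
        return sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)[:k]


class SummarizerAgent:
    """Summarization role: keeps the first few words (stand-in for an LLM summary)."""
    def run(self, passages: list[str], max_words: int = 6) -> list[str]:
        return [" ".join(p.split()[:max_words]) for p in passages]


class PromptAgent:
    """Prompt role: combines the question with the condensed context."""
    def run(self, query: str, snippets: list[str]) -> str:
        context = "\n".join(f"- {s}" for s in snippets)
        return f"Context:\n{context}\n\nQuestion: {query}"


def multi_agent_rag(query: str) -> str:
    retrievers = [RetrieverAgent(passages) for passages in SHARDS.values()]
    # Search role: query all retriever agents concurrently; map() keeps shard order.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda r: r.run(query), retrievers))
    candidates = [p for batch in batches for p in batch]
    ranked = RankerAgent().run(query, candidates)
    snippets = SummarizerAgent().run(ranked)
    return PromptAgent().run(query, snippets)


print(multi_agent_rag("Tell me about Boston art"))
```

Each role is a separate object. A retriever could therefore be replaced, for example by a vector-search client, without touching the ranker or the prompt builder.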
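By dividing RAG into specialized, collaborating roles, multi-agent systems improve relevance, reduce latency, and optimize prompts. This unlocks scalable, high-performance RAG.
Dividing responsibilities lets retrieval agents combine complementary techniques such as vector similarity, knowledge graphs, and web scraping. With this multi-signal approach, retrieval can capture diverse content that reflects different aspects of relevance.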
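One way to merge the outputs of retrievers that use different signals is reciprocal rank fusion (RRF). RRF is a standard rank-merging technique that the article itself does not name; it is swapped in here purely as an illustration. Each document's score is the sum of 1 / (k + rank) over the ranked lists it appears in, with k = 60 as a commonly used default. The retriever names and document ids below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists from several retrievers: score(d) = sum of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of three retriever agents, each using a different signal.
rankings = {
    "vector_similarity": ["boston_arts", "boston_music", "boston_history"],
    "knowledge_graph": ["boston_music", "boston_arts"],
    "web_search": ["boston_events", "boston_arts"],
}
print(reciprocal_rank_fusion(rankings))
# → ['boston_arts', 'boston_music', 'boston_events', 'boston_history']
```

Here `boston_arts` is ranked first by only one retriever, yet it wins. It appears in all three lists, and rewarding that kind of agreement is the point of combining signals.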
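When agents collaborate on retrieval and ranking, relevance can be optimized from several perspectives. Combined with reader and orchestrator agents, this supports scalable, multi-perspective RAG.
The modular architecture lets engineers combine different retrieval techniques across specialized agents.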
Multi-Agent RAG with LlamaIndex
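LlamaIndex outlines a concrete example of multi-agent RAG:
- Document agents: perform QA and summarization within a single document.
- Vector index: enables semantic search for each document agent.
- Summary index: enables summarization for each document agent.
- Top-level agent: orchestrates the document agents, using tool retrieval to answer questions that span documents.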
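The hierarchy can be sketched without any LLM. In this toy version, every name is hypothetical and none of them are LlamaIndex classes. Each `DocumentAgent` owns one document and exposes a vector-style lookup and a summary tool. A keyword rule stands in for the LLM's function-calling choice. `TopLevelAgent` routes a question to the document agents whose city name it mentions; this is a simplification of the tool retrieval the real top-level agent uses.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation and digits."""
    return set(re.findall(r"[a-z]+", text.lower()))


class DocumentAgent:
    """Owns one document and exposes two toy tools: vector-style lookup and summary."""

    def __init__(self, title: str, chunks: list[str]):
        self.title = title
        self.chunks = chunks

    def vector_tool(self, question: str, top_k: int = 1) -> list[str]:
        # Stand-in for semantic search: rank chunks by word overlap with the question.
        q = words(question)
        return sorted(self.chunks, key=lambda c: len(q & words(c)), reverse=True)[:top_k]

    def summary_tool(self) -> list[str]:
        # Stand-in for a summary index: consider every chunk of the document.
        return list(self.chunks)

    def answer(self, question: str) -> list[str]:
        # A keyword rule replaces the LLM's function-calling decision.
        if "summar" in question.lower():
            return self.summary_tool()
        return self.vector_tool(question)


class TopLevelAgent:
    """Delegates a question to every document agent whose title it mentions."""

    def __init__(self, doc_agents: dict[str, DocumentAgent]):
        self.doc_agents = doc_agents

    def answer(self, question: str) -> dict[str, list[str]]:
        q = words(question)
        return {t: a.answer(question) for t, a in self.doc_agents.items() if t.lower() in q}


toy_agents = {
    "Boston": DocumentAgent("Boston", ["Boston has the Museum of Fine Arts.", "Boston Ballet is a dance company."]),
    "Tokyo": DocumentAgent("Tokyo", ["Tokyo is the capital of Japan.", "Tokyo has a large rail network."]),
}
top = TopLevelAgent(toy_agents)
print(top.answer("What arts museums are in Boston?"))
print(top.answer("Summarize Tokyo"))
```

A question that names two cities, such as "Compare Boston and Tokyo", is sent to both document agents. That is the cross-document case the top-level agent exists for.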
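For multi-document QA, this shows real advantages over a single-agent RAG baseline. The specialized document agents are coordinated by a top-level agent. Each one answers from a specific document, so responses are more focused and relevant.
Let's look at how LlamaIndex implements this.
We will download Wikipedia articles about different cities and store each article separately. We use only 18 cities. That is not a large corpus, but it is enough to demonstrate advanced document retrieval.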
```python
from llama_index import (
    VectorStoreIndex,
    SummaryIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.schema import IndexNode
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.llms import OpenAI
```
Here is the list of cities:
```python
wiki_titles = [
    "Toronto",
    "Seattle",
    "Chicago",
    "Boston",
    "Houston",
    "Tokyo",
    "Berlin",
    "Lisbon",
    "Paris",
    "London",
    "Atlanta",
    "Munich",
    "Shanghai",
    "Beijing",
    "Copenhagen",
    "Moscow",
    "Cairo",
    "Karachi",
]
```
The following code downloads the article for each city:
```python
from pathlib import Path

import requests

data_path = Path("data")
data_path.mkdir(exist_ok=True)  # create ./data once, before the loop

for title in wiki_titles:
    # Ask the MediaWiki API for the plain-text extract of the article
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

    # Save each article to its own file, e.g. data/Toronto.txt
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)
```
Load the downloaded documents:
```python
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
```
Define the LLM and the service context:
```python
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)
```
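Next we define a "document agent" for each document. For each document we build a vector index (for semantic search) and a summary index (for summarization). These two query engines are then wrapped as tools and passed to an OpenAI function-calling agent.
The document agent can dynamically choose whether to perform semantic search or summarization within its document. We create a separate document agent for each city.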
```python
from llama_index.agent import OpenAIAgent
from llama_index import load_index_from_storage, StorageContext
from llama_index.node_parser import SimpleNodeParser
import os

node_parser = SimpleNodeParser.from_defaults()

# Build agents dictionary
agents = {}
query_engines = {}

# this is for the baseline
all_nodes = []

for idx, wiki_title in enumerate(wiki_titles):
    nodes = node_parser.get_nodes_from_documents(city_docs[wiki_title])
    all_nodes.extend(nodes)

    if not os.path.exists(f"./data/{wiki_title}"):
        # build vector index and persist it for later runs
        vector_index = VectorStoreIndex(nodes, service_context=service_context)
        vector_index.storage_context.persist(
            persist_dir=f"./data/{wiki_title}"
        )
    else:
        # reuse the persisted vector index
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/{wiki_title}"),
            service_context=service_context,
        )

    # build summary index
    summary_index = SummaryIndex(nodes, service_context=service_context)
    # define query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine()

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {wiki_title} (e.g. the history, arts and culture,"
                    " sports, demographics, or more)."
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=(
                    "Useful for any requests that require a holistic summary"
                    f" of EVERYTHING about {wiki_title}. For questions about"
                    " more specific sections, please use the vector_tool."
                ),
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about {wiki_title}.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    agents[wiki_title] = agent
    query_engines[wiki_title] = vector_index.as_query_engine(
        similarity_top_k=2
    )
```
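Next comes the top-level agent, which orchestrates across the document agents to answer any user query.
The top-level agent treats every document agent as a tool and retrieves the relevant ones for each query. Here we use a top-k retriever, but ideally the retrieval should be customized to your needs.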
```python
# define tool for each document agent
all_tools = []
for wiki_title in wiki_titles:
    wiki_summary = (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        f" this tool if you want to answer any questions about {wiki_title}.\n"
    )
    doc_tool = QueryEngineTool(
        query_engine=agents[wiki_title],
        metadata=ToolMetadata(
            name=f"tool_{wiki_title}",
            description=wiki_summary,
        ),
    )
    all_tools.append(doc_tool)

# define an "object" index and retriever over these tools
from llama_index import VectorStoreIndex
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)

from llama_index.agent import FnRetrieverOpenAIAgent

# the top-level agent retrieves the 3 most similar tools for each query
top_agent = FnRetrieverOpenAIAgent.from_retriever(
    obj_index.as_retriever(similarity_top_k=3),
    system_prompt=""" \
You are an agent designed to answer queries about a set of given cities.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
    verbose=True,
)
```
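Conceptually, `obj_index.as_retriever(similarity_top_k=3)` embeds each tool's metadata and returns the three tools closest to the query. The top-level agent therefore sees only a handful of candidate tools per question instead of all 18. The sketch below mimics that selection step with term-frequency vectors and cosine similarity. The `embed`, `cosine`, and `retrieve_tools` helpers are hypothetical stand-ins for a real embedding model and vector store. They are not what LlamaIndex runs internally.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, descriptions: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the names of the top_k tools whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(descriptions, key=lambda name: cosine(q, embed(descriptions[name])), reverse=True)
    return ranked[:top_k]


cities = ["Toronto", "Boston", "Berlin", "Tokyo"]
descriptions = {
    f"tool_{c}": (
        f"This content contains Wikipedia articles about {c}. "
        f"Use this tool if you want to answer any questions about {c}."
    )
    for c in cities
}
print(retrieve_tools("Tell me about the arts and culture in Boston", descriptions, top_k=1))
# → ['tool_Boston']
```

The tool descriptions differ only in the city name, so the city token in the query decides the ranking. This is why descriptive, distinct tool metadata matters for tool retrieval.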
For comparison, we define a "naive" RAG pipeline that dumps all documents into a single vector index and retrieves with top_k = 4.
```python
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)
```
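The structural difference from the multi-agent setup can be sketched with toy data. In the pooled baseline, chunks from every city compete in one top-k ranking. A document agent, by contrast, searches only the document it was routed to. The chunks, the word-overlap scorer, and the helper names `pooled_top_k` and `scoped_top_k` are hypothetical. The sketch uses `k=2` instead of 4 only because the toy corpus has four chunks.

```python
import re


def overlap(query: str, chunk: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    def tokens(s: str) -> set[str]:
        return set(re.findall(r"[a-z]+", s.lower()))
    return len(tokens(query) & tokens(chunk))


# Hypothetical chunks from two city documents.
docs = {
    "Boston": ["Boston arts scene includes the Boston Ballet.", "Boston weather is cold in winter."],
    "Berlin": ["Berlin arts and culture include many museums.", "Berlin is the capital of Germany."],
}


def pooled_top_k(query: str, k: int) -> list[tuple[str, str]]:
    # Baseline: every chunk from every document competes in a single ranking.
    pool = [(title, chunk) for title, chunks in docs.items() for chunk in chunks]
    return sorted(pool, key=lambda tc: overlap(query, tc[1]), reverse=True)[:k]


def scoped_top_k(query: str, title: str, k: int) -> list[tuple[str, str]]:
    # Document-agent style: rank only the chunks of the routed document.
    scoped = [(title, chunk) for chunk in docs[title]]
    return sorted(scoped, key=lambda tc: overlap(query, tc[1]), reverse=True)[:k]


query = "Tell me about the arts and culture in Boston"
print([title for title, _ in pooled_top_k(query, k=2)])            # → ['Boston', 'Berlin']
print([title for title, _ in scoped_top_k(query, "Boston", k=2)])  # → ['Boston', 'Boston']
```

Here a Berlin chunk takes one of the two pooled slots because it shares "arts and culture" with the question; in the scoped search it cannot. Real embeddings score differently from word overlap. The sketch shows only the mechanism.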
Let's run some example queries, ranging from QA/summarization over a single document to QA/summarization across multiple documents.
```python
response = top_agent.query("Tell me about the arts and culture in Boston")
```
The output is as follows:
```text
=== Calling Function ===
Calling function: tool_Boston with args: {
  "input": "arts and culture"
}
=== Calling Function ===
Calling function: vector_tool with args: {
  "input": "arts and culture"
}
Got output: Boston is known for its vibrant arts and culture scene. The city is home to a number of performing arts organizations, including the Boston Ballet, Boston Lyric Opera Company, Opera Boston, Boston Baroque, and the Handel and Haydn Society. There are also several theaters in or near the Theater District, such as the Cutler Majestic Theatre, Citi Performing Arts Center, the Colonial Theater, and the Orpheum Theatre. Boston is a center for contemporary classical music, with groups like the Boston Modern Orchestra Project and Boston Musica Viva. The city also hosts major annual events, such as First Night, the Boston Early Music Festival, and the Boston Arts Festival. In addition, Boston has several art museums and galleries, including the Museum of Fine Arts, the Isabella Stewart Gardner Museum, and the Institute of Contemporary Art.
========================
Got output: Boston is renowned for its vibrant arts and culture scene. It is home to numerous performing arts organizations, including the Boston Ballet, Boston Lyric Opera Company, Opera Boston, Boston Baroque, and the Handel and Haydn Society. The city's Theater District houses several theaters, such as the Cutler Majestic Theatre, Citi Performing Arts Center, the Colonial Theater, and the Orpheum Theatre.
Boston is also a hub for contemporary classical music, with groups like the Boston Modern Orchestra Project and Boston Musica Viva. The city hosts major annual events, such as First Night, the Boston Early Music Festival, and the Boston Arts Festival, which contribute to its cultural richness.
In terms of visual arts, Boston boasts several art museums and galleries. The Museum of Fine Arts, the Isabella Stewart Gardner Museum, and the Institute of Contemporary Art are among the most notable. These institutions offer a wide range of art collections, from ancient to contemporary, attracting art enthusiasts from around the world.
========================
```
Now let's look at the result from the naive RAG pipeline defined above:
```python
# baseline
response = base_query_engine.query(
    "Tell me about the arts and culture in Boston"
)
print(str(response))
```
```text
Boston has a rich arts and culture scene. The city is home to a variety of performing arts organizations, such as the Boston Ballet, Boston Lyric Opera Company, Opera Boston, Boston Baroque, and the Handel and Haydn Society. Additionally, there are numerous contemporary classical music groups associated with the city's conservatories and universities, like the Boston Modern Orchestra Project and Boston Musica Viva. The Theater District in Boston is a hub for theater, with notable venues including the Cutler Majestic Theatre, Citi Performing Arts Center, the Colonial Theater, and the Orpheum Theatre. Boston also hosts several significant annual events, including First Night, the Boston Early Music Festival, the Boston Arts Festival, and the Boston gay pride parade and festival. The city is renowned for its historic sites connected to the American Revolution, as well as its art museums and galleries, such as the Museum of Fine Arts, Isabella Stewart Gardner Museum, and the Institute of Contemporary Art.
```
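As the outputs show, the multi-agent system produces a better-organized answer. The top-level agent routes the question to the Boston document agent, which calls its vector_tool. The final response groups the material into performing arts, contemporary classical music, annual events, and visual arts.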
Summary
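RAG systems need to evolve toward multi-agent architectures to achieve enterprise-grade performance. As this example illustrates, dividing responsibilities can bring gains in relevance, speed, summary quality, and prompt optimization. By decomposing RAG into specialized, collaborating roles, multi-agent systems can overcome the limitations of a single agent and enable scalable, high-performance RAG.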