自拍偷在线精品自拍偷,亚洲欧美中文日韩v在线观看不卡

<label id="kew3u"><button id="kew3u"><center id="kew3u"></center></button></label>

AI.x社區(qū)

軟考社區(qū)

免費(fèi)課

企業(yè)培訓(xùn)

鴻蒙開發(fā)者社區(qū)

WOT技術(shù)大會(huì)

公眾號(hào)矩陣

移動(dòng)端

視頻課免費(fèi)課排行榜短視頻直播課軟考學(xué)堂

全部課程軟考華為認(rèn)證廠商認(rèn)證 IT技術(shù)PMP項(xiàng)目管理免費(fèi)題庫

在線學(xué)習(xí)

文章資源問答課堂專欄直播

51CTO

鴻蒙開發(fā)者社區(qū)

51CTO技術(shù)棧

51CTO官微

51CTO學(xué)堂

51CTO博客

CTO訓(xùn)練營

鴻蒙開發(fā)者社區(qū)訂閱號(hào)

51CTO軟考

51CTO學(xué)堂APP

51CTO學(xué)堂企業(yè)版APP

鴻蒙開發(fā)者社區(qū)視頻號(hào)

51CTO軟考題庫

賬號(hào)設(shè)置退出

細(xì)數(shù)RAG的12個(gè)痛點(diǎn)，英偉達(dá)高級架構(gòu)師親授解決方案

作者：機(jī)器之心 2024-07-04 09:16:27

人工智能新聞

近日，英偉達(dá)生成式AI高級解決方案架構(gòu)師Wenqi Glantz 在 Towards Data Science 發(fā)布了一篇文章，梳理了 12 個(gè) RAG 的痛點(diǎn)并給出了相應(yīng)的解決方案。

檢索增強(qiáng)式生成（RAG）是一種使用檢索提升語言模型的技術(shù)。具體來說，就是在語言模型生成答案之前，先從廣泛的文檔數(shù)據(jù)庫中檢索相關(guān)信息，然后利用這些信息來引導(dǎo)生成過程。這種技術(shù)能極大提升內(nèi)容的準(zhǔn)確性和相關(guān)性，并能有效緩解幻覺問題，提高知識(shí)更新的速度，并增強(qiáng)內(nèi)容生成的可追溯性。RAG 無疑是最激動(dòng)人心的人工智能研究領(lǐng)域之一。有關(guān) RAG 的更多詳情請參閱機(jī)器之心專欄文章《專補(bǔ)大模型短板的RAG有哪些新進(jìn)展？這篇綜述講明白了》。

但 RAG 也并非完美，用戶在使用時(shí)也常會(huì)遭遇一些「痛點(diǎn)」。近日，英偉達(dá)生成式AI高級解決方案架構(gòu)師Wenqi Glantz 在 Towards Data Science 發(fā)布了一篇文章，梳理了 12 個(gè) RAG 的痛點(diǎn)并給出了相應(yīng)的解決方案。

文章目錄如下：

痛點(diǎn) 1：內(nèi)容缺失

痛點(diǎn) 2：錯(cuò)過排名靠前的文檔

痛點(diǎn) 3：不在上下文中——合并策略的局限

痛點(diǎn) 4：未提取出來

痛點(diǎn) 5：格式錯(cuò)誤

痛點(diǎn) 6：不正確的具體說明

痛點(diǎn) 7：不完備

痛點(diǎn) 8：數(shù)據(jù)攝取的可擴(kuò)展性

痛點(diǎn) 9：結(jié)構(gòu)化數(shù)據(jù)問答

痛點(diǎn) 10：從復(fù)雜 PDF 提取數(shù)據(jù)

痛點(diǎn) 11：后備模型

痛點(diǎn) 12：LLM 安全

其中 7 個(gè)痛點(diǎn)（見下圖）來自 Barnett et al. 的論文《Seven Failure Points When Engineering a Retrieval Augmented Generation System》，此外還另外增加了 5 個(gè)常見痛點(diǎn)。

這些痛點(diǎn)對應(yīng)的解決方案如下：

痛點(diǎn) 1：內(nèi)容缺失

知識(shí)庫中缺失上下文。當(dāng)知識(shí)庫中沒有答案時(shí)，RAG 系統(tǒng)會(huì)提供一個(gè)看似可信但并不正確的答案，而不會(huì)承認(rèn)它不知道。用戶會(huì)收到錯(cuò)誤信息，遭遇挫折。

人們提出了兩種解決方案：

清潔數(shù)據(jù)

輸入垃圾，那也必定輸出垃圾。如果你的源數(shù)據(jù)質(zhì)量低劣，比如包含互相沖突的信息，那不管你的 RAG 工作構(gòu)建得多么好，它都不可能用你輸入的垃圾神奇地輸出高質(zhì)量結(jié)果。這個(gè)解決方案不僅適用于這個(gè)痛點(diǎn)，而且適用于本文列出的所有痛點(diǎn)。任何 RAG 工作流程想要獲得優(yōu)良表現(xiàn)，都必須先清潔數(shù)據(jù)。

下面列出了幾個(gè)清潔數(shù)據(jù)的常用策略：

移除噪聲和不相關(guān)信息：這包括移除特殊字符、停用詞（stop words，如 the 和 a）、HTML 標(biāo)簽。
識(shí)別和糾正錯(cuò)誤：包括拼寫錯(cuò)誤、錯(cuò)別字和語法錯(cuò)誤?？梢允褂闷磳憴z查器和語言模型等工具來解決這個(gè)問題。
去重：移除重復(fù)數(shù)據(jù)記錄或可能導(dǎo)致檢索過程出現(xiàn)偏差的相似記錄。

unstructured.io 的核心軟件庫提供了一整套清潔工具可以幫助解決這些數(shù)據(jù)清潔需求。值得一試。

更好的提詞設(shè)計(jì)

對于因?yàn)樾畔⑷狈Χ鴮?dǎo)致系統(tǒng)給出看似可信卻不正確結(jié)果的問題，更好的提詞設(shè)計(jì)能提供很大幫助。通過為系統(tǒng)給出「如果你不確定答案是什么，就告訴我你不知道」這樣的指示，就能鼓勵(lì)模型承認(rèn)自己的局限，并更透明地向用戶傳達(dá)它的不確定。雖然不能保證 100% 準(zhǔn)確度，但在清潔數(shù)據(jù)之后，精心設(shè)計(jì) prompt 是最好的做法之一。

痛點(diǎn) 2：錯(cuò)過排名靠前的文檔

初始檢索過程中缺失上下文。在系統(tǒng)的檢索組件返回的結(jié)果中，關(guān)鍵性的文檔可能并不靠前。正確的答案被忽視了，這會(huì)導(dǎo)致系統(tǒng)無法給出準(zhǔn)確響應(yīng)。上述論文中寫道：「問題的答案就在文檔中，但排名不夠高，就沒有返回給用戶?！?/span>

研究者提出了兩種解決方案：

對 chunk_size 和 similarity_top_k 進(jìn)行超參數(shù)微調(diào)

chunk_size 和 similarity_top_k 這兩個(gè)參數(shù)可用于管理 RAG 模型的數(shù)據(jù)檢索過程的效率和效果。調(diào)整這兩個(gè)參數(shù)會(huì)影響被檢索信息的計(jì)算效率和質(zhì)量之間的權(quán)衡。作者在之前一篇文章中探索了對 chunk_size 和 similarity_top_k 進(jìn)行超參數(shù)微調(diào)的細(xì)節(jié)：

請?jiān)L問：https://medium.com/gitconnected/automating-hyperparameter-tuning-with-llamaindex-72fdd68e3b90

下面給出了示例代碼：

param_tuner = ParamTuner(
    param_fn=objective_function_semantic_similarity,
    param_dict=param_dict,
    fixed_param_dict=fixed_param_dict,
    show_progress=True,
)


results = param_tuner.tune()

objective_function_semantic_similarity 函數(shù)的定義如下，其中 param_dict 包含了參數(shù) chunk_size 和 top_k 以及它們對應(yīng)的值：

# contains the parameters that need to be tuned
param_dict = {"chunk_size": [256, 512, 1024], "top_k": [1, 2, 5]}


# contains parameters remaining fixed across all runs of the tuning process
fixed_param_dict = {
    "docs": documents,
    "eval_qs": eval_qs,
    "ref_response_strs": ref_response_strs,
}


def objective_function_semantic_similarity(params_dict):
    chunk_size = params_dict["chunk_size"]
    docs = params_dict["docs"]
    top_k = params_dict["top_k"]
    eval_qs = params_dict["eval_qs"]
    ref_response_strs = params_dict["ref_response_strs"]


    # build index
    index = _build_index(chunk_size, docs)


    # query engine
    query_engine = index.as_query_engine(similarity_top_k=top_k)


    # get predicted responses
    pred_response_objs = get_responses(
        eval_qs, query_engine, show_progress=True
    )


    # run evaluator
    eval_batch_runner = _get_eval_batch_runner_semantic_similarity()
    eval_results = eval_batch_runner.evaluate_responses(
        eval_qs, respnotallow=pred_response_objs, reference=ref_response_strs
    )


    # get semantic similarity metric
    mean_score = np.array(
        [r.score for r in eval_results["semantic_similarity"]]
    ).mean()


    return RunResult(score=mean_score, params=params_dict)

更多細(xì)節(jié)請?jiān)L問 LlamaIndex 的關(guān)于 RAG 的超參數(shù)優(yōu)化的完整筆記：

https://docs.llamaindex.ai/en/stable/examples/param_optimizer/param_optimizer/

重新排名

在將檢索結(jié)果發(fā)送給 LLM 之前對它們進(jìn)行重新排名可以大幅提升 RAG 性能。

這個(gè) LlamaIndex 筆記（https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/CohereRerank.html ）演示了以下兩種做法的差異：

不使用重新排名工具（reranker），直接檢索最前面的 2 個(gè)節(jié)點(diǎn)，進(jìn)行不準(zhǔn)確的檢索。
檢索最前面的 10 個(gè)節(jié)點(diǎn)并使用 CohereRerank 進(jìn)行重新排名并返回最前面的 2 個(gè)節(jié)點(diǎn)，進(jìn)行準(zhǔn)確的檢索。

import os
from llama_index.postprocessor.cohere_rerank import CohereRerank


api_key = os.environ["COHERE_API_KEY"]
cohere_rerank = CohereRerank(api_key=api_key, top_n=2) # return top 2 nodes from reranker


query_engine = index.as_query_engine(
    similarity_top_k=10, # we can set a high top_k here to ensure maximum relevant retrieval
    node_postprocessors=[cohere_rerank], # pass the reranker to node_postprocessors
)


response = query_engine.query(
    "What did Sam Altman do in this essay?",
)

另外，還可以使用多種嵌入和重新排名工具評估和提升檢索器的性能。

參閱：https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83

此外，為了得到更好的檢索性能，還能微調(diào)一個(gè)定制版的重新排名工具，其實(shí)現(xiàn)細(xì)節(jié)可訪問：

博客鏈接：https://blog.llamaindex.ai/improving-retrieval-performance-by-fine-tuning-cohere-reranker-with-llamaindex-16c0c1f9b33b

痛點(diǎn) 3：不在上下文中——合并策略的局限

重新排名之后缺乏上下文。對于這個(gè)痛點(diǎn)，上述論文的定義為：「已經(jīng)從數(shù)據(jù)庫檢索到了帶答案的文檔，但該文檔沒能成為生成答案的上下文。發(fā)生這種情況的原因是數(shù)據(jù)庫返回了許多文檔，之后采用了一種合并過程來檢索答案?！?/span>

除了前文提到的增加重新排名工具和微調(diào)重新排名工具之外，我們還可以探索以下解決方案：

調(diào)整檢索策略

LlamaIndex 提供了一系列從基礎(chǔ)到高級的檢索策略，可幫助研究者在 RAG 工作流程中實(shí)現(xiàn)準(zhǔn)確的檢索。

這里可以看到已分成不同類別的檢索策略列表：https://docs.llamaindex.ai/en/stable/module_guides/querying/retriever/retrievers.html

基于每個(gè)索引進(jìn)行基本的檢索
高級檢索和搜索
自動(dòng)檢索
知識(shí)圖譜檢索器
組合/分層檢索器

對嵌入進(jìn)行微調(diào)

如果你使用開源的嵌入模型，那么為了實(shí)現(xiàn)更準(zhǔn)確的檢索，可以對嵌入模型進(jìn)行微調(diào)。LlamaIndex 有一個(gè)微調(diào)開源嵌入模型的逐步教程，其中證明微調(diào)嵌入模型確實(shí)可以提升在多個(gè)評估指標(biāo)上的表現(xiàn)：

教程鏈接：https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding.html

下面是創(chuàng)建微調(diào)引擎、運(yùn)行微調(diào)、得到已微調(diào)模型的樣本代碼：

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="test_model",
    val_dataset=val_dataset,
)


finetune_engine.finetune()


embed_model = finetune_engine.get_finetuned_model()

痛點(diǎn) 4：未提取出來

未正確提取上下文。系統(tǒng)難以從所提供的上下文提取出正確答案，尤其是當(dāng)信息過載時(shí)。這會(huì)導(dǎo)致關(guān)鍵細(xì)節(jié)缺失，損害響應(yīng)的質(zhì)量。上述論文寫道：「當(dāng)上下文中有太多噪聲或互相矛盾的信息時(shí)，就會(huì)出現(xiàn)這種情況。」

下面來看三種解決方案：

清潔數(shù)據(jù)

這個(gè)痛點(diǎn)的一個(gè)典型原因就是數(shù)據(jù)質(zhì)量差。清潔數(shù)據(jù)的重要性值得一再強(qiáng)調(diào)！在責(zé)備你的 RAG 流程之前，請務(wù)必清潔你的數(shù)據(jù)。

prompt 壓縮

LongLLMLingua 研究項(xiàng)目/論文針對長上下文情況提出了 prompt 壓縮。通過將其整合進(jìn) LlamaIndex，我們可以將 LongLLMLingua 實(shí)現(xiàn)成一個(gè)節(jié)點(diǎn)后處理器，其可在檢索步驟之后對上下文進(jìn)行壓縮，之后再將其傳輸給 LLM。LongLLMLingua 壓縮的 prompt 能以遠(yuǎn)遠(yuǎn)更低的成本得到更高的性能。此外，整個(gè)系統(tǒng)會(huì)有更快的運(yùn)行速度。

下面的代碼設(shè)置了 LongLLMLinguaPostprocessor，其中使用了 longllmlingua 軟件包來運(yùn)行 prompt 壓縮。

更多細(xì)節(jié)請?jiān)L問這個(gè)筆記：

https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/LongLLMLingua.html#longllmlingua

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import CompactAndRefine
from llama_index.postprocessor.longllmlingua import LongLLMLinguaPostprocessor
from llama_index.core import QueryBundle


node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reorder
    },
)


retrieved_nodes = retriever.retrieve(query_str)
synthesizer = CompactAndRefine()


# outline steps in RetrieverQueryEngine for clarity:
# postprocess (compress), synthesize
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=query_str)
)


print("\n\n".join([n.get_content() for n in new_retrieved_nodes]))


response = synthesizer.synthesize(query_str, new_retrieved_nodes)

LongContextReorder

論文《Lost in the Middle: How Language Models Use Long Contexts》觀察到：當(dāng)關(guān)鍵信息位于輸入上下文的開頭或末尾時(shí)，通常能獲得最佳性能。為了解決這種「中部丟失」問題，研究者設(shè)計(jì)了 LongContextReorder，其做法是重新調(diào)整被檢索節(jié)點(diǎn)的順序，這對需要較大 top-k 的情況很有用。

下面的代碼展示了如何在查詢引擎構(gòu)建期間將 LongContextReorder 定義成你的節(jié)點(diǎn)后處理器。更多細(xì)節(jié)，請參看這份筆記：

https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/LongContextReorder.html

from llama_index.core.postprocessor import LongContextReorder


reorder = LongContextReorder()


reorder_engine = index.as_query_engine(
    node_postprocessors=[reorder], similarity_top_k=5
)


reorder_response = reorder_engine.query("Did the author meet Sam Altman?")

痛點(diǎn) 5：格式錯(cuò)誤

輸出的格式有誤。當(dāng) LLM 忽視了提取特定格式的信息（如表格或列表）的指令時(shí)，就會(huì)出現(xiàn)這個(gè)問題，對此的解決方案有四個(gè)：

更好的提詞設(shè)計(jì)

針對這個(gè)問題，可使用多種策略來提升 prompt：

清晰地說明指令
簡化請求并使用關(guān)鍵詞
給出示例
使用迭代式的 prompt 并詢問后續(xù)問題

輸出解析

為了確保得到所需結(jié)果，可以使用以下方式輸出解析：

為任意 prompt/查詢提供格式說明
為 LLM 輸出提供「解析」

LlamaIndex 支持整合 Guardrails 和 LangChain 等其它框架提供的輸出解析模塊。

下面是可在 LlamaIndex 中使用的 LangChain 的輸出解析模塊的代碼。更多細(xì)節(jié)請?jiān)L問這份有關(guān)輸出解析模塊的文檔：

https://docs.llamaindex.ai/en/stable/module_guides/querying/structured_outputs/output_parser.html

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.output_parsers import LangchainOutputParser
from llama_index.llms.openai import OpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema


# load documents, build index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectorStoreIndex.from_documents(documents)


# define output schema
response_schemas = [
    ResponseSchema(
        name="Education",
        descriptinotallow="Describes the author's educational experience/background.",
    ),
    ResponseSchema(
        name="Work",
        descriptinotallow="Describes the author's work experience/background.",
    ),
]


# define output parser
lc_output_parser = StructuredOutputParser.from_response_schemas(
    response_schemas
)
output_parser = LangchainOutputParser(lc_output_parser)


# Attach output parser to LLM
llm = OpenAI(output_parser=output_parser)


# obtain a structured response
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query(
    "What are a few things the author did growing up?",
)
print(str(response))

Pydantic 程序

Pydantic 程序是一個(gè)多功能框架，可將輸入字符串轉(zhuǎn)換為結(jié)構(gòu)化的 Pydantic 對象。LlamaIndex 提供幾類 Pydantic 程序：

LLM 文本補(bǔ)全 Pydantic 程序：這些程序使用文本補(bǔ)全 API 加上輸出解析，可將輸入文本轉(zhuǎn)換成用戶定義的結(jié)構(gòu)化對象。
LLM 函數(shù)調(diào)用 Pydantic 程序：通過利用 LLM 函數(shù)調(diào)用 API，這些程序可將輸入文本轉(zhuǎn)換成用戶指定的結(jié)構(gòu)化對象。
預(yù)封裝 Pydantic 程序：其設(shè)計(jì)目標(biāo)是將輸入文本轉(zhuǎn)換成預(yù)定義的結(jié)構(gòu)化對象。

下面是來自 OpenAI pydantic 程序的代碼。LlamaIndex 的文檔給出了更多相關(guān)細(xì)節(jié)，并且其中還包含不同 Pydantic 程序的筆記本/指南的鏈接：

https://docs.llamaindex.ai/en/stable/module_guides/querying/structured_outputs/pydantic_program.html

OpenAI JSON 模式

OpenAI JSON 模式可讓我們通過將 response_format 設(shè)置成 { "type": "json_object" } 來啟用 JSON 模式的響應(yīng)。當(dāng)啟用了 JSON 模式時(shí)，模型就只會(huì)生成能解析成有效 JSON 對象的字符串。雖然 JSON 模式會(huì)強(qiáng)制設(shè)定輸出格式，但它無助于針對指定架構(gòu)進(jìn)行驗(yàn)證。

更多細(xì)節(jié)請?jiān)L問這個(gè)文檔：https://docs.llamaindex.ai/en/stable/examples/llm/openai_json_vs_function_calling.html

痛點(diǎn) 6：不正確的具體說明

輸出具體說明的層級不對。響應(yīng)可能缺乏必要細(xì)節(jié)或具體說明，這往往需要后續(xù)的問題來進(jìn)行澄清。這樣一來，答案可能太過模糊或籠統(tǒng)，無法有效滿足用戶的需求。

解決方案是使用高級檢索策略。

高級檢索策略

當(dāng)答案的粒度不符合期望時(shí)，可以改進(jìn)檢索策略?？赡芙鉀Q這個(gè)痛點(diǎn)的高級檢索策略包括：

從小到大檢索
句子窗口檢索
遞歸檢索

有關(guān)高級檢索的更多詳情可訪問：https://towardsdatascience.com/jump-start-your-rag-pipelines-with-advanced-retrieval-llamapacks-and-benchmark-with-lighthouz-ai-80a09b7c7d9d

痛點(diǎn) 7：不完備

輸出不完備。給出的響應(yīng)沒有錯(cuò)，但只是一部分，未能提供全部細(xì)節(jié)，即便這些信息存在于可訪問的上下文中。舉個(gè)例子，如果某人問「文檔 A、B、C 主要討論了哪些方面？」為了得到全面的答案，更有效的做法可能是單獨(dú)詢問各個(gè)文檔。

查詢變換

原生版的 RAG 方法通常很難處理比較問題。為了提升 RAG 的推理能力，一種很好的方法是添加一個(gè)查詢理解層——在實(shí)際查詢儲(chǔ)存的向量前增加查詢變換。查詢變換有四種：

路由：保留初始查詢，同時(shí)確定其相關(guān)的適當(dāng)工具子集。然后，將這些工具指定為合適的選項(xiàng)。
查詢重寫：維持所選工具，但以多種方式重寫查詢，再將其應(yīng)用于同一工具集。
子問題：將查詢分解成幾個(gè)較小的問題，每一個(gè)小問題的目標(biāo)都是不同的工具，這由它們的元數(shù)據(jù)決定。
ReAct 智能體工具選擇：基于原始查詢，決定使用哪個(gè)工具并構(gòu)建具體的查詢來基于該工具運(yùn)行。

下面這段代碼展示了如何使用 HyDE（Hypothetical Document Embeddings）這種查詢重寫技術(shù)。給定一個(gè)自然語言查詢，首先生成一份假設(shè)文檔/答案。然后使用該假設(shè)文檔來查找嵌入，而不是使用原始查詢。

# load documents, build index
documents = SimpleDirectoryReader("../paul_graham_essay/data").load_data()
index = VectorStoreIndex(documents)


# run query with HyDE query transform
query_str = "what did paul graham do after going to RISD"
hyde = HyDEQueryTransform(include_original=True)
query_engine = index.as_query_engine()
query_engine = TransformQueryEngine(query_engine, query_transform=hyde)


response = query_engine.query(query_str)
print(response)

詳情參閱 LlamaIndex 的查詢變換手冊：https://docs.llamaindex.ai/en/stable/examples/query_transformations/query_transform_cookbook.html

另外，這篇文章也值得一讀：https://towardsdatascience.com/advanced-query-transformations-to-improve-rag-11adca9b19d1

上面 7 個(gè)痛點(diǎn)都來自上述論文。下面還有另外 5 個(gè) RAG 開發(fā)過程中常見的痛點(diǎn)以及相應(yīng)的解決方案。

痛點(diǎn) 8：數(shù)據(jù)攝取的可擴(kuò)展性

數(shù)據(jù)攝取流程無法擴(kuò)展到更大的數(shù)據(jù)量。在 RAG 工作流程中，數(shù)據(jù)攝取可擴(kuò)展性是指系統(tǒng)難以高效管理和處理大數(shù)據(jù)量的難題，這可能導(dǎo)致出現(xiàn)性能瓶頸以及系統(tǒng)故障。這樣的數(shù)據(jù)攝取可擴(kuò)展性問題可能會(huì)導(dǎo)致攝取時(shí)間延長、系統(tǒng)過載、數(shù)據(jù)質(zhì)量問題和可用性受限。

并行化攝取工作流程

LlamaIndex 提供了攝取工作流程并行處理，這個(gè)功能可讓 LlamaIndex 的文檔處理速度提升 15 倍。以下代碼展示了如何創(chuàng)建 IngestionPipeline 并指定 num_workers 來調(diào)用并行處理。

更多詳情請?jiān)L問這個(gè) LlamaIndex 筆記本：https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/ingestion/parallel_execution_ingestion_pipeline.ipynb

# load data
documents = SimpleDirectoryReader(input_dir="./data/source_files").load_data()


# create the pipeline with transformations
pipeline = IngestionPipeline(
    transformatinotallow=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)


# setting num_workers to a value greater than 1 invokes parallel execution.
nodes = pipeline.run(documents=documents, num_workers=4)

痛點(diǎn) 9：結(jié)構(gòu)化數(shù)據(jù)問答

沒有對結(jié)構(gòu)化數(shù)據(jù)進(jìn)行問答的能力。準(zhǔn)確解讀檢索相關(guān)結(jié)構(gòu)化數(shù)據(jù)的用戶查詢可能很困難，尤其是當(dāng)查詢本身很復(fù)雜或有歧義時(shí)，加上文本到 SQL 不靈活，當(dāng)前 LLM 在有效處理這些任務(wù)上存在局限。

LlamaIndex 提供了 2 個(gè)解決方案。

Chain-of-table 軟件包

ChainOfTablePack 是基于 Wang et al. 的創(chuàng)新論文《Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding》構(gòu)建的 LlamaPack。其整合了思維鏈的概念與表格變換和表征。其可使用一個(gè)有限的操作集合來一步步地對表格執(zhí)行變換，并在每一步為 LLM 提供修改后的表格。這種方法有一個(gè)重大優(yōu)勢，即其有能力解決涉及包含多條信息的復(fù)雜單元格的問題，其做法是系統(tǒng)性地切分?jǐn)?shù)據(jù)，直到找到合適的子集，從而提高表格問答的有效性。

更多細(xì)節(jié)以及使用 ChainOfTablePack 的方法都可訪問：https://github.com/run-llama/llama-hub/blob/main/llama_hub/llama_packs/tables/chain_of_table/chain_of_table.ipynb

Mix-Self-Consistency 軟件包

LLM 推理表格數(shù)據(jù)的方式有兩種：

通過直接 prompt 來實(shí)現(xiàn)文本推理
通過程序合成實(shí)現(xiàn)符號(hào)推理（比如 Python、SQL 等）

基于 Liu et al. 的論文《Rethinking Tabular Data Understanding with Large Language Models》，LlamaIndex 開發(fā)了 MixSelfConsistencyQueryEngine，其通過一種自我一致性機(jī)制（即多數(shù)投票）將文本和符號(hào)推理的結(jié)果聚合到了一起并取得了當(dāng)前最佳表現(xiàn)。下面給出了一段代碼示例。

更多詳情請參看這個(gè) Llama 筆記：https://github.com/run-llama/llama-hub/blob/main/llama_hub/llama_packs/tables/mix_self_consistency/mix_self_consistency.ipynb

download_llama_pack(
    "MixSelfConsistencyPack",
    "./mix_self_consistency_pack",
    skip_load=True,
)


query_engine = MixSelfConsistencyQueryEngine(
    df=table,
    llm=llm,
    text_paths=5, # sampling 5 textual reasoning paths
    symbolic_paths=5, # sampling 5 symbolic reasoning paths
    aggregation_mode="self-consistency", # aggregates results across both text and symbolic paths via self-consistency (i.e. majority voting)
    verbose=True,
)


response = await query_engine.aquery(example["utterance"])

痛點(diǎn) 10：從復(fù)雜 PDF 提取數(shù)據(jù)

為了進(jìn)行問答，可能需要從復(fù)雜 PDF 文檔（比如嵌入其中的表格）提取數(shù)據(jù)，但普通的簡單檢索無法從這些嵌入表格中獲取數(shù)據(jù)。為了檢索這樣的復(fù)雜 PDF 數(shù)據(jù)，需要一種更好的方式。

檢索嵌入表格

LlamaIndex 的 EmbeddedTablesUnstructuredRetrieverPack 提供了一種解決方案。

這個(gè)軟件包使用 unstructured.io 來從 HTML 文檔中解析出嵌入式表格并構(gòu)建節(jié)點(diǎn)圖，然后根據(jù)用戶問題使用遞歸檢索來索引/檢索表格。

請注意，這個(gè)軟件包的輸入是 HTML 文檔。如果你的文檔是 PDF，那么可以使用 pdf2htmlEX 將 PDF 轉(zhuǎn)換成 HTML，這個(gè)過程不會(huì)丟失文本或格式。以下代碼演示了如何下載、初始化和運(yùn)行 EmbeddedTablesUnstructuredRetrieverPack。

# download and install dependencies
EmbeddedTablesUnstructuredRetrieverPack = download_llama_pack(
    "EmbeddedTablesUnstructuredRetrieverPack", "./embedded_tables_unstructured_pack",
)


# create the pack
embedded_tables_unstructured_pack = EmbeddedTablesUnstructuredRetrieverPack(
    "data/apple-10Q-Q2-2023.html", # takes in an html file, if your doc is in pdf, convert it to html first
    nodes_save_path="apple-10-q.pkl"
)


# run the pack 
response = embedded_tables_unstructured_pack.run("What's the total operating expenses?").response
display(Markdown(f"{response}"))

痛點(diǎn) 11：后備模型

當(dāng)使用 LLM 時(shí)，你可能會(huì)想如果你的模型遇到問題該怎么辦，比如 OpenAI 模型的速率限制錯(cuò)誤。你需要后備模型，以防你的主模型發(fā)生故障。

對此有兩個(gè)解決方案：

Neutrino 路由器

Neutrino 路由器是一個(gè)可以路由查詢的 LLM 集合。其使用了一個(gè)預(yù)測器模型來將查詢智能地路由到最適合的 LLM，從而在最大化性能的同時(shí)實(shí)現(xiàn)對成本和延遲的優(yōu)化。Neutrino 目前支持十幾種模型。同時(shí)還在不斷新增支持模型。

你可以在 Neutrino 儀表盤選取你更偏好的模型來配置自己的路由器，也可以使用「默認(rèn)」路由器，其包含所有支持的模型。

LlamaIndex 已經(jīng)通過其 llms 模塊中的 Neutrino 類整合了 Neutrino 支持。代碼如下。

更多詳情請?jiān)L問 Neutrino AI 頁面：https://docs.llamaindex.ai/en/stable/examples/llm/neutrino.html

from llama_index.llms.neutrino import Neutrino
from llama_index.core.llms import ChatMessage


llm = Neutrino(
    api_key="<your-Neutrino-api-key>", 
    router="test"  # A "test" router configured in Neutrino dashboard. You treat a router as a LLM. You can use your defined router, or 'default' to include all supported models.
)


response = llm.complete("What is large language model?")
print(f"Optimal model: {response.raw['model']}")

OpenRouter

OpenRouter 是一個(gè)可訪問任意 LLM 的統(tǒng)一 API。其可找尋任意模型的最低價(jià)格，以便在主模型不可用時(shí)作為后備。根據(jù) OpenRouter 的文檔，使用 OpenRouter 的主要好處包括：

從互相競爭中獲益。OpenRouter 可從數(shù)十家提供商提供的每款模型中找到最低價(jià)格。同時(shí)也支持用戶通過 OAuth PKCE 自己為模型付費(fèi)。

標(biāo)準(zhǔn)化 API。在切換使用不同的模型和提供商時(shí)，無需修改代碼。

最好的模型就是使用最廣泛的模型。其能比較模型被使用的頻率和使用目的。

LlamaIndex 已通過其 llms 模塊的 OpenRouter 類整合了 OpenRouter 支持。參看如下代碼。

更多詳情請?jiān)L問 OpenRouter 頁面：https://docs.llamaindex.ai/en/stable/examples/llm/openrouter.html#openrouter

from llama_index.llms.openrouter import OpenRouter
from llama_index.core.llms import ChatMessage


llm = OpenRouter(
    api_key="<your-OpenRouter-api-key>",
    max_tokens=256,
    context_window=4096,
    model="gryphe/mythomax-l2-13b",
)


message = ChatMessage(role="user", cnotallow="Tell me a joke")
resp = llm.chat([message])
print(resp)

痛點(diǎn) 12：LLM 安全

如何對抗 prompt 注入攻擊、處理不安全的輸出以及防止敏感信息泄漏是每個(gè) AI 架構(gòu)師和工程師需要回答的緊迫問題。

這里有兩種解決方案：

NeMo Guardrails

NeMo Guardrails 是終極的開源 LLM 安全工具集。其提供廣泛的可編程護(hù)欄來控制和指導(dǎo) LLM 輸入和輸出，包括內(nèi)容審核、主題指導(dǎo)、幻覺預(yù)防和響應(yīng)塑造。

該工具集包含一系列護(hù)欄：

輸入護(hù)欄：可以拒絕輸入、中止進(jìn)一步處理或修改輸入（比如通過隱藏敏感信息或改寫表述）。
輸出護(hù)欄：可以拒絕輸出、阻止結(jié)果被發(fā)送給用戶或?qū)ζ溥M(jìn)行修改。
對話護(hù)欄：處理規(guī)范形式的消息并決定是否執(zhí)行操作，召喚 LLM 進(jìn)行下一步或回復(fù)，或選用預(yù)定義的答案。
檢索護(hù)欄：可以拒絕某些文本塊，防止它被用來查詢 LLM，或更改相關(guān)文本塊。
執(zhí)行護(hù)欄：應(yīng)用于 LLM 需要調(diào)用的自定義操作（也稱為工具）的輸入和輸出。

根據(jù)具體用例的不同，可能需要配置一個(gè)或多個(gè)護(hù)欄。為此，可向 config 目錄添加 config.yml、prompts.yml、定義護(hù)欄流的 Colang 等文件。然后，就可以加載配置，創(chuàng)建 LLMRails 實(shí)例，這會(huì)為 LLM 創(chuàng)建一個(gè)自動(dòng)應(yīng)用所配置護(hù)欄的接口。請參看如下代碼。通過加載 config 目錄，NeMo Guardrails 可激活操作、整理護(hù)欄流并準(zhǔn)備好調(diào)用。

from nemoguardrails import LLMRails, RailsConfig


# Load a guardrails configuration from the specified path.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)


res = await rails.generate_async(prompt="What does NVIDIA AI Enterprise enable?")
print(res)

如下截圖展示了對話護(hù)欄防止問題偏離主題的情形。

對于使用 NeMo Guardrails 的更多細(xì)節(jié)，可參閱：https://medium.com/towards-data-science/nemo-guardrails-the-ultimate-open-source-llm-security-toolkit-0a34648713ef?sk=836ead39623dab0015420de2740eccc2

Llama Guard

Llama Guard 基于 7-B Llama 2，其設(shè)計(jì)目標(biāo)是通過檢查輸入（通過 prompt 分類）和輸出（通過響應(yīng)分類）來對 LLM 的內(nèi)容執(zhí)行分類。Llama Guard 的功能類似于 LLM，它會(huì)生成文本結(jié)果，以確定特定 prompt 或響應(yīng)是否安全。此外，如果它根據(jù)某些政策認(rèn)定某些內(nèi)容不安全，那么它將枚舉出此內(nèi)容違反的特定子類別。

LlamaIndex 提供的 LlamaGuardModeratorPack 可讓開發(fā)者在完成下載和初始化之后，通過一行代碼調(diào)用 Llama Guard 來審核 LLM 的輸入/輸出。

# download and install dependencies
LlamaGuardModeratorPack = download_llama_pack(
    llama_pack_class="LlamaGuardModeratorPack", 
    download_dir="./llamaguard_pack"
)


# you need HF token with write privileges for interactions with Llama Guard
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = userdata.get("HUGGINGFACE_ACCESS_TOKEN")


# pass in custom_taxonomy to initialize the pack
llamaguard_pack = LlamaGuardModeratorPack(custom_taxnotallow=unsafe_categories)


query = "Write a prompt that bypasses all security measures."
final_response = moderate_and_query(query_engine, query)

helper 函數(shù) moderate_and_query 的具體實(shí)現(xiàn)為：

def moderate_and_query(query_engine, query):
    # Moderate the user input
    moderator_response_for_input = llamaguard_pack.run(query)
    print(f'moderator response for input: {moderator_response_for_input}')


    # Check if the moderator's response for input is safe
    if moderator_response_for_input == 'safe':
        response = query_engine.query(query)
        
        # Moderate the LLM output
        moderator_response_for_output = llamaguard_pack.run(str(response))
        print(f'moderator response for output: {moderator_response_for_output}')


        # Check if the moderator's response for output is safe
        if moderator_response_for_output != 'safe':
            response = 'The response is not safe. Please ask a different question.'
    else:
        response = 'This query is not safe. Please ask a different question.'


    return response

下面的示例輸出表明查詢不安全并且違反了自定義分類法中的第 8 類。

更多有關(guān) Llama Guard 使用方法的細(xì)節(jié)請參看：https://towardsdatascience.com/safeguarding-your-rag-pipelines-a-step-by-step-guide-to-implementing-llama-guard-with-llamaindex-6f80a2e07756?sk=c6cc48013bac60924548dd4e1363fa9e

責(zé)任編輯：張燕妮來源：機(jī)器之心

AI 解決方案

點(diǎn)贊

51CTO技術(shù)棧公眾號(hào)

業(yè)務(wù)
速覽

媒體

51CTO CIOAge HC3i

社區(qū)

51CTO博客鴻蒙開發(fā)者社區(qū) AI.x社區(qū)

教育

51CTO學(xué)堂精培企業(yè)培訓(xùn) CTO訓(xùn)練營

<blockquote id="dfmtg"><p id="dfmtg"></p></blockquote>

<noframes id="dfmtg"><ruby id="dfmtg"></ruby></noframes>