
From "the answer cannot be found" to "spot-on every time"! Contextual Embedding gives chunks built-in context for precise retrieval, with results you can see immediately!

Published 2025-03-25 10:23

Background

Recently, a project manager at my company came to me with a headache: the POC of a project they were delivering to an external client was underperforming, and he found that the information he wanted simply couldn't be retrieved from the vector store. My first suggestion was to switch to a better embedding model and stop using text-embedding-ada-002. He came back saying he had tried text-embedding-3-large and bge-m3, with no noticeable improvement.

Looking closely at their data, I found they had uploaded a large number of user documents, split them into chunks, and then fed the retrieved chunks to the LLM to generate answers. The problem was how the chunks were split: they used RecursiveCharacterTextSplitter, so a chunk seen in isolation gives no clue what it is about. For example, one chunk mentioned opening hours, but because of the recursive splitting, the information about whose opening hours they were was missing. As a result, even when that chunk was retrieved, the LLM would answer "the answer cannot be found in the provided context".
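For reference, here is a minimal sketch of that kind of structure-blind splitting, assuming LangChain's langchain-text-splitters package; the chunk size and overlap are illustrative:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split purely by character count and generic separators; the splitter knows
# nothing about which entity a fragment describes.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(whole_document)  # whole_document: the raw text

# A resulting chunk can end up as a bare "Opening Hours: ..." fragment with
# no hint of which hotel or room it belongs to.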

My suggestion: try contextual embedding. Adopting it costs very little development effort, and combined with the prompt cache, it can also noticeably cut LLM call costs.

What is contextual embedding

In traditional RAG, documents are usually split into smaller chunks for efficient retrieval. This works well for many applications, but it can cause problems when an individual chunk lacks sufficient context. Contextual Embedding uses an LLM to enrich every chunk with contextual information, giving users more precise retrieval and higher-quality answers.

A simple example. Suppose a chunk reads:

The company's revenue grew by 3% over the previous quarter.

If we ask "What was the revenue growth for ACME Corp in Q2 2023?", this chunk holds the true answer, yet it cannot be retrieved. The reason is that the original chunk says only "The company" and "the previous quarter": with such vague references, neither embedding search nor BM25 can match it to the query. But if we transform it into the contextualized_chunk below, injecting the context directly into the chunk:

This chunk is from an SEC filing on ACME corp's performance in Q2 2023; the previous quarter's revenue was $314 million. The company's revenue grew by 3% over the previous quarter.

Then the same question gets answered spot-on, every time.

How it works


We use a dedicated prompt to generate a concise context for each chunk, typically 50-100 tokens, and add it to the chunk before indexing. An example of the prompt:

system prompt

Here is the whole document: 
<document> 
{{WHOLE_DOCUMENT}} 
</document>

user prompt

Here is the chunk we want to situate within the whole document:
<chunk> 
{{CHUNK_CONTENT}} 
</chunk> 

Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
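A minimal sketch of this step, assuming the OpenAI Python SDK (v1.x) and gpt-4o-mini; the function name and prompt constants are illustrative:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Here is the whole document:\n"
    "<document>\n{whole_document}\n</document>"
)

USER_PROMPT = (
    "Here is the chunk we want to situate within the whole document:\n"
    "<chunk>\n{chunk_content}\n</chunk>\n\n"
    "Please give a short succinct context to situate this chunk within the "
    "overall document for the purposes of improving search retrieval of the "
    "chunk. Answer only with the succinct context and nothing else."
)

def generate_context(whole_document: str, chunk: str) -> str:
    # One LLM call per chunk; the 50-100 token context comes back as the reply.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT.format(whole_document=whole_document)},
            {"role": "user", "content": USER_PROMPT.format(chunk_content=chunk)},
        ],
    )
    return response.choices[0].message.content.strip()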

Case study

As before, we'll use a page from the Sentosa site as our example: https://www.sentosa.com.sg/en/places-to-stay/amara-sanctuary-sentosa/.

From this page, we split out a chunk with the following content:

description: Bed: King 
Room Size: 37 sqm 
Maximum Occupants: 2 Adults or 2 Adults and 1 child age 11 and below 
Room Essentials  
Flat-screen TV with cable channel access  
Individually controlled air-conditioning  
Spacious bathroom with separate bathtub and shower facilities  
Luxury bathroom amenities  
Bathrobes and hair dryer  
Electronic safe  
Tea and coffee making facilities  
Iron and ironing board  
Baby cot is available on request (subject to availability)

name: Couple Suite

name: Courtyard Suite

name: Junior Suite

name: Verandah Suite

Opening Hours: 
Check in: from 3pm  
Check out: until 12pm

Looking at this chunk alone, we can tell it describes some room information, but not which rooms it describes. So we used gpt-4o-mini to generate a context for the chunk, with this result:

This chunk provides detailed information about the room types and amenities available at Amara Sanctuary Sentosa, including the Deluxe Room specifications, other suite options, opening hours, accessibility features, and pet-friendly services, enhancing the overall description of the resort's accommodations.

Next, we combine the original chunk and the generated context (joined with \n\n) to form a new chunk:

description: Bed: King 
Room Size: 37 sqm 
Maximum Occupants: 2 Adults or 2 Adults and 1 child age 11 and below 
Room Essentials  
Flat-screen TV with cable channel access  
Individually controlled air-conditioning  
Spacious bathroom with separate bathtub and shower facilities  
Luxury bathroom amenities  
Bathrobes and hair dryer  
Electronic safe  
Tea and coffee making facilities  
Iron and ironing board  
Baby cot is available on request (subject to availability)

name: Couple Suite

name: Courtyard Suite

name: Junior Suite

name: Verandah Suite

Opening Hours: 
Check in: from 3pm  
Check out: until 12pm


This chunk provides detailed information about the room types and amenities available at Amara Sanctuary Sentosa, including the Deluxe Room specifications, other suite options, opening hours, accessibility features, and pet-friendly services, enhancing the overall description of the resort's accommodations.

With this in place, when we ask about the Deluxe Room at Amara Sanctuary Sentosa, the LLM answers accurately. The approach not only makes the retrieved information more coherent, it also sharply cuts the rate at which the LLM misreads or misses the answer.
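The combination step itself is just string concatenation followed by re-embedding. A minimal sketch, reusing client and generate_context from the sketch above; the embedding model is illustrative:

def contextualize(chunk: str, context: str) -> str:
    # Join the original chunk and its generated context with "\n\n".
    return f"{chunk}\n\n{context}"

new_chunk = contextualize(chunk, generate_context(whole_document, chunk))

# Embed the contextualized chunk and store the vector in the vector DB.
embedding = client.embeddings.create(
    model="text-embedding-3-large",
    input=new_chunk,
).data[0].embedding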

Prompt cache

For OpenAI models, API calls automatically benefit from prompt caching once your prompt exceeds 1,024 tokens (DeepSeek supports prompt caching as well). If you reuse prompts that share the same prefix, the caching discount is applied automatically, with no changes to your API integration. Caches are typically cleared after 5-10 minutes of inactivity and are always removed within one hour of their last use.

When we split a document into multiple chunks, we usually need to generate context for each one. If every call passes in the full document, we recompute the same tokens over and over, driving up LLM costs. Instead, we can place the full document in the system prompt and let the prompt cache absorb the repeated prefix, as sketched below.
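A minimal sketch of that per-chunk loop, reusing SYSTEM_PROMPT, USER_PROMPT, client, chunks, and whole_document from the sketches above; the printout reads the same usage fields shown next:

contexts = []
for chunk in chunks:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # The system prompt is an identical prefix across all calls, so
            # once it exceeds 1,024 tokens, later calls hit the prompt cache.
            {"role": "system", "content": SYSTEM_PROMPT.format(whole_document=whole_document)},
            {"role": "user", "content": USER_PROMPT.format(chunk_content=chunk)},
        ],
    )
    contexts.append(response.choices[0].message.content.strip())
    # cached_tokens reports how much of the prompt was served from cache.
    print(response.usage.prompt_tokens_details.cached_tokens)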

Here is the usage field from one of my LLM responses, showing the prompt cache in action:

CompletionUsage(
    completion_tokens=24, 
    prompt_tokens=1584, 
    total_tokens=1608, 
    completion_tokens_details=CompletionTokensDetails(
        accepted_prediction_tokens=0, 
        audio_tokens=0, 
        reasoning_tokens=0, 
        rejected_prediction_tokens=0
    ), 
    prompt_tokens_details=PromptTokensDetails(
        audio_tokens=0, 
        cached_tokens=1536  # 1,536 tokens were served from the cache
    )
)

The numbers above show:

  • prompt_tokens: 1,584 tokens went into the prompt.
  • cached_tokens: 1,536 of those tokens were served from the cache, so their computation cost was saved.
  • completion_tokens: 24 tokens were used to generate the answer.

By placing the document in the system prompt, we let the prompt cache eliminate the repeated computation and significantly lowered the cost of the LLM calls.

Summary

Traditional splitting methods (such as RecursiveCharacterTextSplitter) can leave chunks without enough context, which hurts retrieval. By introducing Contextual Embedding, we can enrich every chunk with contextual information, markedly improving retrieval precision and answer quality.

Overall, the combination of Contextual Embedding and the prompt cache gives RAG systems a low-cost, high-efficiency optimization, one that can quickly lift a system's performance when project timelines are tight and resources are limited.


This article is reproduced from the WeChat public account AI 博物院; author: longyunfeigu.

Original link: https://mp.weixin.qq.com/s/I8muNOkLenngFn9I9U2ZQg


© Copyright belongs to the author. Please credit the source when reproducing; otherwise legal liability may be pursued.