
Three Text Similarity Methods: Rules, Vectors, and LLM-as-a-Judge

AI悠閑區

Published 2025-2-3 13:24


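Text similarity computation

Introduction

Some tasks require measuring how similar two strings are, for example, comparing the output generated by a large language model with a predefined reference answer. This article introduces three classes of methods for evaluating the similarity between two strings: rules, vectors, and an LLM judge.

Rules: similarity computed from n-gram overlap; common algorithms include ROUGE and BLEU;
Vectors: encode each string into a vector with a popular embedding model (Jina), then compute the similarity between the two vectors;
LLM judge: ask a large language model to assess how relevant the two strings are to each other.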

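Summary

This article presents three ways to evaluate the similarity between two strings: rule-based algorithms built on n-gram overlap, encoding the texts as vectors with an embedding model and computing their cosine similarity, and asking an LLM to judge their relevance directly. It walks through how each method is implemented and when it fits, with Python example code, to help readers choose the method that meets their needs.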

規(guī)則

Available metrics can be found on the Hugging Face Hub.

This article focuses on the Metric type of evaluation:

Metric: measures the performance of a model on a given dataset, usually by comparing the model's predictions to some ground truth labels -- these are covered in this space.

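Install the packages. The metrics used here depend mainly on nltk, so install it as well:

pip install transformers evaluate nltk

Many NLP evaluation metrics are published through the evaluate package.

For more examples, see the google_bleu page: https://huggingface.co/spaces/evaluate-metric/google_bleu

evaluate.load downloads metrics from the Hugging Face Hub, which may require a proxy on some networks. Workarounds:

Turn on global proxy mode in your proxy client;
Or set the proxy inside Python as shown below (7890 is Clash's port):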

import os

# Route HTTP(S) traffic through the local proxy; set this before calling evaluate.load
os.environ['HTTP_PROXY'] = 'http://127.0.0.1:7890'
os.environ['HTTPS_PROXY'] = 'http://127.0.0.1:7890'

import evaluate

# Load the google_bleu metric (fetched from the Hugging Face Hub on first use)
google_bleu = evaluate.load("google_bleu")

sentence1 = "the cat sat on the mat"
sentence2 = "the cat ate the mat"

# Partially overlapping sentences
result1 = google_bleu.compute(predictions=[sentence1], references=[[sentence2]])
print(result1)
# result1 {'google_bleu': 0.3333333333333333}

# A prediction identical to its reference
result2 = google_bleu.compute(predictions=[sentence1], references=[[sentence1]])
print(result2)
# result2 {'google_bleu': 1.0}

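[Note]: references is a nested two-dimensional list.

references is two-dimensional because a single question may have several acceptable answers: each prediction is scored against every one of its references, and the highest google_bleu score is kept.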

predictions = ["The cat is on the mat."]
# One prediction with two acceptable references
references = [["The cat is on the mat.", "There is a cat on the mat."]]
print(google_bleu.compute(predictions=predictions, references=references))
# >>> {'google_bleu': 1.0}
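To see where these numbers come from, here is a minimal from-scratch sketch of the GLEU computation, modeled on NLTK's corpus_gleu, which evaluate's google_bleu calls after its own tokenization. It assumes plain whitespace tokenization, which gives the same tokens for lowercase, punctuation-free sentences like the ones above; on other text the scores may differ. For each prediction it counts all 1- to 4-grams, scores matches / max(prediction n-grams, reference n-grams), which equals min(precision, recall), and keeps the best-scoring reference. The helper names ngrams and gleu are hypothetical.

```python
from collections import Counter


def ngrams(tokens, min_len=1, max_len=4):
    """Count every n-gram of length min_len..max_len in a token list."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def gleu(predictions, references, min_len=1, max_len=4):
    """Corpus-level GLEU on whitespace tokens (hypothetical helper).

    Each reference yields (matches, max(pred_total, ref_total)); the reference
    with the highest ratio is kept, then counts are summed over the corpus.
    """
    total_match, total_all = 0, 0
    for pred, refs in zip(predictions, references):
        hyp = ngrams(pred.split(), min_len, max_len)
        hyp_total = sum(hyp.values())
        scored = []
        for ref in refs:
            ref_counts = ngrams(ref.split(), min_len, max_len)
            ref_total = sum(ref_counts.values())
            match = sum((hyp & ref_counts).values())  # clipped n-gram overlap
            n_all = max(hyp_total, ref_total)  # min(precision, recall) == match / n_all
            if n_all > 0:
                scored.append((match, n_all))
        if scored:
            match, n_all = max(scored, key=lambda s: s[0] / s[1])
            total_match += match
            total_all += n_all
    return total_match / total_all if total_all else 0.0


print(gleu(["the cat sat on the mat"], [["the cat ate the mat"]]))
# 0.3333333333333333  (6 matched n-grams / 18 n-grams in the prediction)
print(gleu(["the cat sat on the mat"],
           [["the cat ate the mat", "the cat sat on the mat"]]))
# 1.0  (the identical second reference wins)
```

Because min(precision, recall) is used, a prediction is penalized both for n-grams the reference lacks and for reference n-grams it misses.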

Here is a Chinese example:

google_bleu.compute(
    predictions=["我愛你"],
    references=[["我愛我的祖國"]]
)
# >>> {'google_bleu': 0.0}

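As shown above, 我愛你 ("I love you") and 我愛我的祖國 ("I love my motherland") score 0.0. google_bleu does not natively support Chinese: English text can be split into words on spaces, but Chinese has no spaces between words, so each whole string is treated as a single token and no n-grams match. For example, ["我愛我的祖國"] can be split as:

["我 愛 我 的 祖 國"], one token per character;
["我 愛 我 的 祖國"], with no space inside 祖國.

Clearly 祖國 ("motherland") is better treated as one word; splitting it into the characters 祖 and 國 loses its original meaning.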

google_bleu.compute(
    predictions=["我 愛 你"],
    references=[["我 愛 我 的 祖 國"]]
)
# >>> {'google_bleu': 0.16666666666666666}

google_bleu.compute(
    predictions=["我 愛 你"],
    references=[["我 愛 我 的 祖國"]]
)
# >>> {'google_bleu': 0.21428571428571427}

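Appropriate Chinese word segmentation can raise the google_bleu score. As shown above, once 祖國 becomes a single word, google_bleu rises from 0.16 to 0.21. The matched n-grams are the same in both cases (我, 愛, and 我 愛, i.e. 3 matches); what changes is that the reference now contains 14 n-grams of length 1 to 4 instead of 18, and GLEU divides the matches by the larger of the prediction's and the reference's n-gram counts (3/18 versus 3/14). To try Chinese word segmentation, install jieba (pip install jieba), which supports adding new words to its dictionary.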
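jieba's own algorithm (a prefix dictionary with dynamic programming, plus an HMM for unknown words) is more involved. Purely to illustrate how a word dictionary changes tokenization, here is a toy forward maximum matching segmenter; it is not jieba's method. The function fmm_segment and the dictionary words are hypothetical, and adding "祖國" to the set plays the role of adding a new word to a segmenter's dictionary.

```python
def fmm_segment(text, words, max_word_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical user dictionary: start empty, then add a new word
words = set()
print(" ".join(fmm_segment("我愛我的祖國", words)))  # 我 愛 我 的 祖 國
words.add("祖國")
print(" ".join(fmm_segment("我愛我的祖國", words)))  # 我 愛 我 的 祖國
```

The space-joined output has the same form as the segmented strings passed to google_bleu.compute in the examples above.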

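Vectors

Use a trained embedding model to encode each text as a vector, then compute the cosine similarity between the two vectors. For an introduction to jina-embeddings-v2-base-zh, see https://modelscope.cn/models/jinaai/jina-embeddings-v2-base-zh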
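Cosine similarity is the dot product divided by the product of the two vector norms, cos(a, b) = a·b / (‖a‖‖b‖), so it ranges from -1 to 1 and ignores vector length. A minimal sketch with made-up 3-dimensional vectors (real embeddings from jina-embeddings-v2-base-zh are much longer, and these values are not model outputs):

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product divided by the product of the L2 norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy vectors standing in for sentence embeddings
a = [1.0, 2.0, 2.0]  # norm 3
b = [2.0, 1.0, 2.0]  # norm 3
c = [2.0, 4.0, 4.0]  # same direction as a, twice as long

print(cosine_similarity(a, b))  # 8 / 9, about 0.8889
print(cosine_similarity(a, c))  # 1.0: scaling a vector does not change the score
```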

Here is a simple example:

# Install first: pip install modelscope
from modelscope import AutoModel
from numpy.linalg import norm

# Cosine similarity between two 1-D vectors
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))
# trust_remote_code is needed to use the encode method
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-zh', trust_remote_code=True)
embeddings = model.encode(['How is the weather today?', '今天天氣怎么樣?'])
print(cos_sim(embeddings[0], embeddings[1]))

The next example ranks several candidate strings by their similarity to one input string:

from numpy.linalg import norm
from modelscope import AutoModel

# Cosine similarity between two 1-D vectors
cos_sim = lambda a, b: (a @ b.T) / (norm(a) * norm(b))

# Load the model (trust_remote_code is needed for the encode method)
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-zh', trust_remote_code=True)

# Input string and candidate strings
input_string = 'How is the weather today?'
candidates = ['今天天氣怎么樣?', '我今天很高興', '天氣預報說今天會下雨', '你最喜歡的顏色是什么?']

# Embed the input string
input_embedding = model.encode([input_string])[0]

# Embed the candidate strings
candidate_embeddings = model.encode(candidates)

# Compute similarities and sort from highest to lowest
similarities = [cos_sim(input_embedding, candidate_embedding) for candidate_embedding in candidate_embeddings]
sorted_candidates = sorted(zip(candidates, similarities), key=lambda x: x[1], reverse=True)

# Print the ranking
for candidate, similarity in sorted_candidates:
    print(f"({input_string} - {candidate}), Similarity: {similarity:.4f}")

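The code above computes the cosine similarity between the embedding of input_string and the embedding of each candidate string, then sorts the candidates from highest to lowest score: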

Downloading Model to directory: C:\Users\user_name\.cache\modelscope\hub\jinaai/jina-embeddings-v2-base-zh
(How is the weather today? - 今天天氣怎么樣?), Similarity: 0.7861
(How is the weather today? - 天氣預報說今天會下雨), Similarity: 0.5470
(How is the weather today? - 我今天很高興), Similarity: 0.4202
(How is the weather today? - 你最喜歡的顏色是什么?), Similarity: 0.1032
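With many candidates, the Python loop above can be replaced by one matrix product: normalize the query and every candidate row once, after which a dot product equals cosine similarity. A sketch, assuming the embeddings are NumPy arrays; the helper rank_by_cosine and the toy vectors are hypothetical.

```python
import numpy as np


def rank_by_cosine(query_vec, candidate_matrix, top_k=None):
    """Return (index, score) pairs sorted by cosine similarity, highest first."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(candidate_matrix, dtype=float)
    # Normalize once so that a dot product equals cosine similarity
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q               # one score per candidate row
    order = np.argsort(-scores)  # descending order of similarity
    if top_k is not None:
        order = order[:top_k]
    return [(int(i), float(scores[i])) for i in order]


query = [1.0, 2.0, 2.0]
candidates = np.array([
    [2.0, 1.0, 2.0],  # cosine 8/9
    [0.0, 0.0, 1.0],  # cosine 2/3
    [2.0, 4.0, 4.0],  # cosine 1.0 (same direction as the query)
])
for idx, score in rank_by_cosine(query, candidates, top_k=2):
    print(idx, round(score, 4))
# 2 1.0
# 0 0.8889
```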

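LLM-as-a-Judge

Writing a rule-based program to evaluate outputs is very challenging, and traditional metrics based on the similarity between an output and a reference answer (e.g., ROUGE, BLEU) are also ineffective for many such problems.[1] For complex scenarios where neither rule-based nor vector similarity works well, try letting an LLM act as the judge.

A reference prompt: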

JUDGE_PROMPT = """
You will be given a user_question and system_answer couple.
Your task is to provide a 'total rating' scoring how well the system_answer answers the user concerns expressed in the user_question.
Give your answer as a float on a scale of 0 to 10, where 0 means that the system_answer is not helpful at all, and 10 means that the answer completely and helpfully addresses the question.

Provide your feedback as follows:

Feedback:::
Total rating: (your rating, as a float between 0 and 10)

Now here are the question and answer.

Question: {question}
Answer: {answer}

Feedback:::
Total rating: """
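The prompt is filled with JUDGE_PROMPT.format(question=..., answer=...), and the judge's reply still has to be turned into a number. A minimal parsing sketch, assuming the model follows the "Total rating:" format requested above; because the prompt itself ends with "Total rating: ", a reply may also be just the number, so the sketch falls back to scanning the whole reply. The helper parse_rating and the sample replies are hypothetical, and the LLM call itself is left out because it depends on the provider.

```python
import re


def parse_rating(judge_reply, low=0.0, high=10.0):
    """Extract the first number after 'Total rating:' and clamp it to [low, high].

    Returns None when no number is found, so callers can retry or skip the sample.
    """
    marker = "Total rating:"
    # Text after the last marker, or the whole reply when the marker is absent
    tail = judge_reply.rsplit(marker, 1)[-1]
    match = re.search(r"\d+(?:\.\d+)?", tail)
    if match is None:
        return None
    return min(max(float(match.group()), low), high)


# Hypothetical judge reply
score = parse_rating("Feedback:::\nTotal rating: 8.5")
print(score, score / 10)  # 8.5 0.85 (divide by 10 for a 0-1 scale)
print(parse_rating(" 7"))  # 7.0: reply containing only the number
```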

References

[1] Using LLM-as-a-judge for an automated and versatile evaluation
[2] Hugging Face evaluate: https://github.com/huggingface/evaluate

Reprinted from AI悠閑區; author: jieshenai
