[Multimodal & RAG] Multimodal RAG: ColPali in Practice (Original)

大模型自然語言處理

Published 2024-11-20 15:17


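An earlier post, "[RAG & Multimodal] Multimodal RAG-ColPali: Efficient Document Retrieval with Vision Language Models", introduced ColPali and is a useful reference. This post walks through using ColPali in practice.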

Required weights:

Multimodal question-answering model: Qwen2-VL-72B-Instruct, https://modelscope.cn/models/Qwen/Qwen2-VL-72B-Instruct
Visual retriever based on PaliGemma-3B and the ColBERT strategy:

ColPali (LoRA): https://huggingface.co/vidore/colpali
ColPali (base): https://huggingface.co/vidore/colpaligemma-3b-mix-448-base
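
The "ColBERT strategy" mentioned above refers to late-interaction scoring: every query token embedding is compared with every page patch embedding, the best match per query token is kept, and those maxima are summed. The sketch below illustrates that scoring rule in plain Python. The function names and the toy 2-D vectors are made up for illustration; they are not ColPali's actual code or real model output.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query token, keep its best-matching
    page patch, then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: List[List[Vector]]) -> List[int]:
    """Return page indices sorted by descending late-interaction score."""
    scores = [maxsim_score(query_vecs, page) for page in pages]
    return sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical values, chosen only to make the ranking visible).
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]  # one strong match per query token
page_b = [[0.5, 0.5], [0.4, 0.4]]  # mediocre match for both tokens

print(rank_pages(query, [page_b, page_a]))  # prints [1, 0]
```

Because the score is a sum of per-token maxima, a page ranks highly only when most query tokens find a closely matching patch somewhere on that page.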

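Multimodal retrieval and question answering in practice

In the LoRA adapter's adapter_config.json, set the base_model_name_or_path field to the local storage path of the ColPali (base) weights.
qwen_vl_utils download: https://github.com/QwenLM/Qwen2-VL/tree/main/qwen-vl-utils/src/qwen_vl_utils
byaldi installation: https://github.com/AnswerDotAI/byaldi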
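
The adapter_config.json change described above can also be scripted rather than edited by hand. The following is a minimal sketch using only the standard library; the helper name `set_base_model_path` is hypothetical, and it assumes the LoRA directory contains a standard adapter_config.json file.

```python
import json
from pathlib import Path


def set_base_model_path(adapter_dir: str, base_model_dir: str) -> str:
    """Point the LoRA adapter's config at a local copy of the base model.

    Returns the previous value of base_model_name_or_path so the change
    can be reverted if needed.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("base_model_name_or_path", "")
    config["base_model_name_or_path"] = base_model_dir
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return previous
```

A call such as `set_base_model_path("./colpali", "./colpaligemma-3b-mix-448-base")` would apply the change, where the second argument is an assumed local download path for the base weights and should be adjusted to wherever they are actually stored.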
Complete code:

from typing import Optional

import torch
from byaldi import RAGMultiModalModel
from pdf2image import convert_from_path
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration


class DocumentQA:
    """Retrieve the most relevant PDF page with ColPali, then answer with Qwen2-VL."""

    def __init__(self, rag_model_name: str, vlm_model_name: str, device: str = 'cuda',
                 system_prompt: Optional[str] = None):
        # Visual retriever (ColPali loaded through byaldi)
        self.rag_engine = RAGMultiModalModel.from_pretrained(rag_model_name)
        # Vision-language model that generates the answer
        self.vlm = Qwen2VLForConditionalGeneration.from_pretrained(
            vlm_model_name,
            torch_dtype=torch.bfloat16,
            attn_implementation="flash_attention_2",
            device_map=device
        )
        self.processor = AutoProcessor.from_pretrained(vlm_model_name, trust_remote_code=True)
        self.device = device
        # Page images; filled in by index_document()
        self.images = []
        if system_prompt is None:
            self.system_prompt = (
                "You are an AI research assistant specializing in computer science and machine learning. "
                "Your task is to analyze academic papers, especially research on document retrieval "
                "and multimodal models. "
                "Carefully analyze the provided images and text, and give in-depth insights and explanations."
            )
        else:
            self.system_prompt = system_prompt

    def index_document(self, pdf_path: str, index_name: str = 'index', overwrite: bool = True):
        """Build the retrieval index and keep one rendered image per PDF page."""
        self.pdf_path = pdf_path
        self.rag_engine.index(
            input_path=pdf_path,
            index_name=index_name,
            store_collection_with_index=False,
            overwrite=overwrite
        )
        self.images = convert_from_path(pdf_path)

    def query(self, text_query: str, k: int = 3) -> Optional[str]:
        """Answer a question from the top-ranked page; returns None if nothing usable is found."""
        results = self.rag_engine.search(text_query, k=k)
        print("Search results:", results)

        if not results:
            print("No relevant results found.")
            return None

        try:
            # Only the top-ranked page is used; page_num is 1-based.
            page_num = results[0]["page_num"]
            image_index = page_num - 1
            image = self.images[image_index]
        except (KeyError, IndexError, AttributeError) as e:
            print("Error retrieving page image:", e)
            return None

        messages = [
            {
                "role": "system",
                "content": self.system_prompt
            },
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": text_query},
                ],
            }
        ]

        text = self.processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )

        image_inputs, video_inputs = process_vision_info(messages)

        # Prepare the model inputs
        inputs = self.processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            padding=True,
            return_tensors="pt",
        )
        inputs = inputs.to(self.device)

        generated_ids = self.vlm.generate(**inputs, max_new_tokens=1024)

        # Drop the prompt tokens so that only the newly generated answer is decoded
        generated_ids_trimmed = [
            out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
        ]
        output_text = self.processor.batch_decode(
            generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
        )

        return output_text[0]


if __name__ == "__main__":
    # Initialize the DocumentQA instance
    document_qa = DocumentQA(
        rag_model_name="./colpali",
        vlm_model_name="./Qwen2-VL-7B-Instruct",
        device='cuda'
    )

    # Index the PDF document
    document_qa.index_document("test.pdf")

    # Define the query
    text_query = (
        "On which dataset does the paper's model have the largest advantage over other models? "
        "How large is that improvement?"
    )

    # Run the query and print the answer
    answer = document_qa.query(text_query)
    print("Answer:", answer)
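
The script above retrieves k pages but passes only the top-ranked one to the vision-language model. One possible variation, sketched here as an assumption rather than as part of the original code, is to place several retrieved pages in the same user message. The helper `build_messages` is hypothetical; it reuses the message layout from the script and treats page numbers as 1-based, as the script does.

```python
from typing import Any, Dict, List, Sequence


def build_messages(
    system_prompt: str,
    text_query: str,
    page_nums: Sequence[int],
    images: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Build a chat message list containing one image entry per retrieved page."""
    content: List[Dict[str, Any]] = []
    seen = set()
    for page_num in page_nums:
        index = page_num - 1  # page numbers are assumed to be 1-based
        # Skip duplicates and page numbers that fall outside the document
        if index in seen or not 0 <= index < len(images):
            continue
        seen.add(index)
        content.append({"type": "image", "image": images[index]})
    content.append({"type": "text", "text": text_query})
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": content},
    ]


# Strings stand in for the PIL page images in this illustration.
demo = build_messages("system prompt", "question", [2, 2, 9, 1], ["p1", "p2", "p3"])
print([item["type"] for item in demo[1]["content"]])  # prints ['image', 'image', 'text']
```

Inside `query`, the page numbers could be collected with something like `[r["page_num"] for r in results]`, following the same access pattern the script uses for the first result. Each extra page adds visual tokens, so memory use and latency grow with the number of pages included.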

This article is reposted from the WeChat public account 大模型自然語言處理. Author: 余俊暉

Original link: https://mp.weixin.qq.com/s/27rAyOj9QqyzyRWiWYpQCg

Copyright belongs to the author. Reposts must credit the source; otherwise legal action may be taken.
