Object Detection with Vision Language Models (VLMs)

Author: 二旺 2024-11-27 16:06:12


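In the past, you had to collect training data and train a model yourself. Today, many foundation models let you fine-tune on top of them to get a system that can detect objects and interact with users in natural language. There are hundreds of models and potential use cases where object detection is useful, especially with the rise of small language models, so today we will try Qwen2-VL-7B-Instruct-8bit running on MLX.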

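We will use MLX-VLM, a package created by Prince Canuma (Blaizzy), an enthusiastic developer who builds and ports large language models to work with MLX. The framework abstracts away much of the code, so we can run these models in just a few lines. Take a look at the code snippet below; you will find it very simple. First, you specify a model from Hugging Face, and the framework downloads all the relevant components. The process is straightforward because the library also provides utilities such as apply_chat_template, which converts OpenAI-style chat messages into the prompt template that small VLMs expect.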

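One important caveat: at the time of writing, the system role had some issues in this library, although support will likely be added in the future. In this example, we therefore pass both the task and the response format in a single user message. We ask the model to identify all objects and return a list of coordinates for each one, where the first pair is the minimum x/y corner of the bounding box and the second pair is the maximum corner. We also include the object name and ask the model to return everything as a JSON object: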

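from mlx_vlm import load, apply_chat_template, generate
from mlx_vlm.utils import load_image

# Download (on first run) and load the 8-bit quantized model and its processor
model, processor = load("mlx-community/Qwen2-VL-7B-Instruct-8bit")
config = model.config

image_path = "images/test.jpg"
image = load_image(image_path)

# The task and the response format go in a single user message (no system role)
messages = [
    {
        "role": "user",
        "content": """detect all the objects in the image, return bounding boxes for all of them using the following format: [{
        "object": "object_name",
        "bboxes": [[xmin, ymin, xmax, ymax], [xmin, ymin, xmax, ymax], ...]
     }, ...]""",
    }
]
# Convert the OpenAI-style messages into the prompt format the model expects
prompt = apply_chat_template(processor, config, messages)

# Note: the positional order of image and prompt may differ between mlx-vlm
# releases; check the signature of generate() in your installed version
output = generate(model, processor, image, prompt, max_tokens=1000, temperature=0.7)
print(output)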
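
As noted above, the system role was unreliable in this library at the time of writing, so the task and the response format were placed in one user message. If you already have OpenAI-style messages that contain a system turn, one possible workaround is to fold that turn into the first user message before calling apply_chat_template. The helper below is a minimal sketch of that idea; `fold_system_into_user` is a hypothetical name, not part of MLX-VLM, and it assumes every message's `content` is a plain string.

```python
def fold_system_into_user(messages):
    """Return a copy of `messages` with any system text prepended to the first user turn.

    Hypothetical helper (not part of MLX-VLM). Assumes each message is a dict
    with string values under "role" and "content".
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Copy the non-system messages so the caller's list is left untouched
    others = [dict(m) for m in messages if m["role"] != "system"]
    if not system_parts:
        return others

    prefix = "\n\n".join(system_parts)
    for m in others:
        if m["role"] == "user":
            m["content"] = f"{prefix}\n\n{m['content']}"
            return others

    # No user turn was present: carry the system text in a new user message
    return [{"role": "user", "content": prefix}] + others


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "Return JSON only."},
        {"role": "user", "content": "detect all the objects in the image"},
    ]
    print(fold_system_into_user(example))
```

The returned list can then be passed to apply_chat_template in place of the original messages.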

Running the detection code on the test image returns a JSON response that correctly identifies the two trucks:

[{
    "object": "dump truck",
    "bboxes": [
        [100, 250, 380, 510]
    ]
}, {
    "object": "dump truck",
    "bboxes": [
        [550, 250, 830, 490]
    ]
}]
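
Because the response is generated text, it may arrive wrapped in a Markdown code fence or contain malformed boxes. The sketch below shows one way to turn the raw string into a flat list of (label, box) pairs while skipping boxes that do not have four values or whose minimum corner is not smaller than the maximum corner. `parse_detections` is a hypothetical helper name, and the validation rules are assumptions chosen for illustration rather than anything the model or library guarantees.

```python
import json

# Built indirectly so this snippet can itself sit inside a fenced block
FENCE = "`" * 3


def parse_detections(raw):
    """Parse the model's text output into a list of (label, (xmin, ymin, xmax, ymax))."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag such as "json"
        text = text.split("\n", 1)[1] if "\n" in text else ""
    text = text.strip()
    if text.endswith(FENCE):
        text = text[: -len(FENCE)]

    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON

    detections = []
    for item in data:
        label = item.get("object", "unknown")
        for bbox in item.get("bboxes", []):
            if len(bbox) != 4:
                continue  # skip boxes without exactly four coordinates
            xmin, ymin, xmax, ymax = bbox
            if xmin < xmax and ymin < ymax:
                detections.append((label, (xmin, ymin, xmax, ymax)))
    return detections


if __name__ == "__main__":
    raw = '[{"object": "dump truck", "bboxes": [[100, 250, 380, 510]]}]'
    print(parse_detections(raw))
```

Letting json.loads raise on invalid input keeps the failure visible to the caller; a caller that prefers to retry the generation can catch json.JSONDecodeError.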

Now that we have the object names and bounding-box coordinates, we can write a function that draws these results on the image:

import json
import re
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont

def draw_and_plot_boxes_from_json(json_data, image_path):
    """
    Parses the JSON data to extract bounding box coordinates,
    scales them according to the image size, draws the boxes on the image,
    and plots the image.

    Args:
        json_data (str or list): The JSON data as a string or already parsed list.
        image_path (str): The path to the image file on which boxes are to be drawn.
    """
    # If json_data is a string, parse it into a Python object
    if isinstance(json_data, str):
        # Strip leading/trailing whitespaces
        json_data = json_data.strip()
        # Remove code fences if present
        json_data = re.sub(r"^```json\s*", "", json_data)
        json_data = re.sub(r"```$", "", json_data)
        json_data = json_data.strip()
        try:
            data = json.loads(json_data)
        except json.JSONDecodeError as e:
            print("Failed to parse JSON data:", e)
            print("JSON data was:", repr(json_data))
            return
    else:
        data = json_data

    # Open the image
    try:
        img = Image.open(image_path)
    except FileNotFoundError:
        print(f"Image file not found at {image_path}. Please check the path.")
        return

    draw = ImageDraw.Draw(img)
    width, height = img.size
    # Change this part for Windows OS
    # ImageFont.FreeTypeFont(r"C:\Windows\Fonts\CONSOLA.ttf", size=25)
    font = ImageFont.truetype("/System/Library/Fonts/Menlo.ttc", size=25)

    # Process and draw boxes
    for item in data:
        object_type = item.get("object", "unknown")
        for bbox in item.get("bboxes", []):
            x1, y1, x2, y2 = bbox
            # Scale down coordinates from a 1000x1000 grid to the actual image size
            x1 = x1 * width / 1000
            y1 = y1 * height / 1000
            x2 = x2 * width / 1000
            y2 = y2 * height / 1000
            # Draw the rectangle on the image
            draw.rectangle([(x1, y1), (x2, y2)], outline="blue", width=5)
            text_position = (x1, y1)
            draw.text(text_position, object_type, fill="red", font=font)

    # Plot the image using matplotlib
    plt.figure(figsize=(8, 8))
    plt.imshow(img)
    plt.axis("off")  # Hide axes ticks
    plt.show()
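
The scaling step inside the function treats the returned coordinates as positions on a 1000x1000 grid and maps them onto the real image size. The snippet below isolates that arithmetic so it can be checked by hand; `scale_bbox` is a hypothetical helper name, and the 1920x1080 image size is an assumed example, not the size of the article's test image.

```python
def scale_bbox(bbox, width, height, grid=1000):
    """Map a box from a grid x grid coordinate space to pixel coordinates."""
    xmin, ymin, xmax, ymax = bbox
    return (
        xmin * width / grid,
        ymin * height / grid,
        xmax * width / grid,
        ymax * height / grid,
    )


if __name__ == "__main__":
    # First box from the response above, on an assumed 1920x1080 image
    print(scale_bbox([100, 250, 380, 510], 1920, 1080))
```

For that assumed image size, the first box lands at roughly (192, 270) to (730, 551) in pixels. With the detection output in hand, the drawing function above is called as `draw_and_plot_boxes_from_json(output, image_path)`.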

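The plotted result is shown below:

[Figure: the test image with the detected bounding boxes and labels drawn on it]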

Summary

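VLMs are advancing rapidly. Two years ago, there was no model that could fit on a MacBook and perform this well. My personal guess is that these models will keep improving and eventually match the capabilities of models like YOLO. There is still a long way to go, but as you have seen in this article, setting up this demo is very easy. The potential for building this kind of application on edge devices is enormous, and I believe it will have a major impact in industries such as mining, oil and gas, infrastructure, and surveillance. The best part is that we have not even touched fine-tuning, RAG, or prompt engineering; this is just a showcase of what the model can do out of the box.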

Editor: 趙寧寧. Source: 小白玩轉Python
