Segmentation with GroundingDINO and SAM

Author: 二旺 2024-12-18 16:47:31

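In computer vision, image segmentation is a core task, widely used in object recognition, tracking, and analysis. This article presents an approach that combines two transformer-based models for zero-shot image segmentation: GroundingDINO handles object detection, and the Segment Anything Model (SAM) turns each detected box into a segmentation mask. We walk through the code and explain the key concepts involved. First, some important terms.

Combining Grounding DINO and SAM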

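1. Transformer models

This class of neural network architecture has driven major progress in natural language processing, on tasks such as translation, summarization, and text generation. A transformer processes an input sequence (for example words or characters) through multiple layers and uses an attention mechanism to focus on different parts of the input. Imagine a translator using a transformer to translate an English sentence into another language. When translating "the quick brown fox", the model weighs how relevant each of "the", "quick", "brown", and "fox" is to the word it is currently producing, and folds that information into the translation step by step.

The transformer design handles long-range dependencies well and allows parallel computation, which makes it effective on sequence data. In this article, we use two transformer-based models, GroundingDINO and SAM.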
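To make the attention idea above concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The word vectors are invented for illustration and the function names are hypothetical; real transformers learn these vectors and run many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy 2-d vectors for the four words (made-up numbers, not learned embeddings).
words = ["the", "quick", "brown", "fox"]
keys = [[0.1, 0.0], [0.9, 0.2], [0.8, 0.3], [0.2, 1.0]]
values = keys  # self-attention style: the same vectors serve as values
query = [0.0, 1.0]  # a query that points in the same direction as "fox"

weights, output = attend(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
```

Every word receives some weight, and the word whose key best matches the query receives the most. That is how a transformer can relate distant parts of a sequence in a single step.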

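2. Object detection and semantic segmentation

These are two fundamental computer vision tasks. Object detection locates target objects in an image with bounding boxes, while semantic segmentation assigns a class label to every pixel. Detection tells you where an object is; segmentation separates the object from the background in detail.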
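The difference between the two outputs can be shown on a tiny hand-made example. The sketch below uses a hypothetical helper, mask_to_box, to derive the tight bounding box of a binary mask and then compares how many pixels each representation covers. The mask is invented for illustration.

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# A 4x5 binary mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

box = mask_to_box(mask)
object_pixels = sum(sum(row) for row in mask)
box_pixels = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
print(box, object_pixels, box_pixels)
```

The box covers more pixels than the object itself, which is why a detector's box alone cannot separate an object from its background and a segmentation step is still needed.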

3. Zero-shot learning

This is a machine learning technique that lets a model perform a task it was not specifically trained for, by drawing on knowledge from related tasks. In this article, we use zero-shot learning to segment objects in an image from text labels supplied by the user, even if the model was never trained on those labels.
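Because the labels are free text, preparing them is plain string handling. The sketch below uses a hypothetical helper, build_prompt, to turn a comma-separated label string into a single prompt in which each label ends with a period, the same shape of prompt that the code in this article passes to GroundingDINO. Lower-casing the labels follows the usage notes for the Hugging Face Grounding DINO integration and is an assumption here, not something the article's code does.

```python
def build_prompt(labels_csv):
    """Turn 'Cat, fishnet' into 'cat. fishnet.' (one period-terminated phrase per label)."""
    labels = [label.strip().lower() for label in labels_csv.split(",")]
    labels = [label for label in labels if label]  # drop empty entries
    return " ".join(f"{label}." for label in labels)


prompt = build_prompt("Cat, fishnet, ")
print(prompt)
```

No label list is fixed in advance: any phrase the user types becomes part of the prompt, which is what makes the pipeline zero-shot from the user's point of view.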

The code can be written and run in Google Colab at https://colab.research.google.com/:


# app.py

# In a Colab notebook cell, install the extra dependency first:
#   !pip install spaces
import gradio as gr
import numpy as np
import spaces
import torch
from transformers import (
    AutoModelForZeroShotObjectDetection,
    AutoProcessor,
    SamModel,
    SamProcessor,
)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# SAM turns a box prompt into a segmentation mask.
sam_model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")

# GroundingDINO turns text labels into bounding boxes.
model_id = "IDEA-Research/grounding-dino-base"
dino_processor = AutoProcessor.from_pretrained(model_id)
dino_model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)


def infer_dino(img, text_queries, score_threshold):
    # Build one prompt string in which every label ends with a period.
    queries = ""
    for query in text_queries:
        queries += f"{query}. "

    # A NumPy image has shape (height, width, channels).
    height, width = img.shape[:2]
    target_sizes = [(height, width)]
    inputs = dino_processor(text=queries, images=img, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = dino_model(**inputs)
        outputs.logits = outputs.logits.cpu()
        outputs.pred_boxes = outputs.pred_boxes.cpu()
        results = dino_processor.post_process_grounded_object_detection(
            outputs=outputs,
            input_ids=inputs.input_ids,
            box_threshold=score_threshold,
            target_sizes=target_sizes,
        )
    return results


@spaces.GPU
def query_image(img, text_queries, dino_threshold):
    # Split the comma-separated labels, trim whitespace, and drop empty entries.
    text_queries = [q.strip().lower() for q in text_queries.split(",") if q.strip()]
    dino_output = infer_dino(img, text_queries, dino_threshold)
    result_labels = []
    for pred in dino_output:
        boxes = pred["boxes"].cpu()
        scores = pred["scores"].cpu()
        labels = pred["labels"]
        for box, score, label in zip(boxes, scores, labels):
            if label != "":
                # Prompt SAM with one box: [image][box][x_min, y_min, x_max, y_max].
                inputs = sam_processor(
                    img,
                    input_boxes=[[box.tolist()]],
                    return_tensors="pt",
                ).to(device)

                with torch.no_grad():
                    outputs = sam_model(**inputs)

                # Resize the predicted masks to the original image size and
                # keep the first mask predicted for the single box.
                mask = sam_processor.image_processor.post_process_masks(
                    outputs.pred_masks.cpu(),
                    inputs["original_sizes"].cpu(),
                    inputs["reshaped_input_sizes"].cpu(),
                )[0][0][0].numpy()
                mask = mask[np.newaxis, ...]
                result_labels.append((mask, label))
    return img, result_labels


description = (
    "This Space combines [GroundingDINO](https://huggingface.co/IDEA-Research/grounding-dino-base), "
    "a bleeding-edge zero-shot object detection model with "
    "[SAM](https://huggingface.co/facebook/sam-vit-base), the state-of-the-art mask generation model. "
    "SAM normally doesn't accept text input. Combining SAM with GroundingDINO makes SAM text promptable. "
    "Try the example or input an image and comma separated candidate labels to segment."
)
demo = gr.Interface(
    query_image,
    inputs=[
        gr.Image(label="Image Input"),
        gr.Textbox(label="Candidate Labels"),
        gr.Slider(0, 1, value=0.05, label="Confidence Threshold for GroundingDINO"),
    ],
    outputs="annotatedimage",
    title="GroundingDINO + SAM for Zero-shot Segmentation",
    description=description,
    # The example images must exist next to this script.
    examples=[
        ["./cats.png", "cat, fishnet", 0.16],
        ["./bee.jpg", "bee, flower", 0.16],
    ],
)
demo.launch(debug=True)

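Code walkthrough:

(1) The code first installs the spaces package with pip and imports the libraries it needs: PyTorch, NumPy, Transformers (which provides the GroundingDINO and SAM classes), spaces, and Gradio.

(2) GroundingDINO is a transformer-based object detection model. Given an image and a text description, it outputs bounding boxes for the objects that match the description. In this code, we use GroundingDINO to locate objects in the image from the text labels the user supplies.

(3) The Segment Anything Model (SAM) is another transformer-based model, built for promptable segmentation. It does not take text input; given an image and a prompt such as a point or a bounding box, it generates a segmentation mask for the prompted object. In this article, SAM segments each object from the bounding box that GroundingDINO provides.

(4) The code selects the device to run on (GPU or CPU) based on availability.

(5) It loads the SAM and GroundingDINO models together with their processors, and moves the models (not the processors) onto the compute device to speed up computation.

(6) infer_dino() takes an image, the text queries (candidate labels), and a confidence threshold, runs them through GroundingDINO, and returns the detected objects with their bounding boxes.

(7) query_image() is decorated with @spaces.GPU, which requests a GPU for the call when the app runs on Hugging Face Spaces. The function takes an image, the text queries, and a confidence threshold as input.

(8) query_image() first splits the text query into individual labels and passes them to infer_dino() to obtain the detections and bounding boxes.

(9) For each detection, it passes the object's bounding box to SAM, which generates one mask per object.

(10) Finally, the function returns the image together with the generated masks and their labels.

(11) The code defines a Gradio demo that takes an image, candidate labels, and a confidence threshold as input and returns an annotated image with the generated masks and labels. It also supplies example inputs.

(12) It launches the Gradio demo and displays the user interface.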
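The selection logic behind steps (6) to (9) can be sketched without loading any model: keep the detections whose score clears the threshold and whose label is non-empty, and each survivor becomes one box prompt for SAM. The helper name filter_detections and all values below are invented for illustration. In the article's code the thresholding happens inside the GroundingDINO post-processing call, so this is only a sketch of the idea, not that implementation.

```python
def filter_detections(boxes, scores, labels, threshold):
    """Keep detections that score above the threshold and carry a non-empty label."""
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score > threshold and label != "":
            kept.append({"box": box, "score": score, "label": label})
    return kept


# Invented example values, not real model output.
boxes = [(10, 20, 110, 220), (5, 5, 50, 50), (200, 40, 260, 90)]
scores = [0.62, 0.09, 0.31]
labels = ["cat", "fishnet", ""]

kept = filter_detections(boxes, scores, labels, threshold=0.16)
for detection in kept:
    print(detection["label"], detection["box"])
```

Raising the threshold trades recall for precision: fewer boxes reach SAM, so fewer masks are produced, but the ones that remain are more likely to match the labels.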

After running the code, we get a link to the Gradio app:

Result (the segmented regions are painted red)
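A red overlay of this kind is an alpha blend between the image and a solid color, applied only where the mask is set. The sketch below shows the idea on a 2x2 image in plain Python. The function name overlay_mask, the color, and the alpha value are assumptions for illustration, and this is not Gradio's actual rendering code.

```python
def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend `color` into the pixels of `image` where `mask` is truthy."""
    blended = []
    for image_row, mask_row in zip(image, mask):
        new_row = []
        for pixel, inside in zip(image_row, mask_row):
            if inside:
                # Mix each channel of the pixel with the overlay color.
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            new_row.append(pixel)
        blended.append(new_row)
    return blended


# A 2x2 RGB image and a mask that selects the two diagonal pixels.
image = [[(0, 0, 0), (100, 100, 100)], [(200, 200, 200), (50, 50, 50)]]
mask = [[1, 0], [0, 1]]

result = overlay_mask(image, mask)
print(result)
```

Pixels outside the mask are left untouched, so the original image stays visible around each segmented object.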

Full code: https://github.com/jyotidabass/GroundingSAM-Gradio-App/blob/main/GroundingSAM.ipynb

Editor: 趙寧寧 Source: 小白玩轉Python
