Identifying Objects with Zero-Shot Object Detection | With Code

Author: 二旺 2024-11-20 16:51:00

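In this article, we look at how to use Hugging Face's transformers library to identify objects in an image of a refrigerator with zero-shot object detection. This approach lets us recognize a variety of items without training the model specifically on those objects.

Below is a step-by-step walkthrough of the code. We use a model from Google's OWL-ViT family (the checkpoint loaded below is OWLv2, google/owlv2-base-patch16-ensemble), which is well suited to object detection tasks. The model is loaded as a pipeline, so it can be used as an object detector with very little setup.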

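# Import the required library
from transformers import pipeline

Here the transformers library handles object detection through a zero-shot object detection model hosted on Hugging Face. Zero-shot models are a powerful tool for detection tasks because they do not need a dedicated training dataset for each object; they can recognize a wide range of objects out of the box from text labels alone.

# Load a specific checkpoint from the Hugging Face model hub
checkpoint = "google/owlv2-base-patch16-ensemble"
detector = pipeline(model=checkpoint, task="zero-shot-object-detection")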

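Loading and displaying the image

# Import the image handling libraries
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load and display the image
image = Image.open("/content/image2.jpg")
plt.imshow(image)
plt.axis("off")
plt.show()
# Make sure the image is 8-bit RGB before passing it to the detector
image = Image.fromarray(np.uint8(image)).convert("RGB")

Here we use PIL, a library widely used for image processing in Python, to load the image (image2.jpg) from the given path. We then display it with matplotlib and convert it to RGB.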

Detecting objects

With the model loaded and the image ready, we can move on to detection.

# Define the candidate labels and run the detector on the image
predictions = detector(
    image,
    candidate_labels=["fanta", "cokacola", "bottle", "egg", "bowl", "donut", "milk", "jar", "curd", "pickle", "refrigerator", "fruits", "vegetables", "bread", "yogurt"],
)
predictions

[{'score': 0.4910733997821808,
  'label': 'bottle',
  'box': {'xmin': 419, 'ymin': 1825, 'xmax': 574, 'ymax': 2116}},
 {'score': 0.45601949095726013,
  'label': 'bottle',
  'box': {'xmin': 1502, 'ymin': 795, 'xmax': 1668, 'ymax': 1220}},
 {'score': 0.4522128999233246,
  'label': 'bottle',
  'box': {'xmin': 294, 'ymin': 1714, 'xmax': 479, 'ymax': 1924}},
 {'score': 0.4485340714454651,
  'label': 'milk',
  'box': {'xmin': 545, 'ymin': 811, 'xmax': 770, 'ymax': 1201}},
 {'score': 0.44276902079582214,
  'label': 'bottle',
  'box': {'xmin': 1537, 'ymin': 958, 'xmax': 1681, 'ymax': 1219}},
 {'score': 0.4287840723991394,
  'label': 'bottle',
  'box': {'xmin': 264, 'ymin': 1726, 'xmax': 459, 'ymax': 2104}},
 {'score': 0.41883620619773865,
  'label': 'bottle',
  'box': {'xmin': 547, 'ymin': 632, 'xmax': 773, 'ymax': 1203}},
 {'score': 0.15758953988552094,
  'label': 'jar',
  'box': {'xmin': 1141, 'ymin': 1628, 'xmax': 1259, 'ymax': 1883}},
 {'score': 0.15696804225444794,
  'label': 'egg',
  'box': {'xmin': 296, 'ymin': 1034, 'xmax': 557, 'ymax': 1131}},
 {'score': 0.15674084424972534,
  'label': 'egg',
  'box': {'xmin': 292, 'ymin': 1109, 'xmax': 552, 'ymax': 1212}},
 {'score': 0.1565699428319931,
  'label': 'coke',
  'box': {'xmin': 294, 'ymin': 1714, 'xmax': 479, 'ymax': 1924}},
 {'score': 0.15651869773864746,
  'label': 'milk',
  'box': {'xmin': 417, 'ymin': 1324, 'xmax': 635, 'ymax': 1450}}]

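In zero-shot detection we supply a list of candidate labels, that is, the items we expect might appear in the image, such as common refrigerator contents: "fanta", "milk", "yogurt", and so on. The model then tries to locate these objects in the image and returns a bounding box and a confidence score for each detection.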
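The output above mixes fairly confident detections (scores around 0.4 to 0.5) with weak ones (around 0.15). A minimal sketch of filtering by score follows; the helper name filter_by_score and the 0.3 cutoff are assumptions chosen for illustration, not part of the original code. The transformers pipeline also documents a threshold argument for this purpose; filtering afterwards, as here, works on any saved list of predictions.

```python
def filter_by_score(predictions, min_score=0.3):
    """Keep only predictions whose confidence is at least min_score.

    min_score=0.3 is an illustrative cutoff, not a recommended value.
    """
    return [p for p in predictions if p["score"] >= min_score]


# Two entries taken from the output above (scores rounded for brevity)
sample = [
    {"score": 0.491, "label": "bottle",
     "box": {"xmin": 419, "ymin": 1825, "xmax": 574, "ymax": 2116}},
    {"score": 0.158, "label": "jar",
     "box": {"xmin": 1141, "ymin": 1628, "xmax": 1259, "ymax": 1883}},
]

confident = filter_by_score(sample, min_score=0.3)
print([p["label"] for p in confident])  # ['bottle']
```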

Visualizing the detections

To visualize the detected objects, we draw a rectangle around each one and annotate it with the detected label and confidence score.

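from PIL import ImageDraw

# Draw on a copy so the original image stays clean for later detector runs
annotated = image.copy()
draw = ImageDraw.Draw(annotated)
for prediction in predictions:
    box = prediction["box"]
    label = prediction["label"]
    score = prediction["score"]
    # Read the coordinates by key rather than relying on dict ordering
    xmin, ymin, xmax, ymax = box["xmin"], box["ymin"], box["xmax"], box["ymax"]
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
    draw.text((xmin, ymin), f"{label}: {round(score, 2)}", fill="white")
annotated

The code creates an ImageDraw instance, which lets us overlay rectangles and text on a copy of the image. For each detected object we extract its bounding box coordinates (xmin, ymin, xmax, ymax), label, and confidence score, draw a rectangle around the object, and add the label and score as text.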
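Several boxes in the output overlap heavily; one box even appears twice, once as "bottle" and once as "coke". One common way to declutter a drawing like this is greedy, class-agnostic non-maximum suppression based on intersection over union (IoU). The sketch below is an assumption about how one might do that here; the helper names box_iou and suppress_overlaps and the 0.5 threshold are illustrative and not part of the original article.

```python
def box_iou(a, b):
    """Intersection over union of two box dicts with xmin/ymin/xmax/ymax keys."""
    inter_w = max(0, min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"]))
    inter_h = max(0, min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"]))
    inter = inter_w * inter_h
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(predictions, iou_threshold=0.5):
    """Greedy suppression: visit boxes from highest to lowest score and keep
    a box only if it does not overlap an already kept box too much."""
    kept = []
    for pred in sorted(predictions, key=lambda p: p["score"], reverse=True):
        if all(box_iou(pred["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(pred)
    return kept


# Three entries taken from the output above (scores rounded for brevity)
sample = [
    {"score": 0.452, "label": "bottle",
     "box": {"xmin": 294, "ymin": 1714, "xmax": 479, "ymax": 1924}},
    {"score": 0.158, "label": "jar",
     "box": {"xmin": 1141, "ymin": 1628, "xmax": 1259, "ymax": 1883}},
    {"score": 0.157, "label": "coke",
     "box": {"xmin": 294, "ymin": 1714, "xmax": 479, "ymax": 1924}},
]

print([p["label"] for p in suppress_overlaps(sample)])  # ['bottle', 'jar']
```

Because the suppression ignores labels, the lower-scoring "coke" entry is dropped in favor of "bottle" for the same box. If both labels should survive, the same loop could be run separately per label.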

Extracting the detected objects

The get_detected_objects function extracts only the labels of the detected objects from the predictions, which makes the object names easier to access.

# Function that extracts the labels of the detected objects
def get_detected_objects(predictions):
    detected_objects = [pred["label"] for pred in predictions]
    return detected_objects

# Print the list of detected objects
detected_objects = get_detected_objects(predictions)
print("Detected Objects:", detected_objects)

Output:

Detected Objects: ['bottle', 'bottle', 'bottle', 'milk', 'bottle', 'bottle', 'bottle', 'coke', 'jar', 'milk', 'refrigerator', 'jar', 'jar', 'refrigerator', 'bottle', 'jar', 'yogurt', 'yogurt', 'refrigerator', 'bottle', 'jar', 'vegetables', 'bottle', 'jar', 'coke', 'jar', 'yogurt', 'coke', 'yogurt', 'milk', 'coke', 'egg', 'egg', 'bottle', 'vegetables', 'milk', 'coke', 'fruits', 'vegetables', 'milk', 'jar', 'jar', 'bottle', 'yogurt', 'refrigerator', 'milk', 'milk', 'coke', 'bottle', 'coke', 'egg', 'yogurt', 'bottle', 'milk', 'refrigerator', 'bottle', 'bottle', 'egg', 'bottle', 'milk', 'egg', 'bottle', 'milk', 'curd', 'coke', 'bowl', 'vegetables', 'milk', 'milk', 'coke', 'egg', 'bottle', 'curd', 'egg', 'egg', 'yogurt', 'egg', 'bottle', 'egg', 'jar', 'egg', 'egg', 'coke', 'milk', 'vegetables', 'curd', 'bottle', 'jar', 'egg', 'yogurt', 'milk', 'egg', 'fruits', 'yogurt', 'jar', 'milk', 'milk', 'curd', 'fruits', 'curd', 'yogurt', 'yogurt', 'yogurt', 'egg', 'coke', 'egg', 'refrigerator', 'cokacola', 'curd', 'jar', 'bottle', 'refrigerator', 'bottle', 'milk', 'milk', 'coke', 'curd', 'yogurt', 'fruits', 'yogurt', 'vegetables', 'yogurt', 'coke', 'cokacola', 'egg', 'milk', 'milk', 'egg', 'coke', 'coke', 'curd', 'cokacola', 'jar', 'jar', 'bottle', 'curd', 'coke', 'yogurt', 'curd', 'fruits', 'refrigerator', 'milk', 'fruits', 'cokacola', 'milk', 'cokacola', 'egg', 'yogurt', 'pickle', 'fruits', 'coke', 'pickle', 'egg', 'fruits', 'refrigerator', 'refrigerator', 'bottle', 'curd', 'egg', 'egg', 'bottle', 'refrigerator', 'egg', 'jar', 'jar', 'bottle', 'pickle', 'egg', 'jar', 'cokacola', 'yogurt', 'milk', 'curd', 'bottle', 'milk', 'milk', 'cokacola', 'bottle']

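This code retrieves only the labels from the predictions and prints the list of detected objects. Labels repeat because each entry corresponds to one detected box. The list also contains "coke", which is not among the candidate labels passed above, so this output appears to come from a run with a slightly different label list.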
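A flat list with many repeats is hard to read, so a per-label tally is often more useful. A minimal sketch using collections.Counter from the standard library is shown below; the helper name count_labels, the optional min_score parameter, and the sample scores are assumptions for illustration.

```python
from collections import Counter


def count_labels(predictions, min_score=0.0):
    """Count how many detections each label received, optionally
    ignoring detections below min_score."""
    return Counter(p["label"] for p in predictions if p["score"] >= min_score)


# Illustrative predictions; only the fields used here are included
sample = [
    {"score": 0.49, "label": "bottle"},
    {"score": 0.46, "label": "bottle"},
    {"score": 0.45, "label": "milk"},
    {"score": 0.16, "label": "egg"},
]

counts = count_labels(sample)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

Passing min_score=0.3 to the same helper would leave only the bottle and milk detections in the tally.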

Extending the detection labels

We can run further detections by adjusting the candidate labels, for example by adding other drinks or brands.

# Run the detector again with additional labels
predictions = detector(
    image,
    candidate_labels=["fanta", "cokacola", "pepsi", "mountain dew", "sprite", "pepper", "sangria", "vitamin water", "beer"],
)

In this way we extend the set of candidate labels, which lets us search for other items and brands commonly found in a refrigerator.

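from PIL import ImageDraw

# Draw the new predictions on a fresh copy of the image
annotated = image.copy()
draw = ImageDraw.Draw(annotated)
for prediction in predictions:
    box = prediction["box"]
    label = prediction["label"]
    score = prediction["score"]
    # Read the coordinates by key rather than relying on dict ordering
    xmin, ymin, xmax, ymax = box["xmin"], box["ymin"], box["xmax"], box["ymax"]
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
    draw.text((xmin, ymin), f"{label}: {round(score, 2)}", fill="white")
annotated

Objects detected in the image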
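With a longer list of brand labels, it can help to see which labels matched at all and how strongly. The sketch below keeps the highest-scoring detection for each label; the helper name best_per_label is hypothetical, and the sample scores are made up for illustration rather than taken from a real run.

```python
def best_per_label(predictions):
    """Return a dict mapping each label to its highest-scoring prediction."""
    best = {}
    for pred in predictions:
        label = pred["label"]
        if label not in best or pred["score"] > best[label]["score"]:
            best[label] = pred
    return best


# Made-up scores for illustration only
sample = [
    {"score": 0.31, "label": "fanta"},
    {"score": 0.22, "label": "sprite"},
    {"score": 0.41, "label": "fanta"},
]

best = best_per_label(sample)
for label, pred in sorted(best.items(), key=lambda kv: kv[1]["score"], reverse=True):
    print(f"{label}: {pred['score']:.2f}")
```

Candidate labels that never show up in the result did not produce any detection above the pipeline's score cutoff.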

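Conclusion

This code example shows how zero-shot object detection can identify objects in a changing environment such as the inside of a refrigerator. By specifying custom labels, you can adapt detection to a wide range of applications without retraining the model for each task. Hugging Face's transformers library and pretrained models such as Google's OWL-ViT make it straightforward to set up capable object detection with very little configuration.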

Editor: 趙寧寧 Source: 小白玩轉Python