Florence-2 with OpenVINO & FiftyOne: Real-World Applications in Image Analysis

Author: 貓咪很乖, 2024-10-18 17:08:53

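In this article, we walk through a practical, real-world use case that combines the strengths of two powerful tools to get the most out of the Florence-2 model in terms of efficiency and ease of use. We first use OpenVINO to convert the original PyTorch model into an optimized, compressed format so that it runs efficiently on CPU-only machines.

To make the model more useful in practice and unlock additional features, we then use FiftyOne, a versatile tool for exploring and curating image datasets, to help us put the model's predictions to work in a real-world scenario.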

Contents

A brief introduction to each component
Getting free images from Pexels
The FiftyOne dataset
The OpenVINO Florence-2 model
Adding Florence-2 predictions to our FiftyOne dataset
Exploring the results
References, useful links, and source code

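I have tried to keep things simple so that we can start exploring what this pipeline can do. The figure below shows the data flow, from the images we collect (unlabeled RGB images) to a FiftyOne dataset that already carries some useful features and the Florence-2 model's predictions.

Figure: Pipeline overview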

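A brief introduction to each component

Let's very briefly go over each of the components mentioned: the Florence-2 model, OpenVINO, and FiftyOne.

Florence-2 is a state-of-the-art vision foundation model that handles a wide range of computer vision and vision-language tasks from simple text prompts. Unlike traditional models, which struggle to cover multiple tasks, Florence-2 switches easily between tasks such as image captioning, object detection, and segmentation. It achieves this by training on a very large dataset containing 5.4 billion visual annotations across 126 million images, which lets it understand complex visual information. This makes Florence-2 a capable tool for developers and researchers, offering zero-shot use and fine-tuning across many applications.

"OpenVINO is an open-source toolkit for optimizing and deploying deep learning models from cloud to edge. It accelerates deep learning inference across various use cases, such as generative AI, video, audio, and language, with models from popular frameworks like PyTorch, TensorFlow, ONNX, and more. Convert and optimize models, and deploy across a mix of Intel® hardware and environments, on-premises and on-device, in the browser or in the cloud."

FiftyOne provides the building blocks for streamlining image dataset analysis workflows, including visualizing complex labels, evaluating model predictions, identifying failure modes, and finding annotation mistakes. It is a very good tool, and I strongly recommend checking out its official website.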

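Getting free images from Pexels

First, we need some images to work with. In this example I collect freely licensed images from pexels.com. To download them more efficiently, I use a Python package called pexel-downloader, but you can use any pool of images from any source.

Install pexel-downloader:

pip install pexel-downloader

I downloaded some "Olympic sports" images. The code using pexel-downloader is as follows: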

from pexel_downloader import PexelDownloader

if __name__ == '__main__':
    downloader = PexelDownloader(api_key="<YOUR-PEXELS-API-KEY>")

    query = "olympics sports"
    save_dir = "./dataset/images"
    downloader.download_images(query=query,
                               num_images=100,
                               save_directory=save_dir,
                               size='medium')

This downloads 100 images and saves them to the "./dataset/images" folder.
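Before building the dataset, it can be worth confirming what actually landed on disk, since a download may return fewer files than requested. The following is a minimal sketch using only the Python standard library. The helper name `list_images` and the set of accepted extensions are assumptions made for illustration; they are not part of pexel-downloader.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your downloader produces.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def list_images(directory):
    """Return the sorted image files found directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir()
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS)


if __name__ == "__main__":
    images = list_images("./dataset/images")
    print(f"Found {len(images)} image(s)")
```

If the count is lower than expected, rerunning the download or broadening the query is probably simpler than debugging missing samples later in the pipeline.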

The FiftyOne dataset

Once we have a folder of images, we can create our initial FiftyOne dataset. The snippets below install FiftyOne and create the dataset:

pip install fiftyone

import fiftyone as fo

images_dir   = "./dataset/images"  # same folder the download step saved to
dataset_name = "sports-dataset"
dataset = fo.Dataset.from_images_dir(images_dir,
                                     name=dataset_name,
                                     persistent=True)

# You can launch the FiftyOne App from this script, or later from the CLI.
# To launch it from here:
session = fo.launch_app(dataset)
session.wait(-1)  # keep the script (and the App) running

# To launch it from your terminal instead:
# fiftyone app launch <dataset-name>

While the FiftyOne App is running, you can explore your dataset from the browser (localhost:5151 by default).

Figure: The FiftyOne UI showing our sports dataset, with no labels yet

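The OpenVINO Florence-2 model

The next step is to optimize the Florence-2 model with OpenVINO. Fortunately, we can rely on the excellent work of Intel's OpenVINO team, who have already published a comprehensive demo and code showing how to convert the Florence-2 PyTorch model from Hugging Face. You can get the converted model from my Google Drive; just download it into the main folder of the full code project (the full GitHub code for this article is linked at the end).

Adding Florence-2 predictions to our FiftyOne dataset

Finally, let's add the OpenVINO Florence-2 predictions to our FiftyOne dataset. We use only object detection and image captioning. We also explore another very useful feature, the image embedding space exploration tool; for that, we save the output of the Florence-2 image encoder as our image embeddings. The complete code is shown below. In short, it loads the FiftyOne dataset we already created and loads our model from disk (using OVFlorence2Model from the openvino-notebook example).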

import click
import fiftyone.brain as fob
import fiftyone as fo
from ov_florence2_helper import OVFlorence2Model
from transformers import AutoProcessor
from PIL import Image

import numpy as np

def normalize_bbox(bbox, image_height, image_width):
    x1, y1, x2, y2 = bbox
    return (x1 / image_width, y1 / image_height,
            (x2 - x1) / image_width, (y2 - y1) / image_height)

def run_inference(sample_collection, model_path):
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    model = OVFlorence2Model(model_path, "AUTO")

    for sample in sample_collection.iter_samples(autosave=True, progress=True):
        try:
            # Load image
            image = Image.open(sample.filepath).convert("RGB")  # ensure a 3-channel image
            width, height = image.width, image.height

            # Extract image-features (embedding)
            inputs = processor(text="<OD>", images=image, return_tensors="pt")
            image_features = model.encode_image(inputs["pixel_values"])

            # Object detection and caption inference in a single loop
            detections, caption = [], None
            for task in ["<OD>", "<CAPTION>"]:
                if task == "<CAPTION>":
                    inputs = processor(text=task, images=image, return_tensors="pt")
                generated_ids = model.generate(input_ids=inputs["input_ids"],
                                               pixel_values=inputs["pixel_values"],
                                               max_new_tokens=1024,
                                               do_sample=False,
                                               image_features=image_features,
                                               num_beams=3)
                generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
                parsed_answer  = processor.post_process_generation(generated_text, task=task, image_size=(width, height))

                if task == "<OD>":
                    for idx, bbox in enumerate(parsed_answer[task]['bboxes']):
                        label = parsed_answer[task]["labels"][idx]
                        normalized_bbox = normalize_bbox(bbox, height, width)
                        detections.append(fo.Detection(label=label, bounding_box=normalized_bbox))
                else:
                    caption = parsed_answer[task]

            # Add predictions to sample
            sample["detections"] = fo.Detections(detections=detections)
            sample["caption"] = caption
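            sample["florence2_image_feats"] = image_features.reshape(-1)  # flatten the image features
        except Exception as e:
            # Report the failure and move on instead of skipping silently
            print(f"[WARN] Skipping {sample.filepath}: {e}")
            continue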

@click.command()
@click.option("--dataset-name",
              "--name",
              required=True,
              prompt="Name of the dataset?")
@click.option("--model-path",
              "-m",
              required=False,
              default="Florence-2-base")
def main(dataset_name, model_path):
    assert fo.dataset_exists(dataset_name), f"Dataset {dataset_name} does not exist yet."
    dataset = fo.load_dataset(dataset_name)
    run_inference(dataset, model_path)

    ###################################################################
    # Get 2D embedding space visualization from florence2-image-feats #
    ###################################################################
    # Retrieve the embeddings (image features) from the sample field "florence2_image_feats", populated during "run_inference"
    florence_embeddings = dataset.values(field_or_expr="florence2_image_feats")
    florence_embeddings = np.array(florence_embeddings).reshape(len(dataset), -1)

    print("[INFO] Computing 2D visualization using embeddings")
    fob.compute_visualization(dataset,
                              embeddings=florence_embeddings,
                              method="umap",
                              brain_key="florence2_embegginds_viz")

if __name__ == '__main__':
    main()

We added the three things we wanted from the model: object detections (bounding boxes), a caption (text), and an image embedding (the encoder's image features).

sample["detections"] = fo.Detections(detections=detections)
sample["caption"] = caption
sample["florence2_image_feats"] = image_features.reshape(-1)  # flatten the image features
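To make the box conversion concrete: Florence-2's post-processed `<OD>` output gives boxes as absolute pixel corners `[x1, y1, x2, y2]`, while `fo.Detection` expects `[top-left-x, top-left-y, width, height]` as fractions of the image size. The sketch below repeats `normalize_bbox` from the script and applies it to a made-up `parsed_answer`, so it runs without the model. The image size, box values, and labels are invented for illustration.

```python
def normalize_bbox(bbox, image_height, image_width):
    """Convert absolute [x1, y1, x2, y2] pixels to relative [x, y, w, h]."""
    x1, y1, x2, y2 = bbox
    return (x1 / image_width, y1 / image_height,
            (x2 - x1) / image_width, (y2 - y1) / image_height)


# Hypothetical post-processed output for a 640x480 image (values are invented).
width, height = 640, 480
parsed_answer = {
    "<OD>": {
        "bboxes": [[64.0, 48.0, 320.0, 240.0], [0.0, 0.0, 640.0, 480.0]],
        "labels": ["person", "swimming pool"],
    }
}

# Pair each label with its box converted to relative coordinates.
converted = [
    (label, normalize_bbox(bbox, height, width))
    for bbox, label in zip(parsed_answer["<OD>"]["bboxes"],
                           parsed_answer["<OD>"]["labels"])
]

for label, box in converted:
    print(label, box)
```

A box covering the whole image maps to `(0.0, 0.0, 1.0, 1.0)`, which is a quick sanity check that width and height were not swapped.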

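After iterating over all samples and adding the predictions, we can use "florence2_image_feats" to create a 2D visualization of the embedding space. The snippet below shows how to do this with a built-in function of the FiftyOne Brain module (fiftyone.brain).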

###################################################################
# Get 2D embedding space visualization from florence2-image-feats #
###################################################################
# Retrieve the embeddings (image features) from the sample field "florence2_image_feats", populated during "run_inference"
florence_embeddings = dataset.values(field_or_expr="florence2_image_feats")
florence_embeddings = np.array(florence_embeddings).reshape(len(dataset), -1)

print("[INFO] Computing 2D visualization using embeddings")
fob.compute_visualization(dataset,
                          embeddings=florence_embeddings,
                          method="umap",
                          brain_key="florence2_embegginds_viz")
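One caveat with this snippet: a sample that failed during inference never gets a "florence2_image_feats" value, so its entry comes back as `None` and the `reshape` call can fail. Below is a minimal sketch of one way to guard against that, in plain Python. `split_valid_embeddings` is a hypothetical helper, not part of FiftyOne, and the sample IDs and vectors are made up.

```python
def split_valid_embeddings(sample_ids, embeddings):
    """Keep only (id, vector) pairs whose vector is present.

    Also checks that the remaining vectors share one length, since they
    must stack into a 2D (num_samples, dim) array.
    """
    valid_ids, rows = [], []
    for sample_id, vector in zip(sample_ids, embeddings):
        if vector is None:
            continue  # no embedding was stored for this sample
        valid_ids.append(sample_id)
        rows.append(list(vector))

    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"Embeddings have inconsistent lengths: {sorted(lengths)}")
    return valid_ids, rows


# Made-up IDs and tiny vectors standing in for dataset.values("id") and
# dataset.values("florence2_image_feats").
ids = ["a", "b", "c"]
feats = [[0.1, 0.2], None, [0.3, 0.4]]

valid_ids, rows = split_valid_embeddings(ids, feats)
print(valid_ids, rows)
```

With FiftyOne, the filtered IDs could then be used to build a view (for example with `dataset.select(valid_ids)`) that is passed to `fob.compute_visualization` together with the stacked rows. Treat that as an untested suggestion rather than the author's method.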

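Exploring the results

Figure: Example sample with the caption "Two fencers performing an action on a stage"

Figure: Caption: "A group of synchronized swimmers in a swimming pool"

Let's also look at the embedding space and at how samples with nearby embeddings relate to each other (the "meaning" of the clusters).

Figure: A cluster of images containing swimming pools/water

Figure: Stadiums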

References

OpenVINO official documentation: https://docs.openvino.ai/2024/index.html

Full code: https://github.com/Gabriellgpc/computer-vision-dataset-maker

Editor: 趙寧寧. Source: 小白玩轉Python
