
Video Analytics: Count, Speed, and Distance Estimation with Chart Visualization and the YoloV10 Architecture

In this article I discuss YoloV10 and my video analytics demo project, which estimates object counts, speed, and distance with chart visualization, approached from a business perspective.

Computer vision is an interdisciplinary scientific field concerned with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can perform. Today, classification, object detection, segmentation, and keypoint detection are the main real-time computer vision applications. So how did the field develop? First, let us briefly discuss the main problems in computer vision.

The image above should give you a clear picture of the key terms related to computer vision challenges. These are serious problems, although their severity depends on the problem statement. They are difficulties that virtually every 2D computer vision engineer runs into.

Six years ago I started my career in image processing, moved on to computer vision, mastered many of the fundamentals, and then shifted to deep-learning-based models as the challenges demanded. For example, I explored and studied in detail many of the underlying mathematical concepts of architectures such as ResNet, VGG19, UNet, and EfficientNet. Yet whenever I talk with students about early computer vision, all they mention is "YOLO", and most of them do not understand how YOLO works under the hood. What really matters here is YOLO's enormous popularity and its high accuracy compared to other models.

Everyone wants a ready-made solution: install a package and let the process run, without having to understand the backend. That solution is YOLO (Ultralytics). I will discuss YoloV10 and my video analytics demo project, which estimates object counts, speed, and distance with chart visualization, from a business perspective.

What is YOLO?

CNN-based object detectors power a wide range of real-world vision systems. The YOLO (You Only Look Once) family of models is used for high-performance object detection. YOLO divides the image into a grid, and each grid cell detects objects within itself. YOLO models can be used for real-time object detection on streaming data and require relatively few computational resources.
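As a quick, hedged illustration (not part of the demo project below), this is roughly how a pretrained Ultralytics model is run on a single image; the file name bus.jpg is just a placeholder:

from ultralytics import YOLO

# Load a small pretrained checkpoint and run it on one image
model = YOLO("yolov8n.pt")
results = model("bus.jpg")  # accepts a path, URL, or numpy array

# Each result holds the detected boxes with class, confidence, and coordinates
for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf), box.xyxy.tolist())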

YoloV10 has now been released! What are its main features?

Real-time object detection aims to predict object classes and positions in an image accurately and with minimal latency. The YOLO series strikes a balance between performance and efficiency, but it has continued to suffer from a reliance on NMS and from architectural inefficiencies. YoloV10 addresses these issues by introducing NMS-free training and a model design strategy focused on both efficiency and accuracy.
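To make concrete what NMS-free training removes, here is a minimal, illustrative sketch of classic IoU-based non-maximum suppression; it is a simplified stand-in, not how Ultralytics implements NMS internally:

import numpy as np

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    # Keep the highest-scoring box, drop overlapping lower-scoring boxes, repeat
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

Because this suppression step runs after every forward pass, it adds post-processing latency; YOLOv10's one-to-one head outputs a single prediction per object, so the step can be skipped at inference time.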

Architecture

YOLOv10 improves on previous YOLO models through several innovations:

  • Backbone: an enhanced CSPNet is used for better gradient flow and reduced computational redundancy.
  • Neck: PAN layers are incorporated for effective multi-scale feature fusion, aggregating features from different scales.
  • One-to-many head: during training, generates multiple predictions per object, enriching the supervision signal and improving learning accuracy.
  • One-to-one head: during inference, outputs a single best prediction per object, eliminating the need for NMS and reducing latency.

Key features:

  • NMS-free training: consistent dual assignments eliminate the need for NMS, reducing inference latency.
  • Holistic model design: components are optimized for both efficiency and accuracy, with a lightweight classification head, spatial-channel decoupled downsampling (sketched after this list), and rank-guided block design.
  • Enhanced model capabilities: large-kernel convolutions and partial self-attention modules improve performance without a significant increase in computational cost.
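For intuition only, here is a rough PyTorch sketch of the spatial-channel decoupled downsampling idea described in the YOLOv10 paper: a 1x1 pointwise convolution handles the channel change, then a stride-2 depthwise convolution handles the spatial reduction, which is cheaper than a single dense strided convolution doing both. The class name SCDown and the exact layer settings here are assumptions, not the official implementation:

import torch
import torch.nn as nn

class SCDown(nn.Module):
    # Sketch of spatial-channel decoupled downsampling (naming assumed)
    def __init__(self, c_in, c_out):
        super().__init__()
        # Pointwise conv: channel transformation only
        self.pw = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        # Depthwise conv with stride 2: spatial downsampling only
        self.dw = nn.Conv2d(c_out, c_out, kernel_size=3, stride=2,
                            padding=1, groups=c_out, bias=False)

    def forward(self, x):
        return self.dw(self.pw(x))

x = torch.randn(1, 64, 80, 80)
print(SCDown(64, 128)(x).shape)  # torch.Size([1, 128, 40, 40])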

What Makes YOLOv10 Special

YOLOv10 introduces a groundbreaking approach to real-time object detection, eliminating the need for non-maximum suppression (NMS) and optimizing model components for superior performance. By leveraging consistent dual assignments and a holistic efficiency-accuracy-driven model design, YOLOv10 achieves state-of-the-art accuracy with reduced computational overhead. Its architecture includes enhanced backbone and neck components along with the innovative one-to-many and one-to-one heads. With model variants tailored to different application needs, YOLOv10 sets a new standard for accuracy and efficiency, surpassing previous YOLO versions and other contemporary detectors. For example, on the COCO dataset, YOLOv10-S is 1.8x faster than RT-DETR-R18 at a similar AP, while YOLOv10-B has 46% lower latency and 25% fewer parameters than YOLOv9-C at the same performance.

Video Analytics: Count, Speed, and Distance Estimation with Chart Visualization

In this project, I built a system in which the user instantly gets counts, speed, and distance estimates for a specific object, visualized as graphs. This capability gives businesses immediate, actionable insight, which is a major benefit.

I used models from Ultralytics, specifically YOLOv8s and YOLOv8n, which are known for accurate, efficient, low-latency object detection. These models are key to streamlining the end-stage analytics, making the whole pipeline smoother and more effective.

The development experience was very enjoyable and showed the great potential of solving business challenges with advanced technology. Both the YoloV8 and YoloV10 models produce good results, but YoloV10 performs better in terms of accuracy and latency.


'''
Final code: video analytics for a specific object

Guidance:

1. User input: a specific object
2. Detection of that object, plus speed and distance estimation
3. Graph analytics: pie, area, and multi-class line charts

Install the dependency first:
!pip install ultralytics
'''

# Helper functions (these use cv2, numpy, and matplotlib; the imports are
# included here so this cell can run standalone, and repeated in the main script)
import cv2
import numpy as np
import matplotlib.pyplot as plt
def create_pie_chart(data):
    fig, ax = plt.subplots(figsize=(4, 3))  # Aspect ratio of 4:3
    ax.pie(data.values(), labels=data.keys(), autopct='%1.1f%%')
    ax.legend()
    ax.set_title("Total Percentage of Individual Class Perspective")
    plt.close(fig)
    return fig

def create_area_plot(class_counts_over_time):
    fig, ax = plt.subplots(figsize=(4, 3))  # Aspect ratio of 4:3
    sorted_keys = sorted(class_counts_over_time.keys())
    for cls in sorted_keys:
        ax.fill_between(range(len(class_counts_over_time[cls])), class_counts_over_time[cls], label=cls, alpha=0.6)
    ax.legend()
    ax.set_title("Distribution of Each Class Over Time")
    ax.set_xlabel("Frame Count")
    ax.set_ylabel("Count")
    plt.close(fig)
    return fig

def create_multiple_line_plot(speed_data, distance_data, frame_count):
    fig, ax = plt.subplots(figsize=(4, 3))  # Aspect ratio of 4:3
    for track_id in speed_data.keys():
        ax.plot(range(frame_count), speed_data[track_id], label=f"Speed {track_id}")
    for track_id in distance_data.keys():
        ax.plot(range(frame_count), distance_data[track_id], label=f"Distance {track_id}")
    ax.legend()
    ax.set_title("Speed and Distance Identification of Each Class")
    ax.set_xlabel("Frame Count")
    ax.set_ylabel("Value")
    plt.close(fig)
    return fig

def create_scatter_plot(data):
    fig, ax = plt.subplots(figsize=(4, 3))  # Aspect ratio of 4:3
    x = list(data.keys())
    # Each value is a per-frame list of counts; scatter the total per class
    # (passing the raw lists would give mismatched x/y sizes)
    y = [sum(counts) for counts in data.values()]
    ax.scatter(x, y)
    ax.set_title("Class Distribution Scatter Plot")
    ax.set_xlabel("Class")
    ax.set_ylabel("Count")
    plt.close(fig)
    return fig

def fig_to_img(fig):
    # Render the figure to an RGB array (tostring_rgb needs the Agg backend and
    # is deprecated on newer Matplotlib releases, where buffer_rgba() replaces it),
    # then convert to BGR so colors composite correctly with OpenCV frames
    fig.canvas.draw()
    img = np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8)
    img = img.reshape(fig.canvas.get_width_height()[::-1] + (3,))
    return cv2.cvtColor(img, cv2.COLOR_RGB2BGR)

def resize_and_place_image(base_image, overlay_image, position):
    # Relies on the global frame dimensions w and h set in the main script below
    overlay_image_resized = cv2.resize(overlay_image, (w // 3, h // 3))
    x, y = position
    base_image[y:y + overlay_image_resized.shape[0], x:x + overlay_image_resized.shape[1]] = overlay_image_resized
    return base_image

def draw_visualizations(frame, data, labels, speed_data, distance_data, class_counts_over_time, frame_count):
    vis_frame = np.zeros((h, w // 3, 3), dtype=np.uint8)

    # Create Pie Chart
    if data:
        pie_chart = create_pie_chart(data)
        pie_chart_img = fig_to_img(pie_chart)
        vis_frame = resize_and_place_image(vis_frame, pie_chart_img, (0, 0))

    # Create Area Plot
    if class_counts_over_time:
        area_plot = create_area_plot(class_counts_over_time)
        area_plot_img = fig_to_img(area_plot)
        vis_frame = resize_and_place_image(vis_frame, area_plot_img, (0, h // 3))

    # Create Multiple Line Plot
    if speed_data or distance_data:
        line_plot = create_multiple_line_plot(speed_data, distance_data, frame_count)
        line_plot_img = fig_to_img(line_plot)
        vis_frame = resize_and_place_image(vis_frame, line_plot_img, (0, 2 * (h // 3)))

    combined_frame = np.hstack((frame, vis_frame))
    return combined_frame

def pad_lists_to_length(data_dict, length, default_value=0):
    for key in data_dict.keys():
        if len(data_dict[key]) < length:
            data_dict[key] += [default_value] * (length - len(data_dict[key]))

'''
Main function:

Video analytics driven by a user-specified object
(object count, speed, and distance estimation)
'''

import cv2
import math
import numpy as np
import matplotlib.pyplot as plt
from ultralytics import YOLO
from ultralytics.utils.plotting import Annotator
from ultralytics.solutions import speed_estimation

# Initialize YOLO models: yolov8s for detection + distance, yolov8n feeds the speed estimator
object_detection_model = YOLO("yolov8s.pt")
speed_estimation_model = YOLO("yolov8n.pt")
names = speed_estimation_model.model.names

# Open video file
cap = cv2.VideoCapture("/content/drive/MyDrive/yolo/race.mp4")
assert cap.isOpened(), "Error reading video file"

# Get video properties
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Initialize video writer
out = cv2.VideoWriter("Distribution_speed_distance_visual_scatter_unique1hor_car_overall.avi",
                      cv2.VideoWriter_fourcc(*"MJPG"), 15, (w + w // 3, h))

frame_count = 0
data = {}
labels = []
class_counts_over_time = {}
speed_over_time = {}
distance_over_time = {}

# Reference point (bottom-left corner) and pixel-per-meter scale for distance calculation
# (pixel_per_meter = 10 is a rough calibration assumption; tune it for your camera setup)
center_point = (0, h)
pixel_per_meter = 10

# Line points for speed estimation (assumes a 1280-pixel-wide video; adjust for other resolutions)
line_pts = [(0, 360), (1280, 360)]

# Initialize speed-estimation object
speed_obj = speed_estimation.SpeedEstimator(names=names, reg_pts=line_pts, view_img=False)

# Colors for text and bounding box
txt_color, txt_background, bbox_clr = ((0, 0, 0), (255, 255, 255), (255, 0, 255))

print('Example input: horse:17, person:0, car:2, bus:5, truck:7  (IDs follow the COCO class list)')
# Allow user to input desired classes
user_input = input("Enter desired classes with their IDs (format: 'class1:id1,class2:id2,...'): ")
# Example input: "person:0,car:2,horse:17"
desired_classes = {}
for item in user_input.split(','):
    cls, cls_id = item.split(':')
    desired_classes[cls.strip()] = int(cls_id.strip())

print("Desired classes:", desired_classes)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    frame_count += 1

    # Object detection for speed estimation
    speed_tracks = speed_estimation_model.track(frame, persist=True, show=False)
    frame = speed_obj.estimate_speed(frame, speed_tracks)

    # Object detection for distance estimation
    annotator = Annotator(frame, line_width=2)
    results = object_detection_model.track(frame, persist=True)

    if results[0].boxes.id is not None:
        boxes = results[0].boxes.xyxy.cpu()
        track_ids = results[0].boxes.id.int().cpu().tolist()
        clss = results[0].boxes.cls.cpu().tolist()

        for box, track_id, cls in zip(boxes, track_ids, clss):
            cls_name = object_detection_model.names[int(cls)]
            if cls_name in desired_classes and desired_classes[cls_name] == int(cls):  # Filter desired classes and IDs (cls comes back as a float)
                if cls_name not in labels:
                    labels.append(cls_name)

                if cls_name in data:
                    data[cls_name] += 1
                else:
                    data[cls_name] = 1

                annotator.box_label(box, label=str(track_id), color=bbox_clr)
                annotator.visioneye(box, center_point)

                x1, y1 = int((box[0] + box[2]) // 2), int((box[1] + box[3]) // 2)  # Bounding box centroid

                distance = (math.sqrt((x1 - center_point[0]) ** 2 + (y1 - center_point[1]) ** 2)) / pixel_per_meter

                text_size, _ = cv2.getTextSize(f"Distance: {distance:.2f} m", cv2.FONT_HERSHEY_SIMPLEX, 1.2, 3)
                cv2.rectangle(frame, (x1, y1 - text_size[1] - 10), (x1 + text_size[0] + 10, y1), txt_background, -1)
                cv2.putText(frame, f"Distance: {distance:.2f} m", (x1, y1 - 5), cv2.FONT_HERSHEY_SIMPLEX, 1.2, txt_color, 3)

                if track_id not in distance_over_time:
                    distance_over_time[track_id] = [0] * (frame_count - 1)
                distance_over_time[track_id].append(distance)

                # SpeedEstimator internals differ across ultralytics versions, so the
                # speed lookup is guarded and falls back to 0 when unavailable
                speed = speed_obj.speeds.get(track_id, 0) if hasattr(speed_obj, 'speeds') else 0
                if track_id not in speed_over_time:
                    speed_over_time[track_id] = [0] * (frame_count - 1)
                speed_over_time[track_id].append(speed)

                if cls_name not in class_counts_over_time:
                    class_counts_over_time[cls_name] = [0] * frame_count
                if len(class_counts_over_time[cls_name]) < frame_count:
                    class_counts_over_time[cls_name].extend([0] * (frame_count - len(class_counts_over_time[cls_name])))
                class_counts_over_time[cls_name][-1] += 1

    # Pad lists to current frame count to ensure equal lengths
    pad_lists_to_length(distance_over_time, frame_count)
    pad_lists_to_length(speed_over_time, frame_count)

    # Draw combined visualizations on the frame
    combined_frame = draw_visualizations(frame, data, labels, speed_over_time, distance_over_time, class_counts_over_time, frame_count)

    # Write the frame with visualizations
    out.write(combined_frame)

    # Clear counts for next frame
    data = {}

    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Generate and overlay scatter plot on the final frame
final_frame = np.zeros((h, w, 3), dtype=np.uint8)
scatter_plot = create_scatter_plot(class_counts_over_time)
scatter_plot_img = fig_to_img(scatter_plot)
final_frame = resize_and_place_image(final_frame, scatter_plot_img, (0, 0))

# Save the final frame with the scatter plot
cv2.imwrite("final_frame_with_scatter_plot.png", final_frame)

cap.release()
out.release()
cv2.destroyAllWindows()

# Print overall analytics
total_counts = sum(sum(counts) for counts in class_counts_over_time.values())
print(f"Overall total count: {total_counts}")
for cls, counts in class_counts_over_time.items():
    print(f"Total count for {cls}: {sum(counts)}")

best_speed = max((max(speeds) for speeds in speed_over_time.values()), default=0)
print(f"Overall best speed: {best_speed} m/s")
best_distance = max((max(distances) for distances in distance_over_time.values()), default=0)
print(f"Overall best distance: {best_distance} meters")

Output:

In my view, this model is excellent at recognizing objects and can easily be fine-tuned for specific objects (a fine-tuning sketch follows the list below). Even for people unfamiliar with AI or deep learning, the program is easy to use and broadly applicable. A question that comes up often is why AI, deep learning, and computer vision engineers are so keen on building projects like this. The reasons are many: such projects offer valuable opportunities to advance the field and to solve real-world problems. I encourage everyone to try this exercise soon. The growth path in this field is the same for everyone, but many challenges still need to be addressed in object recognition and segmentation projects.

These challenges include:

  • Lighting: changes in lighting conditions can significantly affect detection accuracy.
  • Environment: varied backgrounds and settings complicate the recognition process.
  • Problem definition: identifying and solving the right problem is critical to the success of these projects.
  • Field work: implementing and testing under real-world conditions is essential but can be hard to manage.
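On the fine-tuning point above, here is a hedged sketch of how an Ultralytics checkpoint is typically fine-tuned on a custom dataset; the dataset file my_dataset.yaml and the hyperparameters are placeholders, not part of this project:

from ultralytics import YOLO

# Fine-tune a pretrained checkpoint on a custom dataset.
# "my_dataset.yaml" must define the train/val image paths and class names
# in the standard Ultralytics dataset format.
model = YOLO("yolov8s.pt")
model.train(data="my_dataset.yaml", epochs=50, imgsz=640)

metrics = model.val()   # evaluate on the validation split
model("example.jpg")    # run inference with the fine-tuned weights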

References:

  • https://github.com/VK-Ant/Computervision_Exploration
  • https://docs.ultralytics.com/guides/analytics/
  • https://docs.ultralytics.com/models/yolov10/#holistic-efficiency-accuracy-driven-model-design