A Comparison of Three Approaches to Object Detection: YOLO, SSD, and Faster R-CNN

In this article, I evaluate and compare three popular object detection models: YOLO (You Only Look Once), Faster R-CNN (Region-based Convolutional Neural Network), and SSD (Single Shot MultiBox Detector).

The goal is to develop a computer vision system that can accurately detect and segment objects in video. I apply three state-of-the-art (SoA) methods: YOLO, SSD, and Faster R-CNN, and evaluate their performance. I then analyze the results visually to highlight the strengths and weaknesses of each method, determine the best-performing one based on the evaluation and analysis, and provide a link showing how that method performs on a video.

1. YOLO (You Only Look Once)

Deep learning models such as YOLOv8 have become essential across industries including robotics, autonomous driving, and video surveillance. These models detect objects in real time and directly affect safety and decision-making processes. YOLOv8 (You Only Look Once) uses computer vision techniques and machine learning algorithms to identify objects in images and videos with high speed and accuracy, enabling the efficient and accurate object detection that many applications depend on (Keylabs, 2023).
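Before walking through the full video pipeline below, here is a minimal single-image sketch of the YOLOv8 API (the image path is a placeholder, and it assumes the ultralytics package is installed):

# Minimal YOLOv8 inference sketch ('image.jpg' is a placeholder path)
from ultralytics import YOLO

model = YOLO('yolov8n.pt')    # nano detection weights, downloaded automatically
results = model('image.jpg')  # run inference on a single image
print(results[0].boxes.xyxy)  # detected boxes as (xmin, ymin, xmax, ymax)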

Implementation Details

I created a run_model function that implements object detection and segmentation. The function takes three arguments: the model, the input video, and the output video. It reads the video frame by frame and visualizes the model's results on each frame. The annotated frames are then saved to the output video file until all frames have been processed or the user presses the "q" key to stop.

For object detection, I use the YOLO model (yolov8n.pt), which produces a video with bounding boxes around the detected objects. Likewise, for object segmentation, the YOLO model with segmentation-specific weights (yolov8n-seg.pt) produces a video with the segmented objects.

import cv2
from ultralytics import YOLO

def run_model(model, video, output_video):
    cap = cv2.VideoCapture(video)

    # Create a VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')

    # Get frame width and height
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output_video, fourcc, 20.0, (frame_width, frame_height))

    if not cap.isOpened():
        print("Cannot open camera")
        exit()

    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()

        if not ret:
            print("No frame...")
            break

        # Predict on image
        results = model.track(source=frame, persist=True, tracker='bytetrack.yaml')
        frame = results[0].plot()

        # Write the frame to the output video file
        out.write(frame)

        # Display the resulting frame
        cv2.imshow("ObjectDetection", frame)

        # Terminate run when "Q" pressed
        if cv2.waitKey(1) == ord("q"):
            break

    # When everything done, release the capture
    cap.release()

    # Release the video writer
    out.release()
    cv2.destroyAllWindows()

# Object Detection
run_model(model=YOLO('yolov8n.pt'), video=VIDEO, output_video=OUTPUT_VIDEO_YOLO_DET)

# Object Segmentation
run_model(model=YOLO('yolov8n-seg.pt'), video=VIDEO, output_video=OUTPUT_VIDEO_YOLO_SEG)

2. Faster R-CNN (Region-based Convolutional Neural Network)

Faster R-CNN is a state-of-the-art object detection model with two main components: a deep fully convolutional region proposal network and a Fast R-CNN object detector. The Region Proposal Network (RPN) shares full-image convolutional features with the detection network (Ren et al., 2015). The RPN is a fully convolutional network that generates high-quality proposals, which the Fast R-CNN detector then uses for object detection. The two models are merged into a single network, with the RPN telling the detector where to look (Ren et al., 2015).
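This two-stage structure is visible directly in torchvision's implementation. The following sketch, which is not part of the original project code, simply inspects the components (it assumes torchvision >= 0.13 for the weights API):

# Inspecting the two components of torchvision's Faster R-CNN (a sketch)
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights='DEFAULT')

print(type(model.backbone).__name__)   # BackboneWithFPN: shared full-image convolutional features
print(type(model.rpn).__name__)        # RegionProposalNetwork: proposes candidate boxes
print(type(model.roi_heads).__name__)  # RoIHeads: the Fast R-CNN detector that classifies the proposals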

(1) Object Detection with Faster R-CNN

To implement object detection, I created two functions: get_model and faster_rcnn_object_detection. The get_model function loads a pre-trained Faster R-CNN model from the torchvision library, pre-trained on the COCO dataset with a ResNet-50-FPN backbone, and sets it to evaluation mode. The faster_rcnn_object_detection function then performs object detection on a single video frame and draws bounding boxes around the detected objects: it converts the frame to a tensor and passes it to the model, which returns predictions containing the bounding boxes, labels, and scores of the detected objects. Boxes with a confidence score above 0.9 are drawn, along with labels indicating the class and the confidence score.


import torch
import torchvision.transforms as T
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

def get_model():
    # Load a pre-trained Faster R-CNN model
    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn(weights=weights)
    model.eval()
    return model

def faster_rcnn_object_detection(model, frame):
    # Transform frame to tensor and add batch dimension
    transform = T.Compose([T.ToTensor()])
    frame_tensor = transform(frame).unsqueeze(0)

    with torch.no_grad():
        prediction = model(frame_tensor)

    bboxes, labels, scores = prediction[0]["boxes"], prediction[0]["labels"], prediction[0]["scores"]

    # Draw boxes and labels on the frame
    for i in range(len(bboxes)):
        xmin, ymin, xmax, ymax = bboxes[i].numpy().astype('int')
        class_name = COCO_NAMES[labels.numpy()[i] - 1]

        if scores[i] > 0.9:  # Only draw boxes for confident predictions
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3)

            # Put label
            label = f"{class_name}: {scores[i]:.2f}"
            cv2.putText(frame, label, (xmin, ymin - 10), FONT, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    return frame

# Set up the model
model = get_model()

# Video capture setup
cap = cv2.VideoCapture(VIDEO)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')

# Get frame width and height
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(OUTPUT_VIDEO_FASTER_RCNN_DET, fourcc, 20.0, (frame_width, frame_height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("No frame...")
        break

    # Process frame
    processed_frame = faster_rcnn_object_detection(model, frame)

    # Write the processed frame to output
    out.write(processed_frame)

    # Display the frame
    cv2.imshow('Frame', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release everything once finished
cap.release()
out.release()
cv2.destroyAllWindows()

(2) Object Segmentation with Mask R-CNN

To implement object segmentation, I use the pre-trained Mask R-CNN model from the torchvision library, which has a ResNet-50-FPN backbone and is pre-trained on the COCO dataset, and set it to evaluation mode. The faster_rcnn_object_segmentation function then preprocesses each video frame into a tensor, runs the model on it, overlays the predicted segmentation masks on the frame, draws bounding boxes, and adds labels for high-confidence detections. This involves filtering the detections by a confidence threshold, overlaying the masks, drawing rectangles, and adding text labels.


import numpy as np
from torchvision.models.detection import maskrcnn_resnet50_fpn, MaskRCNN_ResNet50_FPN_Weights

# Load the pre-trained Mask R-CNN model
model = maskrcnn_resnet50_fpn(weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT)
model.eval()

# Function to overlay masks and draw rectangles and labels on the frame
def faster_rcnn_object_segmentation(frame, threshold=0.9):
    # Function to preprocess the frame
    transform = T.Compose([T.ToTensor()])
    frame_tensor = transform(frame).unsqueeze(0)

    with torch.no_grad():
        predictions = model(frame_tensor)

    labels = predictions[0]['labels'].cpu().numpy()
    masks = predictions[0]['masks'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    boxes = predictions[0]['boxes'].cpu().numpy()

    overlay = frame.copy()

    for i in range(len(masks)):
        if scores[i] > threshold:
            mask = masks[i, 0]
            mask = (mask > 0.6).astype(np.uint8)
            color = np.random.randint(0, 255, (3,), dtype=np.uint8)
            overlay[mask == 1] = frame[mask == 1] * 0.5 + color * 0.5

            xmin, ymin, xmax, ymax = boxes[i].astype('int')
            class_name = COCO_NAMES[labels[i] - 1]

            # Draw rectangle
            cv2.rectangle(overlay, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)

            # Put label
            label = f"{class_name}: {scores[i]:.2f}"
            cv2.putText(overlay, label, (xmin, ymin - 10), FONT, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    return overlay

# Capture video
cap = cv2.VideoCapture(VIDEO)

fourcc = cv2.VideoWriter_fourcc(*'mp4v')

# Get frame width and height
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(OUTPUT_VIDEO_FASTER_RCNN_SEG, fourcc, 20.0, (frame_width, frame_height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("No frame...")
        break

    # Overlay masks
    processed_frame = faster_rcnn_object_segmentation(frame)

    # Write the processed frame to output
    out.write(processed_frame)

    # Display the frame
    cv2.imshow('Frame', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release everything once finished
cap.release()
out.release()
cv2.destroyAllWindows()

3. SSD (Single Shot MultiBox Detector)

SSD, the Single Shot MultiBox Detector, is a method for object detection in images that uses a single deep neural network. It discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios and scales at each feature map location. At prediction time, the network generates a score for the presence of each object category in every default box and adjusts the box to better match the object's shape. SSD combines predictions from multiple feature maps at different resolutions to handle objects of various sizes, and it eliminates the proposal generation and resampling stages, which simplifies training and integration into a detection system (Liu et al., 2016).
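The default boxes described above can be inspected in torchvision's SSD300 implementation. This short sketch, not part of the original project code, prints the aspect ratios used at each feature-map level:

# Inspecting SSD's default boxes in torchvision (a sketch)
from torchvision.models.detection import ssd300_vgg16

model = ssd300_vgg16(weights='DEFAULT')

# One list of aspect ratios per feature-map level; each location on that
# level gets default boxes at these ratios plus square boxes.
print(model.anchor_generator.aspect_ratios)
# e.g. [[2], [2, 3], [2, 3], [2, 3], [2], [2]]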

(1) Object Detection with SSD

To implement object detection with the SSD (Single Shot MultiBox Detector) model, I created an ssd_object_detection function that uses a pre-trained SSD model to process each video frame, run detection, and draw bounding boxes around the detected objects.


from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights

# Load the pre-trained SSD model
model = ssd300_vgg16(weights=SSD300_VGG16_Weights.DEFAULT)
model.eval()

def ssd_object_detection(frame, threshold=0.5):
    # Function to preprocess the frame
    transform = T.Compose([T.ToTensor()])
    frame_tensor = transform(frame).unsqueeze(0)

    with torch.no_grad():
        predictions = model(frame_tensor)

    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    boxes = predictions[0]['boxes'].cpu().numpy()


    for i in range(len(boxes)):
        if scores[i] > threshold:

            xmin, ymin, xmax, ymax = boxes[i].astype('int')
            class_name = COCO_NAMES[labels[i] - 1]

            # Draw rectangle
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)

            # Put label
            label = f"{class_name}: {scores[i]:.2f}"
            cv2.putText(frame, label, (xmin, ymin - 10), FONT, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    return frame

# Capture video
cap = cv2.VideoCapture(VIDEO)

fourcc = cv2.VideoWriter_fourcc(*'mp4v')

# Get frame width and height
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(OUTPUT_VIDEO_SSD_DET, fourcc, 20.0, (frame_width, frame_height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("No frame...")
        break

    # Detect objects
    processed_frame = ssd_object_detection(frame)

    # Write the processed frame to output
    out.write(processed_frame)

    # Display the frame
    cv2.imshow('Frame', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release everything once finished
cap.release()
out.release()
cv2.destroyAllWindows()

(2) Object Segmentation with SSD

Similarly, I created an ssd_object_segmentation function that loads the pre-trained model, processes each video frame, and draws masks and labels on the detected objects. Since SSD is a detection-only model without a mask head, the function approximates segmentation by applying an Otsu threshold inside each detected bounding box and filling the resulting contours.


# Load the pre-trained SSD model
model = ssd300_vgg16(weights=SSD300_VGG16_Weights.DEFAULT)
model.eval()

def ssd_object_segmentation(frame, threshold=0.5):
    # Function to preprocess the frame
    transform = T.Compose([T.ToTensor()])
    frame_tensor = transform(frame).unsqueeze(0)

    with torch.no_grad():
        predictions = model(frame_tensor)

    labels = predictions[0]['labels'].cpu().numpy()
    scores = predictions[0]['scores'].cpu().numpy()
    boxes = predictions[0]['boxes'].cpu().numpy()

    for i in range(len(boxes)):
        if scores[i] > threshold:
            xmin, ymin, xmax, ymax = boxes[i].astype('int')
            class_name = COCO_NAMES[labels[i] - 1]

            # Extract the detected object from the frame
            object_segment = frame[ymin:ymax, xmin:xmax]

            # Convert to grayscale and threshold to create a mask
            gray = cv2.cvtColor(object_segment, cv2.COLOR_BGR2GRAY)
            _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

            # Find contours
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

            # Draw the contours on the original frame
            cv2.drawContours(frame[ymin:ymax, xmin:xmax], contours, -1, (0, 255, 0), thickness=cv2.FILLED)

            # Put label above the box
            label = f"{class_name}: {scores[i]:.2f}"
            cv2.putText(frame, label, (xmin, ymin - 10), FONT, 0.5, (255, 0, 0), 2, cv2.LINE_AA)

    return frame


# Capture video
cap = cv2.VideoCapture(VIDEO)  # replace with actual video file path

fourcc = cv2.VideoWriter_fourcc(*'mp4v')

# Get frame width and height
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter(OUTPUT_VIDEO_SSD_SEG, fourcc, 20.0, (frame_width, frame_height))

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("No frame...")
        break

    # Overlay segmentation masks
    processed_frame = ssd_object_segmentation(frame)

    # Write the processed frame to output
    out.write(processed_frame)

    # Display the frame
    cv2.imshow('Frame', processed_frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release everything once finished
cap.release()
out.release()
cv2.destroyAllWindows()

4. Evaluation

In this section, I evaluate and compare the three popular object detection models: YOLO (You Only Look Once), Faster R-CNN (Region-based Convolutional Neural Network), and SSD (Single Shot MultiBox Detector). All measurements were made on a CPU device rather than CUDA. The evaluation covers the following metrics; a minimal measurement sketch follows the list:

  • Frames per second (FPS): the number of frames each model processes per second.
  • Inference time: the time each model needs to detect the objects in a frame.
  • Model size: the disk space each model occupies.
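The sketch below shows one way these three metrics can be measured; it is illustrative rather than the project's exact benchmarking code, shown here for the YOLO weights (the same loop applies to the other models, and VIDEO is the same placeholder used throughout):

import os
import time

import cv2
from ultralytics import YOLO

WEIGHTS = 'yolov8n.pt'  # placeholder weight file

# Model size: disk space taken by the weight file
print(f"Model size: {os.path.getsize(WEIGHTS) / 1e6:.1f} MB")

model = YOLO(WEIGHTS)
cap = cv2.VideoCapture(VIDEO)

n_frames, total_time = 0, 0.0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    start = time.perf_counter()
    model.predict(frame, verbose=False)  # inference only, no drawing
    total_time += time.perf_counter() - start
    n_frames += 1
cap.release()

# Average inference time per frame and the resulting FPS
print(f"Inference time: {total_time / n_frames * 1000:.1f} ms/frame")
print(f"FPS: {n_frames / total_time:.1f}")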

(1) Discussion of Performance Differences

From the evaluation results, I observed the following:

  • Speed: YOLO outperforms Faster R-CNN and SSD in both FPS and inference time, which makes it well suited to real-time applications.
  • Accuracy: Faster R-CNN tends to outperform YOLO and SSD, indicating better accuracy on object detection tasks.
  • Model size: YOLO has the smallest model size, an advantage on devices with limited storage.

(2) Best-Performing Method

Based on the evaluation results and the qualitative analysis, YOLOv8 is the best-performing SoA method for object detection and segmentation in video sequences. Its superior speed, compact model size, and strong performance make it ideal for real-world applications where both accuracy and efficiency are critical.

Full project code and video: https://github.com/fatimagulomova/iu-projects/blob/main/DLBAIPCV01/MainProject.ipynb
