
Image Classification with Faster ViT


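Faster Vision Transformer (FVT) is a variant of the Vision Transformer (ViT) architecture, a neural network designed for computer vision tasks. FVT is a faster, more efficient version of the original ViT model, which Dosovitskiy et al. introduced in the 2020 paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".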

Key Features of FVT

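  • Efficient architecture: FVT is designed to be faster and more efficient than the original ViT model. It achieves this by reducing the parameter count and computational complexity while keeping comparable accuracy.
  • Multi-scale vision transformer: FVT uses a multi-scale architecture that lets it process images at several scales and resolutions. It does so through a hierarchical structure in which smaller transformers process smaller regions of the image.
  • Self-attention: FVT uses self-attention to model complex relationships between different parts of the image, using attention weights learned during training.
  • Positional encoding: FVT uses positional encoding to preserve the spatial information of the image. Learned position embeddings are added to the input tokens.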
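
To make the efficiency argument concrete, the sketch below counts how many tokens an image produces at different patch sizes, and how many token pairs global self-attention would have to compare. The 16x16 patch size comes from the ViT paper title; the four strides (4, 8, 16, 32) are an assumed, typical hierarchical layout chosen for illustration only and are not taken from the FasterViT configuration.

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


IMAGE_SIZE = 224

# Original ViT setting: one token per 16x16 patch.
vit_tokens = num_tokens(IMAGE_SIZE, 16)
print(f"16x16 patches: {vit_tokens} tokens, {vit_tokens ** 2} attention pairs")

# Assumption: a generic four-stage hierarchy that halves resolution per stage.
ASSUMED_STAGE_STRIDES = (4, 8, 16, 32)
for stride in ASSUMED_STAGE_STRIDES:
    tokens = num_tokens(IMAGE_SIZE, stride)
    print(f"stride {stride:>2}: {tokens:>4} tokens, {tokens ** 2:>8} attention pairs")
```

Because the number of pairs grows with the square of the token count, global attention at the highest resolution is by far the most expensive, which is one reason hierarchical designs tend to limit attention to smaller regions in the early stages.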

Let's start by training a vision transformer on a custom dataset. To do that, first install fastervit with pip.

pip install fastervit
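
Before going further, it can help to confirm that the packages are visible to the interpreter you will train with. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of fastervit.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages this walkthrough imports.
missing = missing_packages(["torch", "torchvision", "fastervit"])
if missing:
    print("Not installed in this environment:", ", ".join(missing))
else:
    print("All required packages were found.")
```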

Next, import PyTorch and the torchvision utilities used for data loading. The fastervit package itself is imported later, when the model is created.

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import os

For this implementation, I downloaded a damaged-road dataset from Kaggle and split it into training and validation sets. The code below loads both sets and applies the data transformations.
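
The article does not show how the split was done. One possible approach is sketched below, assuming the downloaded images sit in one folder per class under a hypothetical `raw_road_images` directory. It copies them into the `train/<class>` and `val/<class>` layout that `datasets.ImageFolder` expects; the 80/20 ratio and the seed are arbitrary choices.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src_dir, dst_dir, val_fraction=0.2, seed=42):
    """Copy src_dir/<class>/* into dst_dir/train/<class> and dst_dir/val/<class>."""
    rng = random.Random(seed)
    counts = {"train": 0, "val": 0}
    class_dirs = sorted(p for p in Path(src_dir).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_val = int(len(files) * val_fraction)
        for index, file_path in enumerate(files):
            split = "val" if index < n_val else "train"
            target_dir = Path(dst_dir) / split / class_dir.name
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file_path, target_dir / file_path.name)
            counts[split] += 1
    return counts


# Hypothetical source folder; the split only runs if it exists.
raw_dir = Path("raw_road_images")
if raw_dir.is_dir():
    print(split_dataset(raw_dir, "sih_road_dataset"))
```

Splitting per class keeps the class proportions roughly equal in both sets, which matters when some road conditions are rarer than others.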

data_dir = 'sih_road_dataset'

# Define data transformations
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

# Load datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Shuffle only the training split; validation order does not affect the metrics
dataloaders = {x: DataLoader(image_datasets[x], batch_size=32, shuffle=(x == 'train'), num_workers=4) for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
class_names = image_datasets['train'].classes
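
The `transforms.Normalize` step above subtracts a per-channel mean and divides by a per-channel standard deviation; the values used are the standard ImageNet statistics. The sketch below reproduces that arithmetic for a single RGB pixel in plain Python, along with the inverse operation, which is handy when displaying a normalized image. The helper names are hypothetical.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel so the pixel can be displayed again."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))


mid_gray = (0.5, 0.5, 0.5)
normalized = normalize_pixel(mid_gray)
print("normalized:", normalized)
print("restored:  ", denormalize_pixel(normalized))
```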

Next, we load the FasterViT model.

# Load the FasterViT model and modify it for your number of classes.

from fastervit import create_model

# Load FasterViT model
model = create_model('faster_vit_0_224', 
                     pretrained=True,
                     model_path="faster_vit_0.pth.tar")

# Print the model architecture
print(model)

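When we print the model, we can see the head layer at the end. This is the part that has to be modified for fine-tuning.

To adapt it to your custom classification task, replace the head with a new linear layer whose output size equals your number of classes.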

# Modify the final layer for custom classification
num_ftrs = model.head.in_features
model.head = torch.nn.Linear(num_ftrs, len(class_names))

# Move the model to GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
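
The code above fine-tunes every parameter. A common variation, which the article does not use, is to freeze the pretrained backbone and train only the new head. The sketch below demonstrates the idea on `TinyClassifier`, a hypothetical stand-in that merely exposes a `.head` layer the way the printed model does, so it runs without downloading any weights.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained model that exposes `.head`."""

    def __init__(self, num_features=32, num_classes=1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, num_features), nn.ReLU()
        )
        self.head = nn.Linear(num_features, num_classes)

    def forward(self, x):
        return self.head(self.backbone(x))


def replace_head_and_freeze(model, num_classes):
    """Freeze all existing weights, then attach a fresh trainable head."""
    for param in model.parameters():
        param.requires_grad = False
    # A newly created layer has requires_grad=True by default.
    model.head = nn.Linear(model.head.in_features, num_classes)
    return model


demo_model = replace_head_and_freeze(TinyClassifier(), num_classes=4)
trainable = [name for name, p in demo_model.named_parameters() if p.requires_grad]
print(trainable)  # ['head.weight', 'head.bias']

# Hand only the trainable parameters to the optimizer.
demo_optimizer = torch.optim.SGD(
    (p for p in demo_model.parameters() if p.requires_grad), lr=0.001, momentum=0.9
)
```

Training only the head is cheaper and can reduce overfitting on small datasets, at the cost of less adaptation to the new domain.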

Next, define the loss function, the optimizer, and the learning rate schedule:

import torch.optim as optim
from torch.optim import lr_scheduler

# Define loss function
criterion = torch.nn.CrossEntropyLoss()

# Define optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Learning rate scheduler
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
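
`StepLR` multiplies the learning rate by `gamma` once every `step_size` epochs. The plain-Python sketch below reproduces that closed form with the values used above (the helper name `step_lr` is hypothetical) to show which learning rate each epoch would see.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate in effect during a 0-indexed epoch under a step schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Values from the optimizer and scheduler defined above.
BASE_LR, STEP_SIZE, GAMMA = 0.001, 7, 0.1

for epoch in (0, 4, 6, 7, 13, 14):
    print(f"epoch {epoch:>2}: lr = {step_lr(BASE_LR, STEP_SIZE, GAMMA, epoch):.6f}")
```

Note that with `step_size=7`, a run of only five epochs finishes before the first decay, so the scheduler has no effect unless training runs longer or `step_size` is reduced.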

With everything defined, we can now write the training function that trains the model on the custom dataset.


import time
import copy

def train_model(model, criterion, optimizer, scheduler, num_epochs=5):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # Backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # Deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best val Acc: {best_acc:.4f}')

    # Load best model weights
    model.load_state_dict(best_model_wts)
    return model
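
The training loop multiplies each batch loss by `inputs.size(0)` because `CrossEntropyLoss` returns the mean over the batch by default. Weighting by batch size before dividing by the dataset size gives the true per-sample average even when the last batch is smaller. The sketch below illustrates the difference with made-up numbers.

```python
def weighted_epoch_loss(batches):
    """Per-sample average loss from (mean_batch_loss, batch_size) pairs."""
    total_loss = sum(loss * size for loss, size in batches)
    total_samples = sum(size for _, size in batches)
    return total_loss / total_samples


def naive_epoch_loss(batches):
    """Unweighted mean of batch losses; biased when batch sizes differ."""
    return sum(loss for loss, _ in batches) / len(batches)


# Made-up values: two full batches of 32 and a final partial batch of 8.
example_batches = [(0.5, 32), (0.7, 32), (1.0, 8)]
print(f"weighted: {weighted_epoch_loss(example_batches):.4f}")
print(f"naive:    {naive_epoch_loss(example_batches):.4f}")
```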

The next step is to start the training process!

# Train the model
model = train_model(model, criterion, optimizer, exp_lr_scheduler, num_epochs=5)

# Save the model
torch.save(model.state_dict(), 'faster_vit_custom_model.pth')
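
The saved state dict does not record the class names, and the index-to-name mapping is needed again at inference time. One option, sketched here under the assumption that a JSON file next to the weights is acceptable, is to store the list alongside the checkpoint. The file name `class_names.json` and both helper names are hypothetical.

```python
import json


def save_class_names(path, class_names):
    """Write the ordered class list so index i maps to the same name later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(list(class_names), handle, ensure_ascii=False, indent=2)


def load_class_names(path):
    """Read the ordered class list written by save_class_names."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# In the training script this would be called as:
# save_class_names('class_names.json', class_names)
```

At inference time, `len(load_class_names('class_names.json'))` then gives the number of output classes without hard-coding it.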

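Note that this is not the best possible model: in this run, the model overfit the training set. The main purpose of this article is to show how to set up a Faster Vision Transformer and train it on a custom dataset. Overfitting can be addressed separately with a range of other techniques.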
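
Early stopping is one such technique. It is not part of the article's training function; the sketch below is a minimal, framework-free version in which the class name, the `patience` value, and the validation losses are all made up for illustration.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improvement stalls after the second epoch.
stopper = EarlyStopping(patience=2)
stopped_epoch = None
for epoch, val_loss in enumerate([0.90, 0.70, 0.72, 0.75, 0.60]):
    if stopper.step(val_loss):
        stopped_epoch = epoch
        print(f"Stopping after epoch {epoch}")  # Stopping after epoch 3
        break
```

In a training loop like the one above, `stopper.step(epoch_loss)` would be called at the end of each validation phase, with the loop exiting when it returns True.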

Let's run a quick test of the trained model on a sample image:

import torch
from torchvision import transforms
from PIL import Image
from fastervit import create_model

# Define the number of classes in your custom dataset
num_classes = 4  # Replace with your actual number of classes

# Create the model architecture
model = create_model('faster_vit_0_224', pretrained=False)

# Modify the final classification layer to match the number of classes in your custom dataset
model.head = torch.nn.Linear(model.head.in_features, num_classes)

# Move the model to GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Load the trained model weights
model.load_state_dict(torch.load('faster_vit_custom_model.pth', map_location=device))
model.eval()  # Set the model to evaluation mode

# Define data transformations for the input image
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Function to load and preprocess the image
def load_image(image_path):
    image = Image.open(image_path).convert('RGB')
    image = preprocess(image)
    image = image.unsqueeze(0)  # Add batch dimension
    return image.to(device)

# Function to make predictions
def predict(image_path, model, class_names):
    image = load_image(image_path)
    with torch.no_grad():
        outputs = model(image)
        _, preds = torch.max(outputs, 1)
        predicted_class = class_names[preds.item()]
    return predicted_class

# List of class names (ensure this matches your custom dataset's classes)
class_names = ['good', 'poor', 'satisfactory', 'very_poor']  # Replace with your actual class names

# Example usage
image_path = 'test_img.jpg'
predicted_class = predict(image_path, model, class_names)
print(predicted_class)

Running the script prints the predicted class for the test image.
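
The predicted label alone does not say how confident the model is. A softmax over the raw outputs turns them into probabilities that can be ranked. The sketch below does this in plain Python with made-up logits; with the model above, the logits could be obtained as `outputs[0].tolist()`. The helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def rank_classes(logits, class_names):
    """Return (class_name, probability) pairs, most likely first."""
    probs = softmax(logits)
    return sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)


# Made-up logits for the four road-condition classes.
example_logits = [2.0, 0.5, 1.0, -1.0]
ranking = rank_classes(example_logits, ['good', 'poor', 'satisfactory', 'very_poor'])
for name, prob in ranking:
    print(f"{name:<13} {prob:.3f}")
```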

Editor: 趙寧寧. Source: 小白玩轉Python