Building a BERT Model for Text Classification with PyTorch

Author: 小sen, 2021-08-30 09:25:25

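In 2018, Google published a paper titled "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".

The paper introduced a language model called BERT (Bidirectional Encoder Representations from Transformers), which achieved state-of-the-art performance on tasks such as question answering, natural language inference, classification, and the General Language Understanding Evaluation (GLUE) benchmark.

BERT, short for Bidirectional Encoder Representations from Transformers [1], is a pre-trained model for language representation.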

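It is based on the Transformer architecture that Google published in 2017. A standard Transformer uses a stack of encoder and decoder networks, whereas BERT needs only one additional output layer on top of the pre-trained model: after fine-tuning, it can serve a wide range of tasks without task-specific changes to the model architecture.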
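To make the "one additional output layer" idea concrete, here is a minimal pure-Python sketch of a classification head: a single linear layer followed by a softmax, applied to the sentence vector produced by the encoder. The vector, weights, and bias below are hypothetical toy values chosen for illustration; in a real fine-tuned model they are learned, and the sentence vector has 768 dimensions for BERT-base.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_head(pooled, weights, bias):
    """One linear layer: logits[k] = weights[k] . pooled + bias[k], then softmax."""
    logits = [sum(w * h for w, h in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Hypothetical 4-dimensional sentence vector (assumption, for illustration only).
pooled = [0.5, -1.0, 0.25, 2.0]
# Hypothetical weights and bias for 3 classes; learned during fine-tuning in practice.
weights = [[0.1, 0.0, 0.0, 0.2],
           [0.0, 0.3, 0.0, -0.1],
           [-0.2, 0.0, 0.4, 0.0]]
bias = [0.0, 0.1, -0.1]

probs = classification_head(pooled, weights, bias)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

Fine-tuning updates both this small head and the pre-trained encoder weights, which is why no task-specific architecture is needed.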

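BERT stacks multiple Transformer encoders on top of one another. The Transformer is built on the well-known multi-head attention module, which has been highly successful in both vision and language tasks.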
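The core of each attention head is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below implements a single head in pure Python on hypothetical toy vectors; it is an illustration under simplifying assumptions, not BERT's actual implementation, which also applies learned projections to obtain Q, K, and V and runs several heads in parallel before concatenating their outputs.

```python
import math

def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights.append(softmax(scores))
    # Each output is a weighted average of the value vectors
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values))
         for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights

# Three hypothetical token vectors of dimension 2, used as Q, K and V (self-attention).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(weights[0])
```

Because every token attends to every other token in the sequence, each output vector mixes context from both the left and the right, which is the "bidirectional" part of BERT's name.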

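In this article, we will use PyTorch to build a BERT model for text classification.

Today I will introduce a Python library, simpletransformers, which makes advanced pre-trained language models much easier to use.

simpletransformers simplifies training, evaluation, and prediction with advanced pre-trained models (BERT, RoBERTa, XLNet, XLM, DistilBERT, ALBERT, CamemBERT, XLM-RoBERTa, FlauBERT); only 3 lines of code are needed to initialize a model.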

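Dataset source: https://www.kaggle.com/jrobischon/wikipedia-movie-plots

The dataset contains descriptions of 34,886 movies from around the world. The columns are described below:

Release Year: the year in which the movie was released
Title: the movie title
Origin/Ethnicity: the origin of the movie (e.g. American, Bollywood, Tamil)
Cast: the main actors and actresses
Genre: the movie genre(s)
Wiki Page: the URL of the Wikipedia page from which the plot description was scraped
Plot: a long-form description of the movie plot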

import os, json, gc, re, random

import numpy as np
import pandas as pd
import torch, transformers, tokenizers
from sklearn.preprocessing import LabelEncoder
from tqdm.notebook import tqdm

movies_df = pd.read_csv("wiki_movie_plots_deduped.csv")

# Keep only American and British films, and only the columns we need
movies_df = movies_df[movies_df["Origin/Ethnicity"].isin(["American", "British"])]
movies_df = movies_df[["Plot", "Genre"]]

# Drop rows whose genre is unknown (copy so later assignments do not hit a view)
movies_df = movies_df[movies_df["Genre"] != "unknown"].copy()

# Combine genres: 1) "sci-fi" with "science fiction" & 2) "romantic comedy" with "romance"
movies_df["Genre"] = movies_df["Genre"].replace({"sci-fi": "science fiction", "romantic comedy": "romance"})

# Select genres by frequency: keep those with more than 200 films
genre_counts = movies_df["Genre"].value_counts()
shortlisted_genres = genre_counts[genre_counts > 200].index.tolist()
movies_df = movies_df[movies_df["Genre"].isin(shortlisted_genres)].reset_index(drop=True)

# Shuffle (fixed seed so the sample is reproducible)
movies_df = movies_df.sample(frac=1, random_state=42).reset_index(drop=True)

# Take roughly the same number of plots from each genre (to reduce class imbalance)
movies_df = movies_df.groupby("Genre").head(400).reset_index(drop=True)

# Encode genre names as integer labels
label_encoder = LabelEncoder()
movies_df["genre_encoded"] = label_encoder.fit_transform(movies_df["Genre"].tolist())
movies_df = movies_df[["Plot", "Genre", "genre_encoded"]]
movies_df
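The last two preprocessing steps, capping each genre at a fixed number of plots and turning genre names into integer labels, can be sketched in pure Python to show what they do conceptually. The helper names `cap_per_class` and `encode_labels` and the toy rows are hypothetical; the article itself uses pandas `groupby(...).head(400)` and scikit-learn's `LabelEncoder`, which assigns ids to the class names in sorted order.

```python
from collections import defaultdict

def cap_per_class(rows, max_per_class):
    """Keep at most max_per_class rows for each label, preserving order."""
    seen = defaultdict(int)
    kept = []
    for text, label in rows:
        if seen[label] < max_per_class:
            seen[label] += 1
            kept.append((text, label))
    return kept

def encode_labels(labels):
    """Map each distinct label to an integer id, assigned in sorted order."""
    classes = sorted(set(labels))
    to_id = {name: i for i, name in enumerate(classes)}
    return [to_id[name] for name in labels], classes

# Hypothetical toy rows of (plot, genre), for illustration only.
rows = [("plot a", "drama"), ("plot b", "comedy"), ("plot c", "drama"),
        ("plot d", "drama"), ("plot e", "western")]

balanced = cap_per_class(rows, max_per_class=2)
encoded, classes = encode_labels([genre for _, genre in balanced])
print(balanced)
print(encoded, classes)
```

Keeping the `classes` list around matters later: `classes[i]` turns a predicted integer id back into a genre name, which is the role `label_encoder.inverse_transform` plays in the scikit-learn version.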

The simplest way to load a BERT model with torch is the Simple Transformers library, which lets you initialize a Transformer model, train it on a given dataset, and evaluate it on a given dataset in just 3 lines of code.

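from simpletransformers.classification import ClassificationModel

# Model arguments
model_args = {
    "reprocess_input_data": True,     # re-tokenize the data instead of using cached features
    "overwrite_output_dir": True,     # allow re-running into the same output directory
    "save_model_every_epoch": False,
    "save_eval_checkpoints": False,
    "max_seq_length": 512,            # longer plots are truncated to 512 tokens
    "train_batch_size": 16,
    "num_train_epochs": 4,
}

# Create a ClassificationModel
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    num_labels=len(shortlisted_genres),
    args=model_args,
    use_cuda=torch.cuda.is_available(),  # fall back to CPU when no GPU is available
)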

Training the model

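from sklearn.model_selection import train_test_split

train_df, eval_df = train_test_split(movies_df, test_size=0.2, stratify=movies_df["Genre"], random_state=42)

# Simple Transformers reads the input from a "text" column and the label from a "labels" column
column_names = {"Plot": "text", "genre_encoded": "labels"}
train_data = train_df[["Plot", "genre_encoded"]].rename(columns=column_names)
eval_data = eval_df[["Plot", "genre_encoded"]].rename(columns=column_names)

# Train the model
model.train_model(train_data)

# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_data)
print(result)

Output:

{'mcc': 0.5299659404649717, 'eval_loss': 1.4970421879083518}
CPU times: user 19min 1s, sys: 4.95 s, total: 19min 6s
Wall time: 20min 14s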
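The `mcc` value in the result is the Matthews correlation coefficient, which ranges from -1 to 1: 1 means perfect prediction and 0 means no better than chance. The sketch below is an independent pure-Python implementation of the standard multiclass formula, written for illustration on hypothetical labels; it is not the library's own code, and in practice you would typically use `sklearn.metrics.matthews_corrcoef`.

```python
import math
from collections import Counter

def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient for any number of classes."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    true_counts = Counter(y_true)                     # how often each class occurs
    pred_counts = Counter(y_pred)                     # how often each class is predicted
    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in true_counts)
    cov_true = s * s - sum(n * n for n in true_counts.values())
    cov_pred = s * s - sum(n * n for n in pred_counts.values())
    denominator = math.sqrt(cov_true * cov_pred)
    # By convention, return 0 when the denominator vanishes
    return numerator / denominator if denominator else 0.0

# Hypothetical labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = multiclass_mcc(y_true, y_pred)
print(round(score, 4))
```

Unlike plain accuracy, this metric accounts for how often each class is predicted, so it remains informative when some genres have fewer samples than others.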

Official simpletransformers documentation: https://simpletransformers.ai/docs

GitHub link: https://github.com/ThilinaRajapakse/simpletransformers

Editor: 姜華. Source: Python之王
