Google's ALBERT Model Gets a V2 and a Chinese Version, Reaching No. 2 on GitHub Trending

Author: 十三, 2020-01-03 16:00:28

This article is reprinted with permission from the AI media outlet QbitAI (WeChat official account ID: QbitAI). Please contact the original source before reprinting.

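It has 18 times fewer parameters than BERT, yet it outperforms it.

This is ALBERT, the lightweight BERT model that Google released not long ago.

On top of that, it swept the major leaderboards, setting new SOTA results on the SQuAD and RACE benchmarks.

Recently, Google open-sourced a Chinese version and Version 2, and the project climbed to second place on GitHub Trending.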

ALBERT v2 improves performance again

In this version, the "no dropout", "additional training data", and "long training time" strategies are applied to all models.

The comparison with the first-generation ALBERT is as follows.

[Figure: performance of ALBERT v2 compared with v1]

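In terms of performance, v2 is much better than v1 for ALBERT-base, ALBERT-large, and ALBERT-xlarge.

This shows how important the three strategies above are.

On average, ALBERT-xxlarge v2 is slightly worse than v1, for the following two reasons:

It was trained for an additional 1.5M steps (the only difference between the two models is training for 1.5M steps versus 3M steps);
For v1, a small hyperparameter search was run over the parameter sets given by BERT, RoBERTa, and XLNet; for v2, the v1 parameters were simply reused, except for RACE, where a learning rate of 1e-5 and an ALBERT dropout rate (DR) of 0 were used.

Overall, ALBERT is a lightweight version of BERT that uses parameter-reduction techniques to allow large-scale configurations and overcome earlier memory limits.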
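To make "parameter-reduction techniques" concrete, here is a minimal sketch of one of them, factorized embedding parameterization: instead of a single vocabulary-by-hidden embedding matrix, the model uses a small vocabulary-by-E matrix followed by an E-by-hidden projection. The sizes below (vocabulary 30000, hidden size 768, embedding size 128) are illustrative assumptions, not figures taken from this article, and the helper name is hypothetical.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters, with or without factorization.

    Without factorization: one vocab_size x hidden_size matrix.
    With factorization:    a vocab_size x embedding_size matrix plus an
                           embedding_size x hidden_size projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Illustrative sizes only (assumptions for this sketch).
VOCAB, HIDDEN, EMBED = 30000, 768, 128

full = embedding_params(VOCAB, HIDDEN)
factorized = embedding_params(VOCAB, HIDDEN, EMBED)

print("unfactorized embedding parameters:", full)
print("factorized embedding parameters:  ", factorized)
print("reduction factor: %.1fx" % (full / factorized))
```

This covers only the embedding table. The overall parameter savings quoted for ALBERT also depend on sharing parameters across layers, which this sketch does not model.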

ALBERT's performance on the GLUE benchmark, using a single-model setup:

[Figure: ALBERT results on the GLUE benchmark, single-model setup]

ALBERT-xxlarge's performance on the SQuAD and RACE benchmarks, using a single-model setup:

[Figure: ALBERT-xxlarge results on the SQuAD and RACE benchmarks, single-model setup]

Chinese model download links

Base
https://storage.googleapis.com/albert_models/albert_base_zh.tar.gz

Large
https://storage.googleapis.com/albert_models/albert_large_zh.tar.gz

XLarge
https://storage.googleapis.com/albert_models/albert_xlarge_zh.tar.gz

Xxlarge
https://storage.googleapis.com/albert_models/albert_xxlarge_zh.tar.gz

ALBERT v2 download links

Base
[Tar File]：
https://storage.googleapis.com/albert_models/albert_base_v2.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_base/2

Large
[Tar File]：
https://storage.googleapis.com/albert_models/albert_large_v2.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_large/2

XLarge
[Tar File]：
https://storage.googleapis.com/albert_models/albert_xlarge_v2.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_xlarge/2

Xxlarge
[Tar File]：
https://storage.googleapis.com/albert_models/albert_xxlarge_v2.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_xxlarge/2

Pre-trained models

TF-Hub modules are also available:

Base
[Tar File]：
https://storage.googleapis.com/albert_models/albert_base_v1.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_base/1

Large
[Tar File]：
https://storage.googleapis.com/albert_models/albert_large_v1.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_large/1

XLarge
[Tar File]：
https://storage.googleapis.com/albert_models/albert_xlarge_v1.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_xlarge/1

Xxlarge
[Tar File]：
https://storage.googleapis.com/albert_models/albert_xxlarge_v1.tar.gz
[TF-Hub]：
https://tfhub.dev/google/albert_xxlarge/1
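The tar file links above all follow one naming pattern, so fetching a checkpoint can be scripted. The following is a minimal sketch using only the Python standard library; the helper names and the default destination directory are hypothetical, and the URL pattern is inferred from the links listed in this article.

```python
import tarfile
import urllib.request
from pathlib import Path

BASE_URL = "https://storage.googleapis.com/albert_models"
SIZES = ("base", "large", "xlarge", "xxlarge")
VARIANTS = ("zh", "v1", "v2")  # Chinese, version 1, version 2


def model_url(size, variant):
    """Build the tar file URL for a given model size and variant."""
    if size not in SIZES:
        raise ValueError("unknown size: %r" % size)
    if variant not in VARIANTS:
        raise ValueError("unknown variant: %r" % variant)
    return "%s/albert_%s_%s.tar.gz" % (BASE_URL, size, variant)


def download_and_extract(size, variant, dest_dir="albert_models"):
    """Download the archive (if not already present) and unpack it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / ("albert_%s_%s.tar.gz" % (size, variant))
    if not archive.exists():
        urllib.request.urlretrieve(model_url(size, variant), str(archive))
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(dest))
    return dest


print(model_url("base", "zh"))
```

Calling download_and_extract("base", "zh") would fetch and unpack the Chinese base model. The larger checkpoints are sizeable downloads, so starting with the base model is a reasonable first test.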

Example usage of the TF-Hub module:

import tensorflow_hub as hub  # uses the TF1-style hub.Module API

# is_training, input_ids, input_mask and segment_ids are supplied by the caller.
tags = set()
if is_training:
    tags.add("train")
albert_module = hub.Module("https://tfhub.dev/google/albert_base/1",
                           tags=tags, trainable=True)
albert_inputs = dict(
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids)
albert_outputs = albert_module(
    inputs=albert_inputs,
    signature="tokens",
    as_dict=True)

# If you want to use the token-level output, use
# albert_outputs["sequence_output"] instead.
output_layer = albert_outputs["pooled_output"]
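The module call above expects three aligned, fixed-length inputs: input_ids, input_mask, and segment_ids. As a rough sketch of how those are usually laid out for BERT-style models, the helper below packs one or two already-tokenized sequences into that shape. The function name is hypothetical, and the special-token ids passed in the example are placeholders; the real ids come from the model's vocabulary file.

```python
def build_albert_inputs(ids_a, ids_b, max_seq_length, cls_id, sep_id, pad_id=0):
    """Pack token ids into input_ids / input_mask / segment_ids lists."""
    ids_a = list(ids_a)
    ids_b = list(ids_b or [])
    # One [CLS] plus one [SEP] per segment.
    num_special = 3 if ids_b else 2
    if max_seq_length < num_special:
        raise ValueError("max_seq_length too small for special tokens")
    budget = max_seq_length - num_special
    # Trim the longer segment one token at a time until the pair fits.
    while len(ids_a) + len(ids_b) > budget:
        longer = ids_a if len(ids_a) >= len(ids_b) else ids_b
        longer.pop()

    input_ids = [cls_id] + ids_a + [sep_id]
    segment_ids = [0] * len(input_ids)
    if ids_b:
        input_ids += ids_b + [sep_id]
        segment_ids += [1] * (len(ids_b) + 1)

    # Real tokens get mask 1, padding gets mask 0.
    input_mask = [1] * len(input_ids)
    padding = max_seq_length - len(input_ids)
    input_ids += [pad_id] * padding
    input_mask += [0] * padding
    segment_ids += [0] * padding
    return dict(input_ids=input_ids,
                input_mask=input_mask,
                segment_ids=segment_ids)


# Placeholder ids for illustration only.
example = build_albert_inputs([10, 11, 12], [20, 21], max_seq_length=10,
                              cls_id=2, sep_id=3)
for name, values in example.items():
    print(name, values)
```

In practice each list would be batched and converted to an integer tensor before being passed to the module.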

Pre-training instructions

To pre-train ALBERT, use run_pretraining.py:

pip install -r albert/requirements.txt
python -m albert.run_pretraining \
    --input_file=... \
    --output_dir=... \
    --init_checkpoint=... \
    --albert_config_file=... \
    --do_train \
    --do_eval \
    --train_batch_size=4096 \
    --eval_batch_size=64 \
    --max_seq_length=512 \
    --max_predictions_per_seq=20 \
    --optimizer='lamb' \
    --learning_rate=.00176 \
    --num_train_steps=125000 \
    --num_warmup_steps=3125 \
    --save_checkpoints_steps=5000
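The learning_rate, num_train_steps, and num_warmup_steps flags above interact through the learning-rate schedule. The sketch below assumes a schedule of the kind commonly used for BERT-style training, a linear warmup followed by a linear decay to zero; the article does not state the schedule, so the script's actual behavior may differ. The function name is hypothetical, and the defaults are the flag values from the command above.

```python
def learning_rate(step, peak_lr=0.00176, total_steps=125000, warmup_steps=3125):
    """Assumed schedule: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up proportionally to the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay linearly so the rate reaches zero at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / total_steps


for step in (0, 1000, 3125, 62500, 125000):
    print("step %6d  lr %.6f" % (step, learning_rate(step)))
```

With these flag values the warmup covers 3125 of 125000 steps, or 2.5% of training.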

Fine-tuning on GLUE

To fine-tune and evaluate on GLUE, see the run_glue.sh file in the project.

Lower-level use cases may want to call the run_classifier.py script directly.

run_classifier.py handles both fine-tuning and evaluation on individual GLUE benchmark tasks.

For example, MNLI:

pip install -r albert/requirements.txt
python -m albert.run_classifier \
    --vocab_file=... \
    --data_dir=... \
    --output_dir=... \
    --init_checkpoint=... \
    --albert_config_file=... \
    --spm_model_file=... \
    --do_train \
    --do_eval \
    --do_predict \
    --do_lower_case \
    --max_seq_length=128 \
    --optimizer=adamw \
    --task_name=MNLI \
    --warmup_step=1000 \
    --learning_rate=3e-5 \
    --train_step=10000 \
    --save_checkpoints_steps=100 \
    --train_batch_size=128

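The default flags for each GLUE task can be found in run_glue.sh.

To fine-tune starting from a TF-Hub module instead of a raw checkpoint, pass the module handle in place of --init_checkpoint:

--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1

After evaluation, the script should report output like the following: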

***** Eval results *****
  global_step = ...
  loss = ...
  masked_lm_accuracy = ...
  masked_lm_loss = ...
  sentence_order_accuracy = ...
  sentence_order_loss = ...
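A report in this "key = value" form is easy to load for comparison across runs. The following is a minimal sketch of a parser; the function name is hypothetical, and the numbers in the sample text are made-up placeholders used only to exercise the code, not results from any real run.

```python
def parse_eval_results(text):
    """Turn 'key = value' lines into a dict, skipping banner lines."""
    results = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # e.g. the '***** Eval results *****' banner
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        try:
            results[key] = float(value)
        except ValueError:
            results[key] = value  # keep non-numeric values as text
    return results


# Placeholder values for illustration only.
sample = """***** Eval results *****
  global_step = 100
  loss = 1.5
  masked_lm_accuracy = 0.5
  sentence_order_accuracy = 0.75
"""

parsed = parse_eval_results(sample)
print(parsed)
```

A value that cannot be parsed as a number, such as a literal "...", is kept as a string rather than dropped.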

Fine-tuning on SQuAD

To fine-tune and evaluate a pre-trained model on SQuAD v1, use the run_squad_v1.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v1 \
    --albert_config_file=... \
    --vocab_file=... \
    --output_dir=... \
    --train_file=... \
    --predict_file=... \
    --train_feature_file=... \
    --predict_feature_file=... \
    --predict_feature_left_file=... \
    --init_checkpoint=... \
    --spm_model_file=... \
    --do_lower_case \
    --max_seq_length=384 \
    --doc_stride=128 \
    --max_query_length=64 \
    --do_train=true \
    --do_predict=true \
    --train_batch_size=48 \
    --predict_batch_size=8 \
    --learning_rate=5e-5 \
    --num_train_epochs=2.0 \
    --warmup_proportion=.1 \
    --save_checkpoints_steps=5000 \
    --n_best_size=20 \
    --max_answer_length=30

For SQuAD v2, use the run_squad_v2.py script:

pip install -r albert/requirements.txt
python -m albert.run_squad_v2 \
    --albert_config_file=... \
    --vocab_file=... \
    --output_dir=... \
    --train_file=... \
    --predict_file=... \
    --train_feature_file=... \
    --predict_feature_file=... \
    --predict_feature_left_file=... \
    --init_checkpoint=... \
    --spm_model_file=... \
    --do_lower_case \
    --max_seq_length=384 \
    --doc_stride=128 \
    --max_query_length=64 \
    --do_train \
    --do_predict \
    --train_batch_size=48 \
    --predict_batch_size=8 \
    --learning_rate=5e-5 \
    --num_train_epochs=2.0 \
    --warmup_proportion=.1 \
    --save_checkpoints_steps=5000 \
    --n_best_size=20 \
    --max_answer_length=30
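The max_seq_length and doc_stride flags in the SQuAD commands control how passages longer than the model's input window are handled. The sketch below shows the sliding-window idea as it is commonly implemented in BERT-style SQuAD preprocessing; it is an assumption that the ALBERT scripts follow the same scheme, and the function name and the example token counts are hypothetical.

```python
def doc_spans(num_doc_tokens, max_tokens_for_doc, doc_stride):
    """Split a long passage into overlapping (start, length) windows."""
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # this window reaches the end of the passage
        # Advance by the stride so consecutive windows overlap.
        start += min(length, doc_stride)
    return spans


# Flag values from the commands above.
MAX_SEQ_LENGTH = 384
DOC_STRIDE = 128

# Hypothetical example: a 20-token question and a 500-token passage.
query_len = 20
# Room left for the passage after the question, [CLS], and two [SEP] tokens.
max_tokens_for_doc = MAX_SEQ_LENGTH - query_len - 3

spans = doc_spans(500, max_tokens_for_doc, DOC_STRIDE)
print(spans)
```

A smaller stride produces more overlapping windows, which increases the chance that an answer appears with enough context in some window, at the cost of more examples to process.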