Demystifying Unsloth: Making Model Fine-Tuning 2-5x Faster
I. Introduction to Unsloth
Unsloth is a framework designed specifically for model fine-tuning. It sets out to solve the problems that most commonly slow fine-tuning down: long training times and high GPU memory (VRAM) usage. Through a series of targeted optimizations, Unsloth markedly improves fine-tuning efficiency, letting developers reach better model quality in less time.
II. Key Advantages of Unsloth
1. Fast training
When fine-tuning mainstream models such as Llama-3, Qwen2, and Mistral, Unsloth delivers an impressive speedup: training runs 2 to 5 times faster than with conventional fine-tuning approaches, which noticeably shortens the development cycle. For example, when working through large text corpora, the lower per-step cost translates directly into shorter training runs, so developers see the effect of their changes much sooner.
2. Low VRAM usage
VRAM usage is one of the key constraints in fine-tuning, especially on resource-limited hardware. Unsloth tackles this directly and can cut VRAM consumption by up to about 70%. That makes it feasible to fine-tune even on GPUs with limited memory, such as mid-range and entry-level cards, so developers can work within whatever hardware they already have rather than being blocked by resource limits.
III. Technical Features of Unsloth
1. Broad hardware compatibility
Unsloth supports a wide range of hardware, covering NVIDIA GPUs from the Tesla T4 up to the H100, and it is extending compatibility to AMD and Intel GPUs as well. Whichever GPU you are using, you can try Unsloth for fine-tuning, and this breadth of support lets it bring its advantages to very different hardware platforms.
2. Optimized memory usage
Unsloth applies techniques such as intelligent weight upcasting, which reduces how often weights need to be upcast during QLoRA and thereby trims memory use, making better use of the available hardware. It also takes advantage of BFloat16 on GPUs that support it, improving the stability of 16-bit training and further accelerating QLoRA fine-tuning. This fine-grained management of memory and compute is what lets Unsloth perform well on large models and datasets.
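As a concrete illustration, here is a minimal sketch (assuming Unsloth is installed and a CUDA GPU is available; the variable names are only examples) of picking the training dtype based on whether the hardware supports BFloat16:
```
import torch
from unsloth import is_bfloat16_supported

# Ampere and newer GPUs (A100, RTX 30xx/40xx, H100) support BFloat16 natively;
# older cards such as the Tesla T4 or V100 fall back to Float16.
use_bf16 = is_bfloat16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16
print(f"BFloat16 supported: {use_bf16}; training dtype: {dtype}")
```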
IV. Using Unsloth: A Walkthrough
1. Installing Unsloth
Installing Unsloth is straightforward. For a CUDA 12.1 / PyTorch 2.3.0 environment, for example, you can run `pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"`. The exact extras tag depends on your environment and needs, so consult the official documentation to pick the right variant. For Colab-style environments the command is:
```
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
```
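After installation, a quick way to confirm the package is importable and to check which version was installed (standard-library only):
```
import importlib.metadata

import unsloth  # raises ImportError if the installation did not succeed

print("unsloth version:", importlib.metadata.version("unsloth"))
```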
2. Setting up a mirror
Because of network restrictions, huggingface.co may be unreachable; in that case you can use the domestic mirror https://hf-mirror.com instead.
1) Install the dependency
```
pip install -U huggingface_hub
```
2) Set the environment variable
```
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'
```
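Note that `HF_ENDPOINT` is read when `huggingface_hub` is imported, so set it first. As a rough sanity check (the repo name below is only an example), you can pull a single small file through the mirror:
```
import os
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'  # set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Fetch only config.json of a small repo to confirm the mirror is reachable.
path = snapshot_download(repo_id="unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
                         allow_patterns=["config.json"])
print("Downloaded to:", path)
```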
3. Loading the model
```
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit", # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit", # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409", # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct", # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit", # Gemma 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit", # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
```
4. LoRA configuration
```
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none", # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False, # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
```
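To see how small the LoRA update actually is, you can count the trainable parameters after `get_peft_model` (a quick check in plain PyTorch; note that with 4-bit quantization the reported total can undercount the true base-model size because quantized weights are stored packed):
```
# Only the injected LoRA adapter weights require gradients; the base weights stay frozen.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")
```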
5. Preparing the dataset
We use Maxime Labonne's ShareGPT-style FineTome-100k dataset (https://huggingface.co/datasets/mlabonne/FineTome-100k) and convert its ("from", "value") message format into the ("role", "content") format.
```
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }

from datasets import load_dataset
dataset = load_dataset("mlabonne/FineTome-100k", split = "train")
```
We now use `standardize_sharegpt` to convert the ShareGPT-style dataset into HuggingFace's generic format, turning records like
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
into
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```
```
from unsloth.chat_templates import standardize_sharegpt

dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)
```
Spot-check the record at index 5:
```
dataset[5]["conversations"]
```
Output:
```
[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
'role': 'user'},
{'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
'role': 'assistant'}]
```
Now look at the same record after the chat template has been applied:
```
dataset[5]["text"]
```
Output:
```
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
```
6. Training the model
Configure the training arguments:
```
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
```
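Before starting the run, it can be useful to snapshot the current GPU memory so the extra memory consumed by training can be measured afterwards (plain PyTorch, following the pattern used in the Unsloth example notebooks):
```
import torch

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
```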
Next, use Unsloth's `train_on_responses_only` so the model is trained only on the assistant's responses, with the user's inputs ignored in the loss:
```
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)
```
Check the input_ids after masking:
```
tokenizer.decode(trainer.train_dataset[5]["input_ids"])
```
Output:
```
'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
```
And check the corresponding labels after masking:
```
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])
```
Output:
```
' \n\nAstronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|eot_id|>'
```
We can see that the system and instruction prompts have been successfully masked out.
Start training:
```
trainer_stats = trainer.train()
```
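With `logging_steps = 1`, the loss is printed at every step. Once training finishes, `trainer_stats` (a `TrainOutput` from the underlying Hugging Face `Trainer`) carries the run metrics, which can be reported like this (a small sketch; it assumes the `start_gpu_memory` snapshot from the earlier sketch):
```
import torch

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB (started at {start_gpu_memory} GB).")
```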
7. Running inference
```
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3.1",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 64, use_cache = True,
                         temperature = 1.5, min_p = 0.1)
tokenizer.batch_decode(outputs)
```
Output:
```
['<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nCutting Knowledge Date: December 2023\nToday Date: 26 July 2024\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nContinue the fibonnaci sequence: 1, 1, 2, 3, 5, 8,<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nThe next two terms would be 13 and 21.\n\nFibonacci Sequence: 1, 1, 2, 3, 5, 8, 13, 21.<|eot_id|>']
```
8. Saving the fine-tuned model
```
model.save_pretrained("lora_model") # Local saving
tokenizer.save_pretrained("lora_model")
```
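Beyond saving only the LoRA adapters, Unsloth also provides export helpers for a merged full model or GGUF files for llama.cpp / Ollama. The calls below follow the Unsloth example notebooks; exact options can change between versions, so treat this as a sketch rather than a definitive recipe:
```
# Merge the LoRA weights into the base model and save as 16-bit safetensors.
if False:  # set to True to actually export
    model.save_pretrained_merged("merged_model", tokenizer, save_method = "merged_16bit")

# Export a quantized GGUF file (e.g. q4_k_m) for llama.cpp or Ollama.
if False:
    model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method = "q4_k_m")
```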
9. Loading the fine-tuned model for inference
```
if False:  # Set to True to reload the saved LoRA model in a fresh session.
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Describe a tall tower in the capital of France."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer

text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
                   use_cache = True, temperature = 1.5, min_p = 0.1)
```
The streamed inference output:
```
The Eiffel Tower is a famous tall structure located in Paris, the capital of France. It was built for the 1889 World's Fair and stands at a height of 324 meters (1,063 feet) high. The Eiffel Tower has become a symbol of Paris and is often referred to as the Iron Lady. Its construction was designed by Gustave Eiffel, a French engineer, and it was intended to be a temporary structure. However, it has remained standing for over a century and has become an iconic landmark in the city.<|eot_id|>
```
V. Applying Unsloth in Real Projects
Unsloth's efficiency and flexibility give it broad applicability.
In natural language processing tasks such as text classification, sentiment analysis, and machine translation, Unsloth helps developers quickly fine-tune pretrained models for different datasets and task requirements. With shorter training times and lower VRAM usage, experiments and iterations become cheaper, making it easier to push model performance.
In dialogue system development, Unsloth lets developers train personalized conversational models quickly. Fine-tuned on large-scale dialogue data, a model can better understand user inputs and produce more natural, accurate replies, which matters for building customer-service assistants, chatbots, and similar applications.
In content generation, such as article writing and story creation, Unsloth is equally useful: developers can fine-tune a language model so that it generates high-quality text from a given topic or prompt.
VI. Summary and Outlook
As a capable framework for fine-tuning pretrained models, Unsloth gives developers an efficient and convenient workflow. Its fast training, low VRAM footprint, and broad compatibility make it a valuable tool, and used well it makes bringing pretrained models into real projects noticeably easier.
Unsloth is still evolving, and we can expect further innovations that open up new possibilities for model fine-tuning. Hopefully more developers will pick it up and explore what it can do.
