DeepSeek Async Coroutine API Calls and Local vLLM Inference with llamafactory
Introduction
Calling DeepSeek's official API with coroutines turned out to have no noticeable effect: there was no speedup. However, if you deploy DeepSeek locally (the deployment must support asynchronous calls; I used llamafactory), the speedup from coroutines is significant.
Hands-on Code
Calling the Official API
DeepSeek official documentation: https://api-docs.deepseek.com/zh-cn/
The Python calling code is shown below. This is a synchronous call and is slow.
# Please install OpenAI SDK first: `pip3 install openai`
from openai import OpenAI

client = OpenAI(api_key="<DeepSeek API Key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False
)

print(response.choices[0].message.content)
import os
from tqdm import tqdm
from dotenv import load_dotenv

# Load the API key from the .env file
load_dotenv()
api_key = os.getenv("deepseek_api")

# Recreate the client with the key loaded from .env
client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")
queries = [
"What is AI?",
"How does deep learning work?",
"Explain reinforcement learning.",
"人工智能的應用領域有哪些?",
"大模型是如何進行預訓練的?",
"什么是自監(jiān)督學習,它有哪些優(yōu)勢?",
"Transformer 結構的核心組件是什么?",
"GPT 系列模型是如何生成文本的?",
"強化學習在游戲 AI 中的應用有哪些?",
"目前 AI 領域面臨的主要挑戰(zhàn)是什么?"
]
answer1 = []
for query in tqdm(queries):
    # API call as shown in the official docs; the user message is the current query
    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": query},
        ],
        stream=False,
    )
    content = response.choices[0].message.content
    answer1.append(content)
To avoid leaking the API key when sharing code, I save the key in a .env file and load it with load_dotenv.
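For reference, the .env file is just a key=value line whose name matches what os.getenv reads above; the key value here is a placeholder:

# .env (the key value is a placeholder)
deepseek_api=sk-xxxxxxxxxxxxxxxx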
Async Coroutine Calls
import asyncio
from typing import List

# from langchain.chat_models import ChatOpenAI
from langchain_openai import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage

# Initialize the model
llm = ChatOpenAI(
    model_name="deepseek-chat",
    # model_name="deepseek-reasoner",
    openai_api_key=api_key,
    openai_api_base="https://api.deepseek.com/v1",
)

async def call_deepseek_async(query: str, progress) -> str:
    messages = [
        SystemMessage(content="You are a helpful assistant"),
        HumanMessage(content=query),
    ]
    response = await llm.ainvoke(messages)
    progress.update(1)
    return response.content

async def batch_call_deepseek(queries: List[str], concurrency: int = 5) -> List[str]:
    # Semaphore caps how many requests are in flight at once
    semaphore = asyncio.Semaphore(concurrency)
    progress_bar = tqdm(total=len(queries), desc="Async:")

    async def limited_call(query: str):
        async with semaphore:
            return await call_deepseek_async(query, progress_bar)

    tasks = [limited_call(query) for query in queries]
    return await asyncio.gather(*tasks)

# for a Python script
# responses = asyncio.run(batch_call_deepseek(queries, concurrency=10))

# for Jupyter
response = await batch_call_deepseek(queries, concurrency=10)
Note: asynchronous calls must be awaited with await.
Below is another way tqdm can render a progress bar for coroutines, replacing the manual progress.update(1) calls:
from tqdm.asyncio import tqdm_asyncio
results = await tqdm_asyncio.gather(*tasks)
With the async coroutine code above, calling DeepSeek's official API gave me no speedup; I suspect the official service applies rate limiting.
Against a DeepSeek instance deployed locally with llamafactory, the same async coroutine code gives a clear speedup.
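If you want to measure the difference yourself, here is a minimal timing sketch, run in Jupyter and reusing the queries list, client, and batch_call_deepseek defined above:

import time

# Synchronous baseline: requests run one after another
start = time.perf_counter()
sync_answers = []
for query in queries:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": query}],
        stream=False,
    )
    sync_answers.append(resp.choices[0].message.content)
print(f"sync:  {time.perf_counter() - start:.1f}s")

# Async coroutines: up to 10 requests in flight at once
# (top-level await works in Jupyter; wrap in asyncio.run() for a script)
start = time.perf_counter()
async_answers = await batch_call_deepseek(queries, concurrency=10)
print(f"async: {time.perf_counter() - start:.1f}s")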
The following deploys DeepSeek locally with llamafactory and vllm; it only supports Linux.
Contents of the deepseek_7B.yaml file:
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
template: deepseek3
infer_backend: vllm
vllm_enforce_eager: true
trust_remote_code: true
Linux deployment script:
nohup llamafactory-cli api deepseek_7B.yaml > deepseek_7B.log 2>&1 &
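Once the server is up, the async code above can be pointed at the local deployment by swapping the client configuration. A minimal sketch, assuming llamafactory's OpenAI-compatible API server on its default port 8000 (adjust host, port, and model name to your deployment):

from langchain_openai import ChatOpenAI

# Local llamafactory API server; the key is a placeholder since the local server does not check it
llm = ChatOpenAI(
    model_name="DeepSeek-R1-Distill-Qwen-7B",    # assumed to match the deployed model
    openai_api_key="EMPTY",                       # placeholder
    openai_api_base="http://localhost:8000/v1",   # assumed default llamafactory API address
)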
Async Coroutines: Method 2
Below is another async coroutine pattern, generated by ChatGPT.
(I have not tested this method against the locally deployed API; it is provided for reference only.)
import asyncio
from openai import AsyncOpenAI
from tqdm.asyncio import tqdm_asyncio

# The synchronous OpenAI client cannot be awaited, so use the async client here
client = AsyncOpenAI(api_key=api_key, base_url="https://api.deepseek.com")

answer = []

async def fetch(query):
    response = await client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": query},
        ],
        stream=False,
    )
    return response.choices[0].message.content

async def main():
    tasks = [fetch(query) for query in queries]
    results = await tqdm_asyncio.gather(*tasks)
    answer.extend(results)

asyncio.run(main())
vllm_infer
If you are on Linux, the fastest option, faster still than API calls, is direct vllm inference.
You need the following script:
https://github.com/hiyouga/LLaMA-Factory/blob/main/scripts/vllm_infer.py
python vllm_infer.py \
--model_name_or_path deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
--template deepseek3 \
--dataset industry_cls \
--dataset_dir ../../data/llamafactory_dataset/ \
--save_name output/generated_predictions.jsonl
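Once inference finishes, you can load the predictions back. A minimal sketch, assuming each line of the output jsonl is a JSON object with a "predict" field (the field names are an assumption from a typical vllm_infer.py run; check your own output if the schema differs):

import json

# Read the file written by vllm_infer.py (path from --save_name above)
with open("output/generated_predictions.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record["predict"])  # field name assumed, not confirmed by the source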
llamafactory lets you point --dataset_dir at a custom dataset directory; you need to build dataset files in the format it expects. The dataset folder contains the dataset file itself plus a dataset_info.json that registers it, as sketched below.
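For reference, a minimal sketch of the two files for the industry_cls dataset used above, following llamafactory's alpaca-style convention (the file name, column mapping, and record contents here are assumptions, not taken from the original post):

dataset_info.json (registers the dataset under the name passed to --dataset):

{
  "industry_cls": {
    "file_name": "industry_cls.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}

industry_cls.json (alpaca-style records; contents are hypothetical):

[
  {
    "instruction": "Classify the industry of the following text.",
    "input": "sample text",
    "output": "sample label"
  }
]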
This article is reposted from AI悠閑區(qū); author: jieshenai.
