Meta's Official Prompt Engineering Guide: Getting More Out of Llama 2
As large language model (LLM) technology matures, prompt engineering is becoming increasingly important. Several organizations, including Microsoft and OpenAI, have published LLM prompt engineering guides.
Recently, Meta, creator of the open-source Llama model family, also released an interactive prompt engineering guide for Llama 2, covering prompt engineering techniques and best practices for the model.
The following is the core content of that guide.
Llama Models
In 2023, Meta released the Llama and Llama 2 models. The smaller models are cheaper to deploy and run, while the larger models are more capable.
The Llama 2 family comes in three parameter sizes: 7B, 13B, and 70B, each available as a pretrained base model and as a chat-tuned variant.
Code Llama is a code-focused LLM built on top of Llama 2, also available in several parameter sizes and fine-tuned variants, such as Code Llama - Python and Code Llama - Instruct.
Deploying LLMs
LLMs can be deployed and accessed in several ways, including:
Self-hosting: run inference on your own hardware, e.g., running Llama 2 on a MacBook Pro with llama.cpp. Advantage: self-hosting is best when privacy or security is a requirement, or when you already have enough GPUs.
Cloud hosting: rely on a cloud provider to deploy an instance hosting a specific model, e.g., running Llama 2 on AWS, Azure, GCP, or another cloud provider. Advantage: cloud hosting is the best fit when you need to customize a model and its runtime.
Hosted APIs: call the LLM directly through an API. Many companies offer Llama 2 inference APIs, including AWS Bedrock, Replicate, Anyscale, and Together. Advantage: hosted APIs are the simplest option overall.
Hosted APIs
Hosted APIs typically expose two main endpoints:
1. completion: generates a response to a given prompt.
2. chat_completion: generates the next message in a list of messages, providing more explicit instructions and context for use cases such as chatbots.
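As a rough illustration of the difference (the exact request format varies by provider, so treat the field names below as placeholders rather than any specific provider's API), a completion-style request carries a single prompt string, while a chat_completion-style request carries a list of role-tagged messages:
# Illustrative request payloads only; field names are assumptions, not a real provider's schema.
completion_request = {
    "prompt": "The typical color of the sky is:",
    "temperature": 0.6,
    "top_p": 0.9,
}
chat_completion_request = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What color is the sky?"},
    ],
    "temperature": 0.6,
    "top_p": 0.9,
}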
Tokens
LLMs process input and output in chunks called tokens, and each model has its own tokenization scheme. For example, take the following sentence:
Our destiny is written in the stars.
Llama 2's tokenization of it is ["our", "dest", "iny", "is", "written", "in", "the", "stars"]. Tokens matter when considering API pricing and internal behavior (e.g., hyperparameters). Each model also has a maximum context length that a prompt cannot exceed: 4096 tokens for Llama 2 and 100K tokens for Code Llama.
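As a quick way to inspect tokenization and token counts yourself (this is not part of the guide's notebook; it assumes the transformers library and access to the gated Llama 2 tokenizer on Hugging Face):
from transformers import AutoTokenizer

# Requires accepting the Llama 2 license on Hugging Face and logging in first.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokens = tokenizer.tokenize("Our destiny is written in the stars.")
print(tokens)       # the subword tokens the model actually sees
print(len(tokens))  # token count, useful for staying under the context limit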
Notebook Setup
As an example, we use Replicate to call Llama 2 chat, and use LangChain to easily set up a chat completion API.
First, install the prerequisites:
pip install langchain replicate
from typing import Dict, List
from langchain.llms import Replicate
from langchain.memory import ChatMessageHistory
from langchain.schema.messages import get_buffer_string
import os
# Get a free API key from https://replicate.com/account/api-tokens
os.environ["REPLICATE_API_TOKEN"] = "YOUR_KEY_HERE"
LLAMA2_70B_CHAT = "meta/llama-2-70b-chat:2d19859030ff705a87c746f7e96eea03aefb71f166725aee39692f1476566d48"
LLAMA2_13B_CHAT = "meta/llama-2-13b-chat:f4e2de70d66816a838a89eeeb621910adffb0dd0baba3976c96980970978018d"
# We'll default to the smaller 13B model for speed; change to LLAMA2_70B_CHAT for more advanced (but slower) generations
DEFAULT_MODEL = LLAMA2_13B_CHAT
def completion(
    prompt: str,
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    # Send a single prompt string to the model and return the generated text.
    llm = Replicate(
        model=model,
        model_kwargs={"temperature": temperature, "top_p": top_p, "max_new_tokens": 1000},
    )
    return llm(prompt)
def chat_completion(
    messages: List[Dict],
    model=DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    # Rebuild the conversation as a chat history, then flatten it into a single
    # USER/ASSISTANT transcript string and send it to the completion endpoint.
    history = ChatMessageHistory()
    for message in messages:
        if message["role"] == "user":
            history.add_user_message(message["content"])
        elif message["role"] == "assistant":
            history.add_ai_message(message["content"])
        else:
            raise Exception("Unknown role")
    return completion(
        get_buffer_string(
            history.messages,
            human_prefix="USER",
            ai_prefix="ASSISTANT",
        ),
        model,
        temperature,
        top_p,
    )
def assistant(content: str):
    return {"role": "assistant", "content": content}

def user(content: str):
    return {"role": "user", "content": content}

def complete_and_print(prompt: str, model: str = DEFAULT_MODEL):
    print(f'==============\n{prompt}\n==============')
    response = completion(prompt, model)
    print(response, end='\n\n')
Completion API
complete_and_print ("The typical color of the sky is:")
complete_and_print ("which model version are you?")
Chat completion models provide additional structure for interacting with the LLM: instead of a single text prompt, you send an array of structured message objects. This message list gives the LLM some "background" or "history" to continue from.
Typically, each message contains a role and content:
Messages with the system role are used by developers to give the LLM core instructions.
Messages with the user role are typically messages provided by a human.
Messages with the assistant role are typically generated by the LLM.
response = chat_completion(messages=[
    user("My favorite color is blue."),
    assistant("That's great to hear!"),
    user("What is my favorite color?"),
])
print(response)
# "Sure, I can help you with that! Your favorite color is blue."
LLM Hyperparameters
LLM APIs usually take parameters that influence how creative or deterministic the output is. At each step, the LLM generates a list of candidate tokens with associated probabilities. The least likely tokens are "cut" from the list (based on top_p), and one token is then sampled at random (controlled by temperature) from the remaining candidates. In other words: top_p controls the breadth of vocabulary available during generation, temperature controls how random the choice is, and a temperature of 0 produces nearly deterministic results.
def print_tuned_completion(temperature: float, top_p: float):
    response = completion("Write a haiku about llamas", temperature=temperature, top_p=top_p)
    print(f'[temperature: {temperature} | top_p: {top_p}]\n{response.strip()}\n')

print_tuned_completion(0.01, 0.01)
print_tuned_completion(0.01, 0.01)
# These two generations are highly likely to be the same

print_tuned_completion(1.0, 1.0)
print_tuned_completion(1.0, 1.0)
# These two generations are highly likely to be different
Prompting Techniques
Detailed, explicit instructions produce better results than open-ended prompts:
complete_and_print(prompt="Describe quantum physics in one short sentence of no more than 12 words")
# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously.
You can turn explicit instructions into rules and constraints, for example:
- Stylization:
- Explain this to me like I'm on a children's educational TV show teaching elementary school students;
- I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words;
- Give your answer like a private investigator tracking down a case step by step.
- Formatting:
- Use bullet points;
- Return as a JSON object;
- Use less technical jargon, suitable for workplace communication.
- Constraints:
- Only use academic papers;
- Never give sources older than 2020;
- If you don't know the answer, say that you don't know.
Here is an example of giving explicit instructions:
complete_and_print ("Explain the latest advances in large language models to me.")
# More likely to cite sources from 2017
complete_and_print ("Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.")
# Gives more specific advances and only cites sources from 2020
Zero-Shot Prompting
Some large language models (e.g., Llama 2) can follow instructions and produce a response without having seen examples of the task beforehand. Prompting without examples is called "zero-shot prompting". For example:
complete_and_print ("Text: This was the best movie I've ever seen! \n The sentiment of the text is:")
# Returns positive sentiment
complete_and_print ("Text: The director was trying too hard. \n The sentiment of the text is:")
# Returns negative sentiment
Few-Shot Prompting
Adding concrete examples of the desired output usually produces more accurate, consistent output. This approach is called "few-shot prompting". For example:
def sentiment(text):
    # Few-shot prompt: three labeled examples precede the text to classify.
    response = chat_completion(messages=[
        user("You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative."),
        user("I liked it"),
        assistant("70% positive 30% neutral 0% negative"),
        user("It could be better"),
        assistant("0% positive 50% neutral 50% negative"),
        user("It's fine"),
        assistant("25% positive 50% neutral 25% negative"),
        user(text),
    ])
    return response

def print_sentiment(text):
    print(f'INPUT: {text}')
    print(sentiment(text))

print_sentiment("I thought it was okay")
# More likely to return a balanced mix of positive, neutral, and negative
print_sentiment("I loved it!")
# More likely to return 100% positive
print_sentiment("Terrible service 0/10")
# More likely to return 100% negative
Role Prompting
Llama 2 often gives more consistent responses when it is assigned a role; the role gives the LLM context about the kind of answer that is wanted.
For example, to get Llama 2 to produce a more focused, technical answer about the pros and cons of using PyTorch:
complete_and_print ("Explain the pros and cons of using PyTorch.")
# More likely to explain the pros and cons of PyTorch covers general areas like documentation, the PyTorch community, and mentions a steep learning curve
complete_and_print ("Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.")
# Often results in more technical benefits and drawbacks that provide more technical details on how model layers
Chain-of-Thought
Simply adding a phrase that encourages step-by-step thinking can significantly improve a large language model's ability to perform complex reasoning (Wei et al. (2022)). This approach is called CoT or chain-of-thought prompting:
complete_and_print ("Who lived longer Elvis Presley or Mozart?")
# Often gives incorrect answer of "Mozart"
complete_and_print ("Who lived longer Elvis Presley or Mozart? Let's think through this carefully, step by step.")
# Gives the correct answer "Elvis"
Self-Consistency
LLMs are probabilistic, so even with chain-of-thought a single generation can produce an incorrect result. Self-consistency improves accuracy by selecting the most common answer across multiple generations (at the cost of more compute):
import re
from statistics import mode

def gen_answer():
    # Ask the question once and extract the numeric answer between triple backticks.
    response = completion(
        "John found that the average of 15 numbers is 40."
        "If 10 is added to each number then the mean of the numbers is?"
        "Report the answer surrounded by three backticks, for example: ```123```",
        model=LLAMA2_70B_CHAT
    )
    match = re.search(r'```(\d+)```', response)
    if match is None:
        return None
    return match.group(1)

# Sample several generations and take the most common answer.
answers = [gen_answer() for i in range(5)]

print(
    f"Answers: {answers}\n",
    f"Final answer: {mode(answers)}",
)

# Sample runs of Llama-2-70B (all correct):
# [50, 50, 750, 50, 50] -> 50
# [130, 10, 750, 50, 50] -> 50
# [50, None, 10, 50, 50] -> 50
Retrieval-Augmented Generation
Sometimes you may want to use factual knowledge in your application. Common facts can be extracted from large models out of the box (i.e., using only the model weights):
complete_and_print ("What is the capital of the California?", model = LLAMA2_70B_CHAT)
# Gives the correct answer "Sacramento"
However, LLMs often cannot reliably retrieve more specific facts or private information. The model will either state that it does not know or hallucinate an incorrect answer:
complete_and_print ("What was the temperature in Menlo Park on December 12th, 2023?")
# "I'm just an AI, I don't have access to real-time weather data or historical weather records."
complete_and_print ("What time is my dinner reservation on Saturday and what should I wear?")
# "I'm not able to access your personal information [..] I can provide some general guidance"
Retrieval-augmented generation (RAG) means including information retrieved from an external database in the prompt (Lewis et al. (2020)). RAG is an effective way to bring facts into an LLM application and is more affordable than fine-tuning, which can be costly and can negatively impact the base model's capabilities.
MENLO_PARK_TEMPS = {
    "2023-12-11": "52 degrees Fahrenheit",
    "2023-12-12": "51 degrees Fahrenheit",
    "2023-12-13": "51 degrees Fahrenheit",
}

def prompt_with_rag(retrieved_info, question):
    # Put the retrieved fact directly into the prompt alongside the user's question.
    complete_and_print(
        f"Given the following information: '{retrieved_info}', respond to: '{question}'"
    )

def ask_for_temperature(day):
    temp_on_day = MENLO_PARK_TEMPS.get(day) or "unknown temperature"
    prompt_with_rag(
        f"The temperature in Menlo Park was {temp_on_day} on {day}",  # Retrieved fact
        f"What is the temperature in Menlo Park on {day}?",  # User question
    )

ask_for_temperature("2023-12-12")
# "Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit."

ask_for_temperature("2023-07-18")
# "I'm not able to provide the temperature in Menlo Park on 2023-07-18 as the information provided states that the temperature was unknown."
Program-Aided Language Models
LLMs are inherently bad at performing calculations. For example:
complete_and_print("""
Calculate the answer to the following math problem:
((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
""")
# Gives incorrect answers like 92448, 92648, 95463
Gao et al. (2022) proposed the concept of "program-aided language models" (PAL). While LLMs are bad at arithmetic, they are very good at code generation. PAL works by instructing the LLM to write code to solve computational tasks.
complete_and_print(
    """
    # Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
    """,
    model="meta/codellama-34b:67942fd0f55b66da802218a19a8f0e1d73095473674061a6ea19f2dc8c053152"
)

# The following code was generated by Code Llama 34B:
num1 = (-5 + 93 * 4 - 0)
num2 = (4**4 + -7 + 0 * 5)
answer = num1 * num2
print(answer)
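Running the generated code prints 91383, which is the correct answer. Note that the model translated the ^ in the prompt into Python's ** exponentiation operator (in Python, ^ is bitwise XOR, so 4^4 would evaluate to 0).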