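Open-Source Models Can Have o1-Style Chain of Thought Too: The First Success Stories Are Here
At midnight on September 13, Beijing time, OpenAI released the o1 series of models with strong reasoning performance. Since then, researchers have been trying to uncover the techniques behind o1's performance and to reproduce it. OpenAI, for its part, has taken steps to discourage prying: several users say that after trying to coax o1 into revealing its reasoning process, they received warnings from OpenAI threatening account bans.
Even so, within just three or four days, researchers were already claiming to have reproduced or developed reasoning techniques that perform comparably to o1, and there is more than one such claim!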
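Llamaberry: Teaching AI to Think Like a Smart Person
Llamaberry was proposed by Martin Bowling. The projects listed for him include RAGMiner.dev and Replit: the former helps users effortlessly convert websites into formats such as Markdown, XML, or JSON for use in RAG and LLM applications, while the latter is a platform that uses AI to turn ideas into code.
The core idea of Llamaberry is to use chain of thought (CoT) to perform reasoning. The name naturally comes from Strawberry, the codename for the o1 model.
Hugging Face Space: https://huggingface.co/spaces/martinbowling/Llamaberry
What is chain of thought? Bowling offers an analogy in his blog post: "Chain-of-thought reasoning is like giving the AI a notebook to show its work. Rather than simply giving an answer, it walks us through its thought process."
Llamaberry teaches an AI to think a problem through thoroughly, the way a human expert would when tackling a hard problem.
Specifically, Llamaberry is an implementation of a multi-turn chain-of-thought reasoning system, built on the Llama 3.1 70B model running on Groq.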
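Multi-Turn Reasoning Is the Key
Multi-turn reasoning, as the name suggests, has the model think in several steps before answering instead of answering in one shot. By analogy, it is like watching a chef take a dish from prep to plating step by step, rather than microwaving a ready meal.
For example:
Turn 1: The AI makes a first attempt at the problem.
Turn 2: The AI reviews its first attempt and tries to improve or refine its reasoning.
Turn 3: Another round of reflection and improvement.
Synthesis: Finally, all of this thinking is combined into a coherent, well-reasoned answer.
The diagram below illustrates this multi-turn process:
As shown, the output of each turn becomes the input of the next, so the AI can keep refining its thinking at every stage. At the end, all of this thinking is distilled into a well-reasoned final answer. It is like watching a berry ripen!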
How It Is Implemented
Let's dig into Llamaberry's implementation details.
1. Setting the Stage
First, we need to set some ground rules for the AI assistant. The code is as follows:
initial_system_prompt = """You are an AI assistant capable of detailed, step-by-step thinking. When presented with a question or problem, break down your thought process into clear, logical steps. For each step, explain your reasoning. Conclude with a final answer. Use the following markdown structure:
## Reasoning
1. [First step]
**Explanation:** [Detailed explanation of this step]
2. [Second step]
**Explanation:** [Detailed explanation of this step]
...
## Answer
[Final answer]
Be comprehensive and show your reasoning clearly."""
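This is the recipe handed to the AI chef. It tells the model to break its thinking down step by step, explain each step, and present everything in Markdown.
2. The Thinking Process
In every turn, the AI is asked to think about the problem. After the first turn, though, it is also asked to think about what it has already thought. It is like asking a friend, "Hey, remember what you said earlier? Let's think about it again."
Here is how each turn is generated: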
async def generate_turn(query: str, previous_turns: list = None) -> str:
    is_first_turn = previous_turns is None or len(previous_turns) == 0
    if is_first_turn:
        messages = [{
            "role": "system",
            "content": initial_system_prompt
        }, {
            "role": "user",
            "content": query
        }]
    else:
        previous_content = "\n\n".join(previous_turns)
        messages = [{
            "role": "system",
            "content": followup_system_prompt
        }, {
            "role": "user",
            "content": f"Original Query: {query}\n\nPrevious Turns:\n{previous_content}\n\nProvide the next turn of reasoning."
        }]
    return await call_llm(messages)
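This function is the core of multi-turn reasoning. It determines whether we are on the first turn or a later one and builds the prompt accordingly. For later turns, it includes all previous turns in the context so the AI can build on its earlier thinking.
3. Synthesizing the Result
After three turns of thinking, the AI reviews all of its reasoning and produces a well-reasoned final answer. It is like having experts discuss a problem and then write the final report.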
async def synthesize_turns(query: str, turns: list) -> str:
    turns_text = "\n\n".join(
        [f"Turn {i+1}:\n{turn}" for i, turn in enumerate(turns)])
    messages = [{
        "role": "system",
        "content": synthesis_prompt
    }, {
        "role": "user",
        "content": f"Original Query: {query}\n\nTurns of Reasoning:\n{turns_text}"
    }]
    return await call_llm(messages)
This synthesis step is crucial: it is where the AI distills everything it has thought into the final answer.
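The pieces above can be wired together roughly as follows. This is a minimal sketch, not Bowling's actual code: `call_llm` here is a stand-in stub that returns canned text instead of calling Groq, the wording of all three prompts is a placeholder invented for illustration (the article does not show `followup_system_prompt` or `synthesis_prompt`), and `run_llamaberry` is a hypothetical name.

```python
import asyncio
from typing import Optional

# Placeholder prompts: the article does not show the real follow-up and
# synthesis prompts, and the initial prompt is shortened here.
initial_system_prompt = "Think step by step and finish with an answer."
followup_system_prompt = "Review the previous turns and improve on them."
synthesis_prompt = "Combine all turns into one final answer."

calls = []  # records every message list sent to the stub, for inspection


async def call_llm(messages: list) -> str:
    """Stub standing in for the real Groq request; returns canned text."""
    calls.append(messages)
    return f"response {len(calls)}"


async def generate_turn(query: str, previous_turns: Optional[list] = None) -> str:
    if not previous_turns:
        system, user = initial_system_prompt, query
    else:
        previous_content = "\n\n".join(previous_turns)
        system = followup_system_prompt
        user = (f"Original Query: {query}\n\nPrevious Turns:\n"
                f"{previous_content}\n\nProvide the next turn of reasoning.")
    return await call_llm([{"role": "system", "content": system},
                           {"role": "user", "content": user}])


async def synthesize_turns(query: str, turns: list) -> str:
    turns_text = "\n\n".join(
        f"Turn {i + 1}:\n{turn}" for i, turn in enumerate(turns))
    user = f"Original Query: {query}\n\nTurns of Reasoning:\n{turns_text}"
    return await call_llm([{"role": "system", "content": synthesis_prompt},
                           {"role": "user", "content": user}])


async def run_llamaberry(query: str, num_turns: int = 3):
    """Run num_turns reasoning turns, then one synthesis call."""
    turns = []
    for _ in range(num_turns):
        # Each turn sees every earlier turn, so it can refine them.
        turns.append(await generate_turn(query, turns))
    return turns, await synthesize_turns(query, turns)


if __name__ == "__main__":
    turns, answer = asyncio.run(run_llamaberry("Why is the sky blue?"))
    print(turns)
    print(answer)
```

With three turns, the stub is called four times in total: once per turn plus once for synthesis. Swapping the stub for a real chat-completion request would leave the rest of the flow unchanged.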
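Single-Turn vs. Multi-Turn Reasoning
In his blog post, Martin Bowling briefly compares single-turn and multi-turn reasoning.
He uses a simple example, asking the AI: "What are the potential impacts of artificial intelligence on the job market?"
Single-turn reasoning: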
AI: Artificial intelligence will likely have significant impacts on the job market, including:
1. Automation of routine tasks
2. Creation of new job roles in AI development and maintenance
3. Potential job displacement in certain industries
4. Increased demand for skills in data analysis and machine learning
While AI may eliminate some jobs, it will also create new opportunities and increase productivity in many fields.
Multi-turn reasoning with Llamaberry:
Turn 1:
AI: Let's approach this step-by-step:
1. Identify current AI capabilities
Explanation: AI can perform data analysis, pattern recognition, and automate routine tasks.
2. Analyze vulnerable job sectors
Explanation: Manufacturing, customer service, and data entry are likely to be affected.
3. Consider new job opportunities
Explanation: AI development, data science, and AI ethics consultation may see growth.
4. Examine potential economic effects
Explanation: Increased productivity might lead to economic growth, but wealth distribution could be a challenge.
Initial Answer: AI will likely cause significant shifts in the job market, with both job losses in some sectors and new opportunities in others.
Turn 2:
AI: Let's refine our thinking:
1. Reassess the timeline of AI impact
Explanation: The effects won't be immediate but will unfold over years or decades.
2. Consider the adaptability of the workforce
Explanation: Retraining and education will be crucial for workers to transition to new roles.
3. Analyze potential societal changes
Explanation: The nature of work itself might change, with more emphasis on creativity and emotional intelligence.
4. Examine global differences
Explanation: AI's impact will vary across different countries and economies.
Refined Answer: AI's impact on the job market will be complex and gradual, requiring proactive measures in education and economic policy to ensure a smooth transition.
Turn 3:
AI: Let's delve deeper:
1. Explore the concept of human-AI collaboration
Explanation: Many jobs may evolve to involve working alongside AI rather than being replaced by it.
2. Consider the ethical implications
Explanation: Issues like algorithmic bias in hiring processes need to be addressed.
3. Analyze the potential for new economic models
Explanation: Universal Basic Income or reduced working hours might become more prominent discussions.
4. Examine historical parallels
Explanation: Previous technological revolutions, like the Industrial Revolution, can offer insights into potential outcomes.
Further Refined Answer: While AI will undoubtedly transform the job market, its impact will likely be nuanced, involving collaboration between humans and AI, ethical considerations, and potentially new economic paradigms.
Synthesized answer:
AI: After careful consideration across multiple turns of reasoning, we can synthesize our thoughts on the potential impacts of artificial intelligence on the job market:
1. Job Transformation: AI will lead to significant changes in the job market, but these changes will be gradual and complex. While some jobs may be automated, many others will be transformed to involve collaboration between humans and AI. Industries like manufacturing, customer service, and data entry are likely to see the most immediate impacts.
2. New Opportunities: The rise of AI will create new job categories, particularly in fields like AI development, data science, AI ethics consultation, and human-AI interaction design. There will likely be an increased demand for skills in creativity, emotional intelligence, and complex problem-solving that AI cannot easily replicate.
3. Economic and Social Implications: The integration of AI into the workforce has the potential to increase productivity and economic growth. However, it also raises concerns about wealth distribution and may necessitate discussions about new economic models, such as Universal Basic Income or reduced working hours.
4. Education and Adaptation: The changing job market will require a strong emphasis on continuous learning and adaptation. Educational systems and workforce training programs will need to evolve to prepare people for the new reality of working alongside AI.
5. Ethical Considerations: As AI becomes more prevalent in hiring and work processes, addressing issues like algorithmic bias and ensuring fair opportunities for all workers will be crucial.
6. Global Variations: The impact of AI on the job market will not be uniform across the globe. Different countries and economies will experience varying effects based on their economic structures, technological readiness, and policy responses.
7. Historical Context: While AI represents a significant shift, it's important to view it in the context of previous technological revolutions. Like the Industrial Revolution, the AI revolution will likely lead to short-term disruptions but potentially long-term benefits if managed properly.
In conclusion, while AI will undoubtedly have profound impacts on the job market, these impacts will be multifaceted and evolve over time. The key to navigating this transition successfully lies in proactive measures in education, economic policy, and ethical governance, ensuring that the benefits of AI are broadly shared while mitigating potential negative effects.
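As the example shows, the multi-turn approach produces a more nuanced and comprehensive analysis. It considers the problem from several angles, refines its own thinking, and arrives at a more considered and detailed answer.
Bowling says multi-turn reasoning of this kind has major advantages in depth of thought, self-improvement, transparency, and flexibility, and that the Markdown output makes results easier to read and to use in downstream processing.
As a result, this technique could bring AI into more practical applications, such as medical diagnosis, legal analysis, scientific research, business strategy, and education. Of course, when using multi-turn reasoning systems like Llamaberry, their limitations and potential ethical issues also need attention, including bias, computational cost, over-reliance, and privacy.
How to Use Llamaberry
Llamaberry is also easy to use: a few clicks get you your own multi-turn reasoning system. The steps are:
- Go to Replit and open the Llamaberry template: https://replit.com/@MartinBowling/Llamaberry-Powered-By-Groq?v=1
- Fork the template: click the Fork button to create your own copy of the Llamaberry project.
- Get a Groq API key: sign up for a Groq account and obtain an API key.
- Set up the environment: in your forked Replit project, find the "Secrets" tab. Add a new secret whose key is GROQ_API_KEY and whose value is your Groq API key.
- Run the project: click the Run button at the top of the Replit interface. This launches the Llamaberry app.
- Start experimenting: once the app is running, you will see a Gradio interface where you can enter questions and watch Llamaberry's multi-turn reasoning in action, with the output rendered in clean Markdown.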
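Replit exposes Secrets to the running program as environment variables, so the key added in the setup step can be read roughly like this. This is a small sketch; `load_groq_api_key` is a hypothetical helper and is not taken from the Llamaberry template.

```python
import os


def load_groq_api_key(env=None) -> str:
    """Return the Groq API key from the environment, or fail with a clear message."""
    # Allow a plain dict to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; add it under Secrets in your Replit fork.")
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error surfacing later from the API client.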
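With Llamaberry covered, let's look at another project that claims to implement o1-like reasoning chains: g1.
g1: o1-Like Reasoning Chains
g1 comes from Benjamin Klieger, a researcher at Groq. Accordingly, g1 is also built on Groq and also uses the Llama 3.1 70B model.
Unlike Llamaberry's multi-turn chain-of-thought reasoning, g1's strategy combines role-playing, chain-of-thought prompting, output formatting, and several other prompting techniques. g1 is also open source.
Project: https://github.com/bklieger-groq/g1
The developer claims g1 correctly counts the Rs in "strawberry" about 70% of the time, with no fine-tuning or few-shot examples. Here is one run:
Klieger says that g1, like o1, gives an LLM the ability to "think" and to solve logic problems that previously stumped leading models. The difference is that g1 openly shows all of its reasoning tokens. He also stresses that g1 and o1 differ technically: o1 was trained with large-scale reinforcement learning to reason with chain of thought, whereas g1 explores how far prompt engineering alone can go in helping LLMs solve simple logic problems, so that existing open-source models can benefit from dynamic reasoning chains and a better interface for exploring them.
How g1 Works
Powered by Llama 3.1 70B, g1 creates a dynamic chain of thought.
At each step, the LLM can choose either to continue with another reasoning step or to give the final answer. Each step has a title and is visible to the user.
The system prompt also includes tips for the LLM. The prompt strategy is as follows: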
You are an expert AI assistant that explains your reasoning step by step. For each step, provide a title that describes what you're doing in that step, along with the content. Decide if you need another step or if you're ready to give the final answer. Respond in JSON format with 'title', 'content', and 'next_action' (either 'continue' or 'final_answer') keys. USE AS MANY REASONING STEPS AS POSSIBLE. AT LEAST 3. BE AWARE OF YOUR LIMITATIONS AS AN LLM AND WHAT YOU CAN AND CANNOT DO. IN YOUR REASONING, INCLUDE EXPLORATION OF ALTERNATIVE ANSWERS. CONSIDER YOU MAY BE WRONG, AND IF YOU ARE WRONG IN YOUR REASONING, WHERE IT WOULD BE. FULLY TEST ALL OTHER POSSIBILITIES. YOU CAN BE WRONG. WHEN YOU SAY YOU ARE RE-EXAMINING, ACTUALLY RE-EXAMINE, AND USE ANOTHER APPROACH TO DO SO. DO NOT JUST SAY YOU ARE RE-EXAMINING. USE AT LEAST 3 METHODS TO DERIVE THE ANSWER. USE BEST PRACTICES.
Example of a valid JSON response:
{
    "title": "Identifying Key Information",
    "content": "To begin solving this problem, we need to carefully examine the given information and identify the crucial elements that will guide our solution process. This involves...",
    "next_action": "continue"
}
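For a detailed explanation of these prompts, see the Prompt Breakdown section of the original project. We won't go through it all here, but to give a couple of examples, the prompt includes instructions such as "include exploration of alternative answers" and "use at least 3 methods to derive the answer."
In this way, combining chain of thought with trying multiple methods, exploring alternative answers, questioning earlier draft solutions, and considering the LLM's own limitations can significantly improve the LLM's reasoning ability.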
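The step-by-step behavior described above, where the model returns a JSON object per step and decides via `next_action` whether to continue, can be sketched as a simple loop. This is an illustration under stated assumptions rather than g1's actual code: `run_reasoning_chain`, `fake_llm`, and the `MAX_STEPS` cap are hypothetical, and `fake_llm` replaces the real Groq call with canned output.

```python
import json

MAX_STEPS = 25  # assumed safety cap so a model that never finishes cannot loop forever


def run_reasoning_chain(query, llm, system_prompt, max_steps=MAX_STEPS):
    """Collect reasoning steps until the model sets next_action to 'final_answer'."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": query}]
    steps = []
    for _ in range(max_steps):
        raw = llm(messages)
        try:
            step = json.loads(raw)
            if not isinstance(step, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Treat unparseable output as a final step rather than crashing.
            step = {"title": "Unparsed output", "content": raw,
                    "next_action": "final_answer"}
        steps.append((step.get("title", ""), step.get("content", "")))
        # Feed the step back so the next call can build on it.
        messages.append({"role": "assistant", "content": raw})
        if step.get("next_action") == "final_answer":
            break
    return steps


def fake_llm(messages):
    """Stand-in model: two 'continue' steps, then a final answer."""
    n = sum(1 for m in messages if m["role"] == "assistant")
    action = "continue" if n < 2 else "final_answer"
    return json.dumps({"title": f"Step {n + 1}", "content": "...",
                       "next_action": action})


if __name__ == "__main__":
    chain = run_reasoning_chain("How many Rs are in strawberry?",
                                fake_llm, "(system prompt goes here)")
    for title, content in chain:
        print(title, "-", content)
```

Because each step carries its own title, the collected list can be shown to the user as it grows, which matches the article's point that every step is visible.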
On the classic question of how many Rs are in "strawberry," g1 brings Llama 3.1 70B to about 70% accuracy with no training at all (n=10, "How many Rs are in strawberry?"). Without the prompting technique, Llama 3.1 70B scored 0% and ChatGPT-4o only 30%.
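The ground truth and the accuracy arithmetic are easy to check directly. In the sketch below, the list of model answers is invented purely to illustrate how 7 correct answers out of n=10 gives 70%; it is not the project's actual run data, and both helper functions are hypothetical.

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of a letter in a word."""
    return word.lower().count(letter.lower())


def accuracy(answers, truth) -> float:
    """Fraction of answers equal to the ground truth."""
    return sum(a == truth for a in answers) / len(answers)


truth = count_letter("Strawberry", "R")

# Hypothetical answers from 10 trials (illustrative only, not real results).
answers = [3, 3, 2, 3, 3, 3, 2, 3, 2, 3]

print(truth)                      # 3
print(accuracy(answers, truth))   # 0.7
```

With only ten trials, a single answer moves the score by ten percentage points, so the reported figures are best read as rough indications.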
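Another example is shown below: which is larger, 0.9 or 0.11?
For detailed installation steps and code, see the original project.
Finally, it is worth mentioning that another developer has released multi1, a fork of g1 whose main improvement is support for multiple AI providers when creating o1-like reasoning chains. Interested readers can visit: https://github.com/tcsenpai/multi1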