LLamaSharp in C#: An Efficient Local LLM Inference Library for Building Your Own GPT
LLamaSharp is a cross-platform library for running LLaMA/LLaVA models (and others) on local devices. Built on llama.cpp, LLamaSharp delivers efficient inference on both CPU and GPU. With its high-level APIs and RAG support, you can conveniently deploy large language models (LLMs) in your applications.
GitHub repository:
https://github.com/SciSharp/LLamaSharp
Clone the code:
git clone https://github.com/SciSharp/LLamaSharp.git
Quick Start
Installation
For high performance, LLamaSharp interacts with native libraries compiled from C++, called backends. Backend packages are provided for Windows, Linux, and macOS, covering CPU, CUDA, Metal, and OpenCL. You don't need to compile any C++ code yourself; just install a backend package.
Install the LLamaSharp package:
PM> Install-Package LLamaSharp
Install one or more backend packages, or use a self-compiled backend:
- LLamaSharp.Backend.Cpu: pure CPU backend for Windows, Linux, and macOS. On macOS it also supports Metal (GPU).
- LLamaSharp.Backend.Cuda11: CUDA 11 backend for Windows and Linux.
- LLamaSharp.Backend.Cuda12: CUDA 12 backend for Windows and Linux.
- LLamaSharp.Backend.OpenCL: OpenCL backend for Windows and Linux.
(Optional) For Microsoft semantic-kernel integration, install the LLamaSharp.semantic-kernel package.
(Optional) To enable RAG support, install the LLamaSharp.kernel-memory package (net6.0 or higher only), which is based on the Microsoft kernel-memory integration.
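If you use the dotnet CLI rather than the Package Manager console, the same packages can be added to a project like this (pick the backend that matches your hardware; the package names are the ones listed above):

```shell
# Add LLamaSharp plus one backend to the current project.
dotnet add package LLamaSharp
dotnet add package LLamaSharp.Backend.Cpu
# Or, for NVIDIA GPUs, use a CUDA backend instead:
# dotnet add package LLamaSharp.Backend.Cuda12
```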
Model Preparation
LLamaSharp uses model files in GGUF format, which can be converted from PyTorch format (.pth) and Hugging Face format (.bin). There are two ways to obtain a GGUF file:
- Search Hugging Face for the model name plus 'gguf' to find files that have already been converted.
- Convert a PyTorch or Hugging Face model to GGUF yourself. Follow the instructions in the llama.cpp README and use its Python scripts for the conversion.
In general, we recommend downloading a quantized model: it significantly reduces the required memory while having only a small impact on generation quality.
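Both routes can be sketched as below. Note this is only an illustration: the repository and file names are examples (substitute the model you actually want), `huggingface-cli` comes from the `huggingface_hub` Python package, and the exact conversion script name varies across llama.cpp versions.

```shell
# Option 1: download an already-converted, quantized GGUF file from Hugging Face
# (repo and file names are examples only)
huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF \
    llama-2-7b-chat.Q4_K_M.gguf --local-dir ./models

# Option 2: convert a Hugging Face checkpoint yourself with llama.cpp's script
# (check the llama.cpp README for the script name in your checkout)
git clone https://github.com/ggerganov/llama.cpp.git
python llama.cpp/convert-hf-to-gguf.py ./my-hf-model --outfile ./models/my-model.gguf
```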
Simple Chat
LLamaSharp ships with a simple console demo showing how to run inference with the library. Here is a basic example:
using LLama.Common;
using LLama;

namespace appLLama
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            await Chat();
        }

        static async Task Chat()
        {
            string modelPath = @"E:\Models\llama-2-7b-chat.Q4_K_M.gguf"; // change it to your own model path.
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024, // The longest length of chat as memory.
                GpuLayerCount = 5   // How many layers to offload to GPU. Adjust it according to your GPU memory.
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            using var context = model.CreateContext(parameters);
            var executor = new InteractiveExecutor(context);

            // Add chat histories as prompt to tell the AI how to act.
            var chatHistory = new ChatHistory();
            chatHistory.AddMessage(AuthorRole.System, "Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.");
            chatHistory.AddMessage(AuthorRole.User, "Hello, Bob.");
            chatHistory.AddMessage(AuthorRole.Assistant, "Hello. How may I help you today?");

            ChatSession session = new(executor, chatHistory);
            InferenceParams inferenceParams = new InferenceParams()
            {
                MaxTokens = 256, // No more than 256 tokens in the answer. Remove it if the antiprompt is enough for control.
                AntiPrompts = new List<string> { "User:" } // Stop generation once an antiprompt appears.
            };

            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.Write("The chat session has started.\nUser: ");
            Console.ForegroundColor = ConsoleColor.Green;
            string userInput = Console.ReadLine() ?? "";

            while (userInput != "exit")
            {
                // Generate the response as a stream.
                await foreach (
                    var text
                    in session.ChatAsync(
                        new ChatHistory.Message(AuthorRole.User, userInput),
                        inferenceParams))
                {
                    Console.ForegroundColor = ConsoleColor.White;
                    Console.Write(text);
                }
                Console.ForegroundColor = ConsoleColor.Green;
                userInput = Console.ReadLine() ?? "";
            }
        }
    }
}
- Model path and parameters: specify the model path, the context size, and the number of GPU layers.
- Load the model and create a context: load the model from file and initialize a context with the parameters.
- Executor and chat history: define an InteractiveExecutor and set up the initial chat history, including the system message and an opening exchange between the user and the assistant.
- Session and inference parameters: create a ChatSession and set the inference parameters, including the maximum token count and the antiprompts.
- User input and response generation: start the chat session, process user input, stream the assistant's reply asynchronously, and stop generation when an antiprompt appears.
You will find that Chinese support is not great, even with Qwen's quantized model.
The Official Chinese-Handling Example
Here I switched to the Qwen model:
using LLama.Common;
using LLama;
using System.Text;

namespace appLLama
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            await Run();
        }

        private static string ConvertEncoding(string input, Encoding original, Encoding target)
        {
            byte[] bytes = original.GetBytes(input);
            var convertedBytes = Encoding.Convert(original, target, bytes);
            return target.GetString(convertedBytes);
        }

        public static async Task Run()
        {
            // Register provider for GB2312 encoding
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("This example shows how to use Chinese with gb2312 encoding, which is common in windows. It's recommended" +
                " to use https://huggingface.co/hfl/chinese-alpaca-2-7b-gguf/blob/main/ggml-model-q5_0.gguf, which has been verified by LLamaSharp developers.");
            Console.ForegroundColor = ConsoleColor.White;
            string modelPath = @"E:\LMModels\ay\Repository\qwen1_5-7b-chat-q8_0.gguf"; // @"E:\Models\llama-2-7b-chat.Q4_K_M.gguf";
            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 1024,
                Seed = 1337,
                GpuLayerCount = 5,
                Encoding = Encoding.UTF8
            };
            using var model = LLamaWeights.LoadFromFile(parameters);
            using var context = model.CreateContext(parameters);
            var executor = new InteractiveExecutor(context);
            var chatHistory = new ChatHistory();
            var session = new ChatSession(executor, chatHistory);
            session.WithHistoryTransform(new LLamaTransforms.DefaultHistoryTransform());
            InferenceParams inferenceParams = new InferenceParams()
            {
                Temperature = 0.9f,
                AntiPrompts = new List<string> { "用戶:" }
            };
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("The chat session has started.");
            // Show the prompt.
            Console.ForegroundColor = ConsoleColor.White;
            Console.Write("用戶:");
            Console.ForegroundColor = ConsoleColor.Green;
            string userInput = Console.ReadLine() ?? "";
            while (userInput != "exit")
            {
                // Convert the encoding from gb2312 to utf8 for the language model
                // and for later saving to the history json file.
                userInput = ConvertEncoding(userInput, Encoding.GetEncoding("gb2312"), Encoding.UTF8);
                if (userInput == "save")
                {
                    session.SaveSession("chat-with-kunkun-chinese");
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("Session saved.");
                }
                else if (userInput == "regenerate")
                {
                    Console.ForegroundColor = ConsoleColor.Yellow;
                    Console.WriteLine("Regenerating last response ...");
                    await foreach (
                        var text
                        in session.RegenerateAssistantMessageAsync(
                            inferenceParams))
                    {
                        Console.ForegroundColor = ConsoleColor.White;
                        // Convert the encoding from utf8 to gb2312 for the console output.
                        Console.Write(ConvertEncoding(text, Encoding.UTF8, Encoding.GetEncoding("gb2312")));
                    }
                }
                else
                {
                    await foreach (
                        var text
                        in session.ChatAsync(
                            new ChatHistory.Message(AuthorRole.User, userInput),
                            inferenceParams))
                    {
                        Console.ForegroundColor = ConsoleColor.White;
                        Console.Write(text);
                    }
                }
                Console.ForegroundColor = ConsoleColor.Green;
                userInput = Console.ReadLine() ?? "";
                Console.ForegroundColor = ConsoleColor.White;
            }
        }
    }
}
A Simple WinForms Example
The Chat class:
using LLama;
using LLama.Common;
using System.Text;

public class Chat
{
    ChatSession session;
    InferenceParams inferenceParams = new InferenceParams()
    {
        Temperature = 0.9f,
        AntiPrompts = new List<string> { "用戶:" }
    };

    private string ConvertEncoding(string input, Encoding original, Encoding target)
    {
        byte[] bytes = original.GetBytes(input);
        var convertedBytes = Encoding.Convert(original, target, bytes);
        return target.GetString(convertedBytes);
    }

    public void Init()
    {
        // Register provider for GB2312 encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        string modelPath = @"E:\LMModels\ay\Repository\qwen1_5-7b-chat-q8_0.gguf"; // @"E:\Models\llama-2-7b-chat.Q4_K_M.gguf";
        var parameters = new ModelParams(modelPath)
        {
            ContextSize = 1024,
            Seed = 1337,
            GpuLayerCount = 5,
            Encoding = Encoding.UTF8
        };
        var model = LLamaWeights.LoadFromFile(parameters);
        var context = model.CreateContext(parameters);
        var executor = new InteractiveExecutor(context);
        var chatHistory = new ChatHistory();
        session = new ChatSession(executor, chatHistory);
        session.WithHistoryTransform(new LLamaTransforms.DefaultHistoryTransform());
    }

    public async Task Run(string userInput, Action<string> callback)
    {
        // Handle one message per call; a loop here would spin forever once
        // the input has been consumed.
        if (userInput == "exit")
            return;
        // Convert the encoding from gb2312 to utf8 for the language model.
        userInput = ConvertEncoding(userInput, Encoding.GetEncoding("gb2312"), Encoding.UTF8);
        if (userInput == "save")
        {
            session.SaveSession("chat-with-kunkun-chinese");
        }
        else if (userInput == "regenerate")
        {
            await foreach (
                var text
                in session.RegenerateAssistantMessageAsync(
                    inferenceParams))
            {
                // Convert the encoding from utf8 to gb2312 for display.
                callback(ConvertEncoding(text, Encoding.UTF8, Encoding.GetEncoding("gb2312")));
            }
        }
        else
        {
            await foreach (
                var text
                in session.ChatAsync(
                    new ChatHistory.Message(AuthorRole.User, userInput),
                    inferenceParams))
            {
                callback(text);
            }
        }
    }
}
The Form1 event handler:
public partial class Form1 : Form
{
    Chat chat = new Chat();

    public Form1()
    {
        InitializeComponent();
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        chat.Init();
    }

    private async void btnSend_Click(object sender, EventArgs e)
    {
        var call = new Action<string>(x =>
        {
            // Marshal the streamed text back onto the UI thread.
            this.Invoke(() =>
            {
                txtLog.AppendText(x);
            });
        });
        // Run the inference off the UI thread so the form stays responsive.
        await Task.Run(() => chat.Run(txtMsg.Text, call));
    }
}
For more up-to-date and more polished examples, see the official documentation:
https://scisharp.github.io/LLamaSharp/0.13.0/