A Complete Guide to Offline Speech-to-Text in C# with Vosk

Author: iamrick 2024-11-29 07:45:38

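This article shows how to implement speech-to-text with the Vosk and NAudio libraries. It accepts MP3 and WAV input, converts MP3 to WAV automatically, and resamples the audio to 16 kHz, the sample rate the Vosk recognizer is configured for.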

1. Project Setup

First, install the required NuGet packages:

<PackageReference Include="Vosk" Version="0.3.38" />
<PackageReference Include="NAudio" Version="2.2.1" />

2. Download a Speech Model

Go to the Vosk model download page:

https://alphacephei.com/vosk/models

Download the Chinese model, or a model for another language.

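Extract the model files into a Models folder under the project directory. The usage example in section 4 resolves this folder relative to the application's base directory, so make sure it is also present in the build output.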

3. Full Implementation

using NAudio.Wave;
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Vosk;

namespace AppVosk
{
    public class SpeechToTextConverter
    {
        private readonly string _modelPath;

        public SpeechToTextConverter(string modelPath)
        {
            _modelPath = modelPath;
            // Vosk log level: 0 prints info and error messages; a negative value suppresses info messages
            Vosk.Vosk.SetLogLevel(0);
            if (!Directory.Exists(_modelPath))
            {
                throw new DirectoryNotFoundException($"Model folder not found: {_modelPath}");
            }
        }

        // Recognition is synchronous and CPU-bound, so it runs on a thread-pool thread
        public Task<string> ConvertToText(string audioFilePath)
        {
            return Task.Run(() => Recognize(audioFilePath));
        }

        private string Recognize(string audioFilePath)
        {
            if (!File.Exists(audioFilePath))
            {
                throw new FileNotFoundException($"Audio file not found: {audioFilePath}");
            }

            string wavFile = audioFilePath;
            bool isTempFile = false;

            try
            {
                // Convert MP3 input to a temporary WAV file
                if (string.Equals(Path.GetExtension(audioFilePath), ".mp3", StringComparison.OrdinalIgnoreCase))
                {
                    // A unique name avoids collisions between concurrent conversions
                    wavFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".wav");
                    isTempFile = true; // set first so a partially written file is cleaned up too
                    using (var reader = new Mp3FileReader(audioFilePath))
                    {
                        WaveFileWriter.CreateWaveFile(wavFile, reader);
                    }
                }

                using (var model = new Model(_modelPath))
                using (var recognizer = new VoskRecognizer(model, 16000.0f))
                using (var waveStream = new WaveFileReader(wavFile))
                // Resample to 16 kHz, 16-bit mono PCM (MediaFoundationResampler requires Windows)
                using (var resampler = new MediaFoundationResampler(waveStream, new WaveFormat(16000, 1)))
                {
                    var text = new StringBuilder();
                    byte[] buffer = new byte[4096];
                    int bytesRead;

                    while ((bytesRead = resampler.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        // true signals the end of an utterance; collect its text,
                        // following the pattern used in Vosk's own demos
                        if (recognizer.AcceptWaveform(buffer, bytesRead))
                        {
                            AppendText(text, recognizer.Result());
                        }
                    }

                    // Collect whatever remains after the last chunk
                    AppendText(text, recognizer.FinalResult());
                    return text.ToString();
                }
            }
            finally
            {
                // Clean up the temporary WAV file
                if (isTempFile && File.Exists(wavFile))
                {
                    try
                    {
                        File.Delete(wavFile);
                    }
                    catch { /* ignore deletion failures */ }
                }
            }
        }

        // Appends the "text" field of a Vosk JSON result, separated by a space
        private static void AppendText(StringBuilder target, string json)
        {
            using (JsonDocument doc = JsonDocument.Parse(json))
            {
                var segment = doc.RootElement.GetProperty("text").GetString();
                if (!string.IsNullOrEmpty(segment))
                {
                    if (target.Length > 0) target.Append(' ');
                    target.Append(segment);
                }
            }
        }
    }
}
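
The recognizer hands back every result as a JSON string: `Result()` and `FinalResult()` carry the recognized words under `"text"`, while `PartialResult()` uses `"partial"`. The following is a minimal sketch of pulling those fields out and joining several utterance results into one transcript. The names `VoskJson`, `ExtractText`, and `JoinSegments` are hypothetical helpers introduced here for illustration and are not part of Vosk.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the Vosk API.
public static class VoskJson
{
    // Returns the string stored under propertyName, or "" if it is missing
    // or not a string. Use "text" for Result()/FinalResult() and "partial"
    // for PartialResult().
    public static string ExtractText(string json, string propertyName = "text")
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty(propertyName, out JsonElement value)
               && value.ValueKind == JsonValueKind.String
            ? value.GetString() ?? string.Empty
            : string.Empty;
    }

    // Joins the non-empty "text" fields of several results with single spaces.
    public static string JoinSegments(IEnumerable<string> jsonResults)
    {
        var parts = new List<string>();
        foreach (string json in jsonResults)
        {
            string text = ExtractText(json);
            if (text.Length > 0)
            {
                parts.Add(text);
            }
        }
        return string.Join(" ", parts);
    }
}
```

In a recognition loop, one would typically pass `recognizer.Result()` to `ExtractText` each time `AcceptWaveform` returns true, and `recognizer.PartialResult()` with `"partial"` otherwise to show live progress.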

4. Usage Example

using System;
using System.IO;
using System.Threading.Tasks;
using AppVosk;

class Program
{
    static async Task Main(string[] args)
    {
        try
        {
            // Model path and audio file path; the folder name must match the model you extracted
            string modelPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Models", "vosk-model-small-cn-0.22");
            string audioFile = "test.mp3";  // or test.wav

            var converter = new SpeechToTextConverter(modelPath);
            string result = await converter.ConvertToText(audioFile);

            Console.WriteLine("Recognition result:");
            Console.WriteLine(result);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"An error occurred: {ex.Message}");
        }
    }
}

I used the 1.3 GB model, so it runs a bit slowly.

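5. Key Features

Format conversion: accepts MP3 and WAV input and converts MP3 to WAV automatically
Resampling: resamples the audio to 16 kHz mono to match the rate passed to the Vosk recognizer
Error handling: checks that the model folder and the audio file exist, and cleans up in a finally block
Memory use: streams the audio through a fixed-size buffer, so large files are never loaded into memory in one piece
Temporary files: deletes the temporary WAV file created during conversion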
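
The listing in section 3 sends every file through the resampler. As a hedged sketch, one could first check whether a WAV file is already in the target format and skip that step when it is. `AudioFormatCheck` and `IsVoskReady` are hypothetical names introduced here; the `WaveFormat` properties are NAudio's.

```csharp
using NAudio.Wave;

// Hypothetical helper, not part of NAudio or Vosk.
public static class AudioFormatCheck
{
    // True when the format is already 16-bit mono PCM at the target sample rate,
    // i.e. the same format that new WaveFormat(16000, 1) describes.
    public static bool IsVoskReady(WaveFormat format, int targetSampleRate = 16000)
    {
        return format.Encoding == WaveFormatEncoding.Pcm
            && format.SampleRate == targetSampleRate
            && format.Channels == 1
            && format.BitsPerSample == 16;
    }
}
```

With a `WaveFileReader` named `reader`, `AudioFormatCheck.IsVoskReady(reader.WaveFormat)` would tell the caller whether the reader can be fed to the recognizer directly or should be wrapped in a `MediaFoundationResampler` first.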

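6. Summary

This article built a speech-to-text converter with Vosk and NAudio and walked through the code: loading the model files, converting the audio format, resampling, and running the final recognition pass. Loading the model once at application startup, rather than on every conversion, avoids repeated load cost and speeds up recognition. The overall approach is simple and efficient, and it can serve as a reference for developers who need speech recognition.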
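
The point about loading the model once can be sketched as a small wrapper that owns a single `Model` and hands out a fresh recognizer per audio file. This is a minimal illustration under assumptions, not the author's implementation: `SharedVoskModel` and `CreateRecognizer` are hypothetical names, while `Model` and `VoskRecognizer` are the Vosk types used in section 3.

```csharp
using System;
using Vosk;

// Hypothetical wrapper: loads the model once and reuses it for every recognizer.
public sealed class SharedVoskModel : IDisposable
{
    private readonly Model _model;

    public SharedVoskModel(string modelPath)
    {
        // The expensive step; performed once, e.g. at application startup.
        _model = new Model(modelPath);
    }

    // Each audio file gets its own recognizer; the caller disposes it.
    public VoskRecognizer CreateRecognizer(float sampleRate = 16000.0f)
    {
        return new VoskRecognizer(_model, sampleRate);
    }

    public void Dispose()
    {
        _model.Dispose();
    }
}
```

An application would create one `SharedVoskModel` at startup, call `CreateRecognizer()` inside a `using` block for each file, and dispose the wrapper on shutdown.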

Editor: 武曉燕 · Source: 技術老小子
