HarmonyOS AI Capabilities: Speech Recognition
This article aims to help you avoid some common pitfalls when developing audio recording and speech recognition features.
Result

The left side shows the simple UI layout and the recognition output; the right side shows the test audio being played in NetEase Cloud Music.
Development Steps
IDE installation, project creation, and so on are skipped here. The app targets SDK API 6 and uses the JS UI framework.
1. Requesting Permissions
The AI speech recognition capability itself does not require any permission, but since the microphone is used here to record audio, the microphone permission must be requested.
Add the permission to the config.json configuration file:
- "reqPermissions": [
- {
- "name": "ohos.permission.MICROPHONE"
- }
- ]
In MainAbility, explicitly request the microphone permission:
```java
@Override
public void onStart(Intent intent) {
    super.onStart(intent);
    requestPermission();
}

// Request runtime permissions
private void requestPermission() {
    String[] permission = {
            "ohos.permission.MICROPHONE",
    };
    List<String> applyPermissions = new ArrayList<>();
    for (String element : permission) {
        if (verifySelfPermission(element) != 0) {
            if (canRequestPermission(element)) {
                applyPermissions.add(element);
            }
        }
    }
    requestPermissionsFromUser(applyPermissions.toArray(new String[0]), 0);
}
```
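If you also want to react to the user's decision (for example, to disable recording when the permission is denied), you can override the permission-result callback. The snippet below is a minimal sketch and is not part of the original article's code; it assumes request code 0 from the call above and the IBundleManager.PERMISSION_GRANTED constant.

```java
// Requires: import ohos.bundle.IBundleManager;
// Assumption: added to MainAbility next to requestPermission(); request code 0 matches the request above.
@Override
public void onRequestPermissionsFromUserResult(int requestCode, String[] permissions, int[] grantResults) {
    if (requestCode != 0) {
        return;
    }
    for (int i = 0; i < permissions.length; i++) {
        if ("ohos.permission.MICROPHONE".equals(permissions[i])
                && grantResults[i] != IBundleManager.PERMISSION_GRANTED) {
            // The user denied microphone access: recording (and therefore recognition) will not work.
            // Disable the record button or show a prompt here.
        }
    }
}
```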
2. Creating an Audio Recording Utility Class
First, create the audio recording utility class AudioCaptureUtils.
Audio recording relies on the AudioCapturer class, and constructing an AudioCapturer in turn requires an AudioStreamInfo and an AudioCapturerInfo, so we declare fields for these three classes:
```java
private AudioStreamInfo audioStreamInfo;
private AudioCapturer audioCapturer;
private AudioCapturerInfo audioCapturerInfo;
```
Audio recording for speech recognition is subject to constraints, so when recording audio we need to make sure that:
1. The sample rate is 16000 Hz
2. The audio is single-channel (mono)
3. Only Mandarin Chinese is supported
To make AudioCaptureUtils reusable, the constructor takes the channel mask and sample rate as parameters and initializes the AudioStreamInfo and AudioCapturerInfo instances.
```java
// channelMask: channel configuration
// SampleRate: sample rate
public AudioCaptureUtils(AudioStreamInfo.ChannelMask channelMask, int SampleRate) {
    this.audioStreamInfo = new AudioStreamInfo.Builder()
            .encodingFormat(AudioStreamInfo.EncodingFormat.ENCODING_PCM_16BIT)
            .channelMask(channelMask)
            .sampleRate(SampleRate)
            .build();
    this.audioCapturerInfo = new AudioCapturerInfo.Builder().audioStreamInfo(audioStreamInfo).build();
}
```
The init method initializes audioCapturer and applies a sound effect; the noise-suppression (NS) effect is used by default.
```java
// packageName: package name
public void init(String packageName) {
    this.init(SoundEffect.SOUND_EFFECT_TYPE_NS, packageName);
}

// soundEffect: sound effect UUID
// packageName: package name
public void init(UUID soundEffect, String packageName) {
    if (audioCapturer == null || audioCapturer.getState() == AudioCapturer.State.STATE_UNINITIALIZED) {
        audioCapturer = new AudioCapturer(this.audioCapturerInfo);
    }
    audioCapturer.addSoundEffect(soundEffect, packageName);
}
```
After initialization, the class exposes start, stop, and destory methods that start recording, stop recording, and release resources respectively; each simply delegates to the corresponding AudioCapturer method.
```java
public void stop() {
    this.audioCapturer.stop();
}

public void destory() {
    this.audioCapturer.stop();
    this.audioCapturer.release();
}

public Boolean start() {
    if (audioCapturer == null) {
        return false;
    }
    return audioCapturer.start();
}
```
We also provide a method to read the audio stream and a method to obtain the AudioCapturer instance.
```java
// buffers: the buffer the captured data is written into
// offset: offset within the buffer
// bytesLength: number of bytes to read
public int read(byte[] buffers, int offset, int bytesLength) {
    return audioCapturer.read(buffers, offset, bytesLength);
}

// Return the AudioCapturer instance
public AudioCapturer get() {
    return this.audioCapturer;
}
```
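To make the intended use of this class concrete, here is a short illustrative snippet showing how AudioCaptureUtils is driven; it mirrors the pattern used inside AsrUtils later in this article, and only the package name and buffer size are taken from the article itself.

```java
// Illustrative usage of AudioCaptureUtils (the same pattern appears inside AsrUtils below).
AudioCaptureUtils capture = new AudioCaptureUtils(
        AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, 16000); // mono, 16 kHz
capture.init("com.panda_coder.liedetector");                 // default: noise-suppression effect

byte[] buffer = new byte[1280];                              // 640 or 1280 bytes only (see step 3)
if (capture.start()) {
    int read = capture.read(buffer, 0, buffer.length);       // fills buffer with one PCM chunk
    // ... pass the chunk to the recognizer, loop while recording ...
    capture.stop();
}
// capture.destory();  // call when the recorder is no longer needed
```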
3. Creating a Speech Recognition Utility Class
Now that the audio recording utility class is ready, the next step is to create a speech recognition utility class, AsrUtils.
Recall the speech recognition constraints listed above: 16000 Hz sample rate, mono audio, Mandarin only.
One additional, undocumented limitation is worth noting: the PCM stream may only be written in chunks of 640 or 1280 bytes, so when reading the audio stream we can only use buffer lengths of 640 or 1280.
Next, define some basic constants:
```java
// Sample rate is fixed at 16000 Hz
private static final int VIDEO_SAMPLE_RATE = 16000;
// VAD end wait time, default 2000 ms
private static final int VAD_END_WAIT_MS = 2000;
// VAD front wait time, default 4800 ms
// These two parameters affect recognition accuracy; the system defaults are used here
private static final int VAD_FRONT_WAIT_MS = 4800;
// Input timeout, 20000 ms
private static final int TIMEOUT_DURATION = 20000;
// PCM chunk length, only 640 or 1280 is allowed
private static final int BYTES_LENGTH = 1280;
// Thread pool parameters
private static final int CAPACITY = 6;
private static final int ALIVE_TIME = 3;
private static final int POOL_SIZE = 3;
```
Because audio has to be recorded continuously in the background, a separate thread is needed; Java's ThreadPoolExecutor is used for the threading here.
Define the thread pool instance and the other related fields as follows:
```java
// Recording thread pool
private ThreadPoolExecutor poolExecutor;
/* Custom state values
 * error: -1
 * initial: 0
 * init: 1
 * speech input started: 2
 * speech input ended: 3
 * recognition finished: 5
 * intermediate result available: 9
 * final result available: 10
 */
public int state = 0;
// Recognition result
public String result;
// Whether recognition has been started;
// PCM data is written only while this is true
boolean isStarted = false;
// ASR client
private AsrClient asrClient;
// ASR listener
private AsrListener listener;
AsrIntent asrIntent;
// Audio recording utility
private AudioCaptureUtils audioCaptureUtils;
```
Initialize these fields in the constructor:
```java
public AsrUtils(Context context) {
    // Create an audio recorder: mono channel, 16000 Hz sample rate
    this.audioCaptureUtils = new AudioCaptureUtils(AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, VIDEO_SAMPLE_RATE);
    // Initialize with the noise-suppression sound effect
    this.audioCaptureUtils.init("com.panda_coder.liedetector");
    // Clear the result
    this.result = "";
    // Create a new thread pool for the recording task
    poolExecutor = new ThreadPoolExecutor(
            POOL_SIZE,
            POOL_SIZE,
            ALIVE_TIME,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(CAPACITY),
            new ThreadPoolExecutor.DiscardOldestPolicy());
    if (asrIntent == null) {
        asrIntent = new AsrIntent();
        // Use a PCM stream as the audio source
        // (a file could also be used here)
        asrIntent.setAudioSourceType(AsrIntent.AsrAudioSrcType.ASR_SRC_TYPE_PCM);
        asrIntent.setVadEndWaitMs(VAD_END_WAIT_MS);
        asrIntent.setVadFrontWaitMs(VAD_FRONT_WAIT_MS);
        asrIntent.setTimeoutThresholdMs(TIMEOUT_DURATION);
    }
    if (asrClient == null) {
        // Create the AsrClient
        asrClient = AsrClient.createAsrClient(context).orElse(null);
    }
    if (listener == null) {
        // Create the MyAsrListener
        listener = new MyAsrListener();
        // Initialize the AsrClient
        this.asrClient.init(asrIntent, listener);
    }
}
```
```java
// MyAsrListener implements the AsrListener interface
class MyAsrListener implements AsrListener {
    @Override
    public void onInit(PacMap pacMap) {
        HiLog.info(TAG, "====== init");
        state = 1;
    }

    @Override
    public void onBeginningOfSpeech() {
        state = 2;
    }

    @Override
    public void onRmsChanged(float v) {
    }

    @Override
    public void onBufferReceived(byte[] bytes) {
    }

    @Override
    public void onEndOfSpeech() {
        state = 3;
    }

    @Override
    public void onError(int i) {
        state = -1;
        if (i == AsrError.ERROR_SPEECH_TIMEOUT) {
            // Restart listening after a timeout
            asrClient.startListening(asrIntent);
        } else {
            HiLog.info(TAG, "======error code:" + i);
            asrClient.stopListening();
        }
    }

    // Note the result key, which differs from onIntermediateResults:
    // pacMap.getString(AsrResultKey.RESULTS_RECOGNITION)
    @Override
    public void onResults(PacMap pacMap) {
        state = 10;
        // Get the final result, e.g.
        // {"result":[{"confidence":0,"ori_word":"你 好 ","pinyin":"NI3 HAO3 ","word":"你好。"}]}
        String results = pacMap.getString(AsrResultKey.RESULTS_RECOGNITION);
        ZSONObject zsonObject = ZSONObject.stringToZSON(results);
        ZSONObject infoObject;
        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
            String resultWord = infoObject.getString("ori_word").replace(" ", "");
            result += resultWord;
        }
    }

    // Intermediate results:
    // pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE)
    @Override
    public void onIntermediateResults(PacMap pacMap) {
        state = 9;
        // String result = pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE);
        // if (result == null)
        //     return;
        // ZSONObject zsonObject = ZSONObject.stringToZSON(result);
        // ZSONObject infoObject;
        // if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
        //     infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
        //     String resultWord = infoObject.getString("ori_word").replace(" ", "");
        //     HiLog.info(TAG, "=========== 9 " + resultWord);
        // }
    }

    @Override
    public void onEnd() {
        state = 5;
        // If recording is still in progress, start listening again
        if (isStarted) {
            asrClient.startListening(asrIntent);
        }
    }

    @Override
    public void onEvent(int i, PacMap pacMap) {
    }

    @Override
    public void onAudioStart() {
        state = 2;
    }

    @Override
    public void onAudioEnd() {
        state = 3;
    }
}
```
Functions to start and stop recognition:
```java
public void start() {
    if (!this.isStarted) {
        this.isStarted = true;
        asrClient.startListening(asrIntent);
        poolExecutor.submit(new AudioCaptureRunnable());
    }
}

public void stop() {
    this.isStarted = false;
    asrClient.stopListening();
    audioCaptureUtils.stop();
}

// Audio recording task
private class AudioCaptureRunnable implements Runnable {
    @Override
    public void run() {
        byte[] buffers = new byte[BYTES_LENGTH];
        // Start recording
        audioCaptureUtils.start();
        while (isStarted) {
            // Read one chunk of the recorded PCM stream
            int ret = audioCaptureUtils.read(buffers, 0, BYTES_LENGTH);
            if (ret <= 0) {
                HiLog.error(TAG, "======Error read data");
            } else {
                // Write the PCM chunk to the speech recognition service.
                // If the buffer length is not 1280 or 640, it must be split into
                // 1280- or 640-byte chunks manually (see the sketch below).
                asrClient.writePcm(buffers, BYTES_LENGTH);
            }
        }
    }
}
```
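The buffer above is allocated at exactly BYTES_LENGTH, so it can be handed to writePcm directly. If your PCM data comes from a source with a different buffer size (a file, for instance), it has to be re-chunked to 640 or 1280 bytes first. The helper below is a minimal sketch of one way to do that and is not part of the original code; padding the final partial chunk with silence is an assumption.

```java
// Sketch: split an arbitrary-length PCM byte array into BYTES_LENGTH-sized chunks
// and feed them to the recognizer. Zero-padding the last chunk is an assumption.
private void writePcmInChunks(byte[] pcm) {
    for (int offset = 0; offset < pcm.length; offset += BYTES_LENGTH) {
        byte[] chunk = new byte[BYTES_LENGTH];
        int length = Math.min(BYTES_LENGTH, pcm.length - offset);
        System.arraycopy(pcm, offset, chunk, 0, length);
        // Any remaining bytes of the last chunk stay zero (silence), keeping the length at 1280.
        asrClient.writePcm(chunk, BYTES_LENGTH);
    }
}
```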
The recognition result is delivered through the listener callbacks, where it is appended to result; it can then be read via the getResult or getResultAndClear methods.
```java
public String getResult() {
    return result;
}

public String getResultAndClear() {
    if (this.result.isEmpty()) {
        return "";
    }
    String results = getResult();
    this.result = "";
    return results;
}
```
4. Creating a Simple JS UI and Calling Java from JS via a ServiceAbility
hml code:
- <div class="container">
- <div>
- <button class="btn" @touchend="start">開啟</button>
- <button class="btn" @touchend="sub">訂閱結(jié)果</button>
- <button class="btn" @touchend="stop">關(guān)閉</button>
- </div>
- <text class="title">
- 語音識別內(nèi)容: {{ text }}
- </text>
- </div>
css code:
```css
.container {
    flex-direction: column;
    justify-content: flex-start;
    align-items: center;
    width: 100%;
    height: 100%;
    padding: 10%;
}
.title {
    font-size: 20px;
    color: #000000;
    opacity: 0.9;
    text-align: left;
    width: 100%;
    margin: 3% 0;
}
.btn {
    padding: 10px 20px;
    margin: 3px;
    border-radius: 6px;
}
```
js logic code:
```javascript
// Utility for calling the Java ServiceAbility from JS
import { jsCallJavaAbility } from '../../common/JsCallJavaAbilityUtils.js';

export default {
    data: {
        text: ""
    },
    // Start recognition
    start() {
        jsCallJavaAbility.callAbility("ControllerAbility", 100, {}).then(result => {
            console.log(result)
        })
    },
    // Stop recognition
    stop() {
        jsCallJavaAbility.callAbility("ControllerAbility", 101, {}).then(result => {
            console.log(result)
        })
        jsCallJavaAbility.unSubAbility("ControllerAbility", 201).then(result => {
            if (result.code == 200) {
                console.log("unsubscribed successfully");
            }
        })
    },
    // Subscribe to results pushed from the Java side
    sub() {
        jsCallJavaAbility.subAbility("ControllerAbility", 200, (data) => {
            let text = data.data.text
            text && (this.text += text)
        }).then(result => {
            if (result.code == 200) {
                console.log("subscribed successfully");
            }
        })
    }
}
```
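The JsCallJavaAbilityUtils.js helper is not shown in the article (it lives in the linked repository). As a rough orientation, a wrapper like this is typically a thin layer over the standard FeatureAbility JS APIs of the JS FA model; the sketch below is an assumption about its shape, not the author's actual implementation, and the bundle name is the one used earlier in the article.

```javascript
// Hypothetical sketch of a JsCallJavaAbilityUtils-style wrapper over FeatureAbility.
// The exact result envelope and the code == 200 convention are assumptions.
const BUNDLE = "com.panda_coder.liedetector";

function buildAction(abilityName, messageCode, data) {
    return {
        bundleName: BUNDLE,
        abilityName: BUNDLE + "." + abilityName, // assumed fully-qualified class name
        messageCode: messageCode,
        data: data,
        abilityType: 0,   // 0: Java Ability (ServiceAbility)
        syncOption: 0
    };
}

export const jsCallJavaAbility = {
    async callAbility(abilityName, messageCode, data) {
        // FeatureAbility.callAbility resolves to a JSON string written by the Java side
        const raw = await FeatureAbility.callAbility(buildAction(abilityName, messageCode, data));
        return { code: 200, data: JSON.parse(raw) };
    },
    async subAbility(abilityName, messageCode, callback) {
        await FeatureAbility.subscribeAbilityEvent(buildAction(abilityName, messageCode, {}),
            (callbackData) => callback(JSON.parse(callbackData)));
        return { code: 200 };
    },
    async unSubAbility(abilityName, messageCode) {
        await FeatureAbility.unsubscribeAbilityEvent(buildAction(abilityName, messageCode, {}));
        return { code: 200 };
    }
};
```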
ServiceAbility code:
```java
public class ControllerAbility extends Ability {
    AnswerRemote remote = new AnswerRemote();
    AsrUtils asrUtils;
    // Remote objects of subscribed clients
    private static HashMap<Integer, IRemoteObject> remoteObjectHandlers = new HashMap<Integer, IRemoteObject>();

    @Override
    public void onStart(Intent intent) {
        HiLog.error(LABEL_LOG, "ControllerAbility::onStart");
        super.onStart(intent);
        // Initialize the speech recognition utility
        asrUtils = new AsrUtils(this);
    }

    @Override
    public void onCommand(Intent intent, boolean restart, int startId) {
    }

    @Override
    public IRemoteObject onConnect(Intent intent) {
        super.onConnect(intent);
        return remote.asObject();
    }

    class AnswerRemote extends RemoteObject implements IRemoteBroker {
        AnswerRemote() {
            super("");
        }

        @Override
        public boolean onRemoteRequest(int code, MessageParcel data, MessageParcel reply, MessageOption option) {
            Map<String, Object> zsonResult = new HashMap<String, Object>();
            String zsonStr = data.readString();
            ZSONObject zson = ZSONObject.stringToZSON(zsonStr);
            switch (code) {
                case 100: {
                    // Code 100 from JS: start speech recognition
                    asrUtils.start();
                    break;
                }
                case 101: {
                    // Code 101 from JS: stop speech recognition
                    asrUtils.stop();
                    break;
                }
                case 200: {
                    // Code 200 from JS: subscribe to recognition results
                    remoteObjectHandlers.put(200, data.readRemoteObject());
                    // Periodically fetch recognition results and push them back to the JS UI
                    getAsrText();
                    break;
                }
                default: {
                    reply.writeString("service not defined");
                    return false;
                }
            }
            reply.writeString(ZSONObject.toZSONString(zsonResult));
            return true;
        }

        @Override
        public IRemoteObject asObject() {
            return this;
        }
    }

    public void getAsrText() {
        new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1 * 500);
                    Map<String, Object> zsonResult = new HashMap<String, Object>();
                    zsonResult.put("text", asrUtils.getResultAndClear());
                    reportEvent(200, zsonResult);
                } catch (RemoteException | InterruptedException e) {
                    break;
                }
            }
        }).start();
    }

    private void reportEvent(int remoteHandler, Object backData) throws RemoteException {
        MessageParcel data = MessageParcel.obtain();
        MessageParcel reply = MessageParcel.obtain();
        MessageOption option = new MessageOption();
        data.writeString(ZSONObject.toZSONString(backData));
        IRemoteObject remoteObject = remoteObjectHandlers.get(remoteHandler);
        remoteObject.sendRequest(100, data, reply, option);
        reply.reclaim();
        data.reclaim();
    }
}
```
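For completeness, the ServiceAbility also has to be registered in config.json so the JS side can connect to it; the article does not show this declaration. The fragment below is a minimal sketch assuming the package name com.panda_coder.liedetector used earlier; check the open-source repository for the exact entry.

```json
"abilities": [
    {
        "name": "com.panda_coder.liedetector.ControllerAbility",
        "type": "service"
    }
]
```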
That completes the simple speech recognition feature.
Demo video: https://www.bilibili.com/video/BV1E44y177hv/
Full open-source code: https://gitee.com/panda-coder/harmonyos-apps/tree/master/AsrDemo