HarmonyOS AI Capabilities: Speech Recognition
This article aims to help you avoid some common pitfalls when developing audio recording and speech recognition features.
Result

The left side shows the simple UI layout and the recognition output; the right side shows the test audio being played in NetEase Cloud Music.
Development Steps
IDE installation, project creation, and so on are skipped here. The app targets SDK API 6 and uses the JS UI framework.
1. Requesting Permissions
The AI speech recognition capability itself does not require any permission, but since the microphone is used here to record audio, the microphone permission must be requested.
Add the permission to the config.json configuration file:
- "reqPermissions": [
- {
- "name": "ohos.permission.MICROPHONE"
- }
- ]
In MainAbility, explicitly request the microphone permission:
```java
@Override
public void onStart(Intent intent) {
    super.onStart(intent);
    requestPermission();
}

// Request runtime permissions
private void requestPermission() {
    String[] permission = {
            "ohos.permission.MICROPHONE",
    };
    List<String> applyPermissions = new ArrayList<>();
    for (String element : permission) {
        if (verifySelfPermission(element) != 0) {
            if (canRequestPermission(element)) {
                applyPermissions.add(element);
            }
        }
    }
    requestPermissionsFromUser(applyPermissions.toArray(new String[0]), 0);
}
```
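If you also want to react to the user's decision (for example, to disable recording when the permission is denied), you can override the permission-result callback. The snippet below is a minimal sketch and is not part of the original article's code; it assumes request code 0 from the call above and the IBundleManager.PERMISSION_GRANTED constant.

```java
// Requires: import ohos.bundle.IBundleManager;
// Assumption: added to MainAbility next to requestPermission(); request code 0 matches the request above.
@Override
public void onRequestPermissionsFromUserResult(int requestCode, String[] permissions, int[] grantResults) {
    if (requestCode != 0) {
        return;
    }
    for (int i = 0; i < permissions.length; i++) {
        if ("ohos.permission.MICROPHONE".equals(permissions[i])
                && grantResults[i] != IBundleManager.PERMISSION_GRANTED) {
            // The user denied microphone access: recording (and therefore recognition) will not work.
            // Disable the record button or show a prompt here.
        }
    }
}
```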
2. Creating an Audio Recording Utility Class
First, create the audio recording utility class AudioCaptureUtils.
Audio recording relies on the AudioCapturer class, and constructing an AudioCapturer in turn requires an AudioStreamInfo and an AudioCapturerInfo, so we declare fields for these three classes:
```java
private AudioStreamInfo audioStreamInfo;
private AudioCapturer audioCapturer;
private AudioCapturerInfo audioCapturerInfo;
```
Audio recording for speech recognition is subject to constraints, so when recording audio we need to make sure that:
1. The sample rate is 16000 Hz
2. The audio is single-channel (mono)
3. Only Mandarin Chinese is supported
To make AudioCaptureUtils reusable, the constructor takes the channel mask and sample rate as parameters and initializes the AudioStreamInfo and AudioCapturerInfo instances.
```java
// channelMask: channel configuration
// SampleRate: sample rate
public AudioCaptureUtils(AudioStreamInfo.ChannelMask channelMask, int SampleRate) {
    this.audioStreamInfo = new AudioStreamInfo.Builder()
            .encodingFormat(AudioStreamInfo.EncodingFormat.ENCODING_PCM_16BIT)
            .channelMask(channelMask)
            .sampleRate(SampleRate)
            .build();
    this.audioCapturerInfo = new AudioCapturerInfo.Builder().audioStreamInfo(audioStreamInfo).build();
}
```
The init method initializes audioCapturer and applies a sound effect; the noise-suppression (NS) effect is used by default.
```java
// packageName: package name
public void init(String packageName) {
    this.init(SoundEffect.SOUND_EFFECT_TYPE_NS, packageName);
}

// soundEffect: sound effect UUID
// packageName: package name
public void init(UUID soundEffect, String packageName) {
    if (audioCapturer == null || audioCapturer.getState() == AudioCapturer.State.STATE_UNINITIALIZED) {
        audioCapturer = new AudioCapturer(this.audioCapturerInfo);
    }
    audioCapturer.addSoundEffect(soundEffect, packageName);
}
```
After initialization, the class exposes start, stop, and destory methods that start recording, stop recording, and release resources respectively; each simply delegates to the corresponding AudioCapturer method.
```java
public void stop() {
    this.audioCapturer.stop();
}

public void destory() {
    this.audioCapturer.stop();
    this.audioCapturer.release();
}

public Boolean start() {
    if (audioCapturer == null) {
        return false;
    }
    return audioCapturer.start();
}
```
We also provide a method to read the audio stream and a method to obtain the AudioCapturer instance.
```java
// buffers: the buffer the captured data is written into
// offset: offset within the buffer
// bytesLength: number of bytes to read
public int read(byte[] buffers, int offset, int bytesLength) {
    return audioCapturer.read(buffers, offset, bytesLength);
}

// Return the AudioCapturer instance
public AudioCapturer get() {
    return this.audioCapturer;
}
```
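To make the intended use of this class concrete, here is a short illustrative snippet showing how AudioCaptureUtils is driven; it mirrors the pattern used inside AsrUtils later in this article, and only the package name and buffer size are taken from the article itself.

```java
// Illustrative usage of AudioCaptureUtils (the same pattern appears inside AsrUtils below).
AudioCaptureUtils capture = new AudioCaptureUtils(
        AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, 16000); // mono, 16 kHz
capture.init("com.panda_coder.liedetector");                 // default: noise-suppression effect

byte[] buffer = new byte[1280];                              // 640 or 1280 bytes only (see step 3)
if (capture.start()) {
    int read = capture.read(buffer, 0, buffer.length);       // fills buffer with one PCM chunk
    // ... pass the chunk to the recognizer, loop while recording ...
    capture.stop();
}
// capture.destory();  // call when the recorder is no longer needed
```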
3. Creating a Speech Recognition Utility Class
Now that the audio recording utility class is ready, the next step is to create a speech recognition utility class, AsrUtils.
Recall the speech recognition constraints listed above: 16000 Hz sample rate, mono audio, Mandarin only.
One additional, undocumented limitation is worth noting: the PCM stream may only be written in chunks of 640 or 1280 bytes, so when reading the audio stream we can only use buffer lengths of 640 or 1280.
Next, define some basic constants:
```java
// Sample rate is fixed at 16000 Hz
private static final int VIDEO_SAMPLE_RATE = 16000;
// VAD end wait time, default 2000 ms
private static final int VAD_END_WAIT_MS = 2000;
// VAD front wait time, default 4800 ms
// These two parameters affect recognition accuracy; the system defaults are used here
private static final int VAD_FRONT_WAIT_MS = 4800;
// Input timeout, 20000 ms
private static final int TIMEOUT_DURATION = 20000;
// PCM chunk length, only 640 or 1280 is allowed
private static final int BYTES_LENGTH = 1280;
// Thread pool parameters
private static final int CAPACITY = 6;
private static final int ALIVE_TIME = 3;
private static final int POOL_SIZE = 3;
```
Because audio has to be recorded continuously in the background, a separate thread is needed; Java's ThreadPoolExecutor is used for the threading here.
Define the thread pool instance and the other related fields as follows:
```java
// Recording thread pool
private ThreadPoolExecutor poolExecutor;
/* Custom state values
 * error: -1
 * initial: 0
 * init: 1
 * speech input started: 2
 * speech input ended: 3
 * recognition finished: 5
 * intermediate result available: 9
 * final result available: 10
 */
public int state = 0;
// Recognition result
public String result;
// Whether recognition has been started;
// PCM data is written only while this is true
boolean isStarted = false;
// ASR client
private AsrClient asrClient;
// ASR listener
private AsrListener listener;
AsrIntent asrIntent;
// Audio recording utility
private AudioCaptureUtils audioCaptureUtils;
```
Initialize these fields in the constructor:
```java
public AsrUtils(Context context) {
    // Create an audio recorder: mono channel, 16000 Hz sample rate
    this.audioCaptureUtils = new AudioCaptureUtils(AudioStreamInfo.ChannelMask.CHANNEL_IN_MONO, VIDEO_SAMPLE_RATE);
    // Initialize with the noise-suppression sound effect
    this.audioCaptureUtils.init("com.panda_coder.liedetector");
    // Clear the result
    this.result = "";
    // Create a new thread pool for the recording task
    poolExecutor = new ThreadPoolExecutor(
            POOL_SIZE,
            POOL_SIZE,
            ALIVE_TIME,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(CAPACITY),
            new ThreadPoolExecutor.DiscardOldestPolicy());
    if (asrIntent == null) {
        asrIntent = new AsrIntent();
        // Use a PCM stream as the audio source
        // (a file could also be used here)
        asrIntent.setAudioSourceType(AsrIntent.AsrAudioSrcType.ASR_SRC_TYPE_PCM);
        asrIntent.setVadEndWaitMs(VAD_END_WAIT_MS);
        asrIntent.setVadFrontWaitMs(VAD_FRONT_WAIT_MS);
        asrIntent.setTimeoutThresholdMs(TIMEOUT_DURATION);
    }
    if (asrClient == null) {
        // Create the AsrClient
        asrClient = AsrClient.createAsrClient(context).orElse(null);
    }
    if (listener == null) {
        // Create the MyAsrListener
        listener = new MyAsrListener();
        // Initialize the AsrClient
        this.asrClient.init(asrIntent, listener);
    }
}
```
```java
// MyAsrListener implements the AsrListener interface
class MyAsrListener implements AsrListener {
    @Override
    public void onInit(PacMap pacMap) {
        HiLog.info(TAG, "====== init");
        state = 1;
    }

    @Override
    public void onBeginningOfSpeech() {
        state = 2;
    }

    @Override
    public void onRmsChanged(float v) {
    }

    @Override
    public void onBufferReceived(byte[] bytes) {
    }

    @Override
    public void onEndOfSpeech() {
        state = 3;
    }

    @Override
    public void onError(int i) {
        state = -1;
        if (i == AsrError.ERROR_SPEECH_TIMEOUT) {
            // Restart listening after a timeout
            asrClient.startListening(asrIntent);
        } else {
            HiLog.info(TAG, "======error code:" + i);
            asrClient.stopListening();
        }
    }

    // Note the result key, which differs from onIntermediateResults:
    // pacMap.getString(AsrResultKey.RESULTS_RECOGNITION)
    @Override
    public void onResults(PacMap pacMap) {
        state = 10;
        // Get the final result, e.g.
        // {"result":[{"confidence":0,"ori_word":"你 好 ","pinyin":"NI3 HAO3 ","word":"你好。"}]}
        String results = pacMap.getString(AsrResultKey.RESULTS_RECOGNITION);
        ZSONObject zsonObject = ZSONObject.stringToZSON(results);
        ZSONObject infoObject;
        if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
            infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
            String resultWord = infoObject.getString("ori_word").replace(" ", "");
            result += resultWord;
        }
    }

    // Intermediate results:
    // pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE)
    @Override
    public void onIntermediateResults(PacMap pacMap) {
        state = 9;
        // String result = pacMap.getString(AsrResultKey.RESULTS_INTERMEDIATE);
        // if (result == null)
        //     return;
        // ZSONObject zsonObject = ZSONObject.stringToZSON(result);
        // ZSONObject infoObject;
        // if (zsonObject.getZSONArray("result").getZSONObject(0) instanceof ZSONObject) {
        //     infoObject = zsonObject.getZSONArray("result").getZSONObject(0);
        //     String resultWord = infoObject.getString("ori_word").replace(" ", "");
        //     HiLog.info(TAG, "=========== 9 " + resultWord);
        // }
    }

    @Override
    public void onEnd() {
        state = 5;
        // If recording is still in progress, start listening again
        if (isStarted) {
            asrClient.startListening(asrIntent);
        }
    }

    @Override
    public void onEvent(int i, PacMap pacMap) {
    }

    @Override
    public void onAudioStart() {
        state = 2;
    }

    @Override
    public void onAudioEnd() {
        state = 3;
    }
}
```
Functions to start and stop recognition:
```java
public void start() {
    if (!this.isStarted) {
        this.isStarted = true;
        asrClient.startListening(asrIntent);
        poolExecutor.submit(new AudioCaptureRunnable());
    }
}

public void stop() {
    this.isStarted = false;
    asrClient.stopListening();
    audioCaptureUtils.stop();
}

// Audio recording task
private class AudioCaptureRunnable implements Runnable {
    @Override
    public void run() {
        byte[] buffers = new byte[BYTES_LENGTH];
        // Start recording
        audioCaptureUtils.start();
        while (isStarted) {
            // Read one chunk of the recorded PCM stream
            int ret = audioCaptureUtils.read(buffers, 0, BYTES_LENGTH);
            if (ret <= 0) {
                HiLog.error(TAG, "======Error read data");
            } else {
                // Write the PCM chunk to the speech recognition service.
                // If the buffer length is not 1280 or 640, it must be split into
                // 1280- or 640-byte chunks manually (see the sketch below).
                asrClient.writePcm(buffers, BYTES_LENGTH);
            }
        }
    }
}
```
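The buffer above is allocated at exactly BYTES_LENGTH, so it can be handed to writePcm directly. If your PCM data comes from a source with a different buffer size (a file, for instance), it has to be re-chunked to 640 or 1280 bytes first. The helper below is a minimal sketch of one way to do that and is not part of the original code; padding the final partial chunk with silence is an assumption.

```java
// Sketch: split an arbitrary-length PCM byte array into BYTES_LENGTH-sized chunks
// and feed them to the recognizer. Zero-padding the last chunk is an assumption.
private void writePcmInChunks(byte[] pcm) {
    for (int offset = 0; offset < pcm.length; offset += BYTES_LENGTH) {
        byte[] chunk = new byte[BYTES_LENGTH];
        int length = Math.min(BYTES_LENGTH, pcm.length - offset);
        System.arraycopy(pcm, offset, chunk, 0, length);
        // Any remaining bytes of the last chunk stay zero (silence), keeping the length at 1280.
        asrClient.writePcm(chunk, BYTES_LENGTH);
    }
}
```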
The recognition result is delivered through the listener callbacks, where it is appended to result; it can then be read via the getResult or getResultAndClear methods.
```java
public String getResult() {
    return result;
}

public String getResultAndClear() {
    if (this.result.isEmpty()) {
        return "";
    }
    String results = getResult();
    this.result = "";
    return results;
}
```
4. Creating a Simple JS UI and Calling Java from JS via a ServiceAbility
hml code:
- <div class="container">
- <div>
- <button class="btn" @touchend="start">開啟</button>
- <button class="btn" @touchend="sub">訂閱結(jié)果</button>
- <button class="btn" @touchend="stop">關(guān)閉</button>
- </div>
- <text class="title">
- 語音識別內(nèi)容: {{ text }}
- </text>
- </div>
css code:
```css
.container {
    flex-direction: column;
    justify-content: flex-start;
    align-items: center;
    width: 100%;
    height: 100%;
    padding: 10%;
}
.title {
    font-size: 20px;
    color: #000000;
    opacity: 0.9;
    text-align: left;
    width: 100%;
    margin: 3% 0;
}
.btn {
    padding: 10px 20px;
    margin: 3px;
    border-radius: 6px;
}
```
js logic code:
```javascript
// Utility for calling the Java ServiceAbility from JS
import { jsCallJavaAbility } from '../../common/JsCallJavaAbilityUtils.js';

export default {
    data: {
        text: ""
    },
    // Start recognition
    start() {
        jsCallJavaAbility.callAbility("ControllerAbility", 100, {}).then(result => {
            console.log(result)
        })
    },
    // Stop recognition
    stop() {
        jsCallJavaAbility.callAbility("ControllerAbility", 101, {}).then(result => {
            console.log(result)
        })
        jsCallJavaAbility.unSubAbility("ControllerAbility", 201).then(result => {
            if (result.code == 200) {
                console.log("unsubscribed successfully");
            }
        })
    },
    // Subscribe to results pushed from the Java side
    sub() {
        jsCallJavaAbility.subAbility("ControllerAbility", 200, (data) => {
            let text = data.data.text
            text && (this.text += text)
        }).then(result => {
            if (result.code == 200) {
                console.log("subscribed successfully");
            }
        })
    }
}
```
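The JsCallJavaAbilityUtils.js helper is not shown in the article (it lives in the linked repository). As a rough orientation, a wrapper like this is typically a thin layer over the standard FeatureAbility JS APIs of the JS FA model; the sketch below is an assumption about its shape, not the author's actual implementation, and the bundle name is the one used earlier in the article.

```javascript
// Hypothetical sketch of a JsCallJavaAbilityUtils-style wrapper over FeatureAbility.
// The exact result envelope and the code == 200 convention are assumptions.
const BUNDLE = "com.panda_coder.liedetector";

function buildAction(abilityName, messageCode, data) {
    return {
        bundleName: BUNDLE,
        abilityName: BUNDLE + "." + abilityName, // assumed fully-qualified class name
        messageCode: messageCode,
        data: data,
        abilityType: 0,   // 0: Java Ability (ServiceAbility)
        syncOption: 0
    };
}

export const jsCallJavaAbility = {
    async callAbility(abilityName, messageCode, data) {
        // FeatureAbility.callAbility resolves to a JSON string written by the Java side
        const raw = await FeatureAbility.callAbility(buildAction(abilityName, messageCode, data));
        return { code: 200, data: JSON.parse(raw) };
    },
    async subAbility(abilityName, messageCode, callback) {
        await FeatureAbility.subscribeAbilityEvent(buildAction(abilityName, messageCode, {}),
            (callbackData) => callback(JSON.parse(callbackData)));
        return { code: 200 };
    },
    async unSubAbility(abilityName, messageCode) {
        await FeatureAbility.unsubscribeAbilityEvent(buildAction(abilityName, messageCode, {}));
        return { code: 200 };
    }
};
```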
ServiceAbility code:
```java
public class ControllerAbility extends Ability {
    AnswerRemote remote = new AnswerRemote();
    AsrUtils asrUtils;
    // Remote objects of subscribed clients
    private static HashMap<Integer, IRemoteObject> remoteObjectHandlers = new HashMap<Integer, IRemoteObject>();

    @Override
    public void onStart(Intent intent) {
        HiLog.error(LABEL_LOG, "ControllerAbility::onStart");
        super.onStart(intent);
        // Initialize the speech recognition utility
        asrUtils = new AsrUtils(this);
    }

    @Override
    public void onCommand(Intent intent, boolean restart, int startId) {
    }

    @Override
    public IRemoteObject onConnect(Intent intent) {
        super.onConnect(intent);
        return remote.asObject();
    }

    class AnswerRemote extends RemoteObject implements IRemoteBroker {
        AnswerRemote() {
            super("");
        }

        @Override
        public boolean onRemoteRequest(int code, MessageParcel data, MessageParcel reply, MessageOption option) {
            Map<String, Object> zsonResult = new HashMap<String, Object>();
            String zsonStr = data.readString();
            ZSONObject zson = ZSONObject.stringToZSON(zsonStr);
            switch (code) {
                case 100: {
                    // Code 100 from JS: start speech recognition
                    asrUtils.start();
                    break;
                }
                case 101: {
                    // Code 101 from JS: stop speech recognition
                    asrUtils.stop();
                    break;
                }
                case 200: {
                    // Code 200 from JS: subscribe to recognition results
                    remoteObjectHandlers.put(200, data.readRemoteObject());
                    // Periodically fetch recognition results and push them back to the JS UI
                    getAsrText();
                    break;
                }
                default: {
                    reply.writeString("service not defined");
                    return false;
                }
            }
            reply.writeString(ZSONObject.toZSONString(zsonResult));
            return true;
        }

        @Override
        public IRemoteObject asObject() {
            return this;
        }
    }

    public void getAsrText() {
        new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1 * 500);
                    Map<String, Object> zsonResult = new HashMap<String, Object>();
                    zsonResult.put("text", asrUtils.getResultAndClear());
                    reportEvent(200, zsonResult);
                } catch (RemoteException | InterruptedException e) {
                    break;
                }
            }
        }).start();
    }

    private void reportEvent(int remoteHandler, Object backData) throws RemoteException {
        MessageParcel data = MessageParcel.obtain();
        MessageParcel reply = MessageParcel.obtain();
        MessageOption option = new MessageOption();
        data.writeString(ZSONObject.toZSONString(backData));
        IRemoteObject remoteObject = remoteObjectHandlers.get(remoteHandler);
        remoteObject.sendRequest(100, data, reply, option);
        reply.reclaim();
        data.reclaim();
    }
}
```
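For completeness, the ServiceAbility also has to be registered in config.json so the JS side can connect to it; the article does not show this declaration. The fragment below is a minimal sketch assuming the package name com.panda_coder.liedetector used earlier; check the open-source repository for the exact entry.

```json
"abilities": [
    {
        "name": "com.panda_coder.liedetector.ControllerAbility",
        "type": "service"
    }
]
```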
That completes the simple speech recognition feature.
Demo video: https://www.bilibili.com/video/BV1E44y177hv/
Full open-source code: https://gitee.com/panda-coder/harmonyos-apps/tree/master/AsrDemo