Struggling with audio processing? Build a speech recognition model with TensorFlow

Author: 黃顯東 | 2021-03-10 18:26:49

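Speech recognition is a complex problem in many industries. Knowing some basics about processing audio data and classifying sound samples is a valuable addition to your skill set.

[51CTO.com translation] In this article, we'll walk through an example that uses TensorFlow to classify some sound clips, giving you enough of the basics to build your own speech recognition model. With further study, you can apply these concepts to larger, more complex audio files.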

The complete code for this example is available on GitHub.

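Getting the data

Data collection is one of the hard problems in data science. There is plenty of data available, but not all of it is easy to use for a machine learning problem, so you have to make sure the data is clean, labeled, and complete.

For this example, we'll use some audio files released by Google, which are available on GitHub.

First, we'll create a new Conducto pipeline. This is where you can build, train, and test the model, and share a link with anyone who is interested: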

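```python
import os
import pathlib

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.layers.experimental import preprocessing  # newer TF: tf.keras.layers

import conducto as co

###
# Main Pipeline
###
def main() -> co.Serial:
    path = "/conducto/data/pipeline"
    root = co.Serial(image=get_image())

    # Get data from keras for testing and training
    root["Get Data"] = co.Exec(run_whole_thing, f"{path}/raw")

    return root
```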

Then we start writing the `run_whole_thing` function:

```python
def run_whole_thing(out_dir):
    os.makedirs(out_dir, exist_ok=True)
    # Set seed for experiment reproducibility
    seed = 55
    tf.random.set_seed(seed)
    np.random.seed(seed)
    data_dir = pathlib.Path("data/mini_speech_commands")
```

Next, download the audio files into that directory if they aren't already there:

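```python
# Still inside run_whole_thing: download and unzip the clips into ./data
if not data_dir.exists():
    # cache_dir/cache_subdir put the extracted files under data/mini_speech_commands;
    # without them get_file saves to ~/.keras/datasets and data_dir never appears
    tf.keras.utils.get_file(
        'mini_speech_commands.zip',
        origin="http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip",
        extract=True,
        cache_dir='.',
        cache_subdir='data')
```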

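Preprocessing the data

Now that the data is saved in the right directory, it can be split into training, test, and validation datasets.

First, though, we need a few functions that preprocess the data so our model can work with it.

The data has to be in a format the algorithm understands. We'll use a convolutional neural network, so the audio needs to be converted into images.

The first function converts a binary audio file into a tensor: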

```python
# Convert the binary audio file to a tensor
def decode_audio(audio_binary):
    audio, _ = tf.audio.decode_wav(audio_binary)
    return tf.squeeze(audio, axis=-1)
```
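To make the decoding step concrete without TensorFlow, here is a minimal stdlib sketch. It assumes mono 16-bit PCM WAV input, as the Speech Commands clips are, and scales each sample by 1/32768 the way `tf.audio.decode_wav` does for 16-bit data. `decode_wav_pcm16` and `encode_wav_pcm16` are hypothetical helpers for illustration only.

```python
import io
import struct
import wave


def decode_wav_pcm16(wav_bytes):
    """Decode mono 16-bit PCM WAV bytes into floats in [-1.0, 1.0).

    Returns (samples, sample_rate); samples is a flat list, i.e. the
    shape decode_audio has after tf.squeeze drops the channel axis.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    ints = struct.unpack(f"<{count}h", raw)  # WAV samples are little-endian
    return [s / 32768.0 for s in ints], sample_rate


def encode_wav_pcm16(samples, sample_rate=16000):
    """Write floats in [-1.0, 1.0] as mono 16-bit WAV bytes (for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        ints = [max(-32768, min(32767, round(s * 32768))) for s in samples]
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()
```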

Now that we have a tensor of raw audio, we need the labels that match it. The following function gets an audio file's label from its file path:

```python
# Get the label (yes, no, up, down, etc) for an audio file.
def get_label(file_path):
    parts = tf.strings.split(file_path, os.path.sep)
    return parts[-2]
```
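The same rule (the label is the name of the folder that holds the clip) fits in a few lines of `pathlib`. This sketch assumes the layout `data/mini_speech_commands/<label>/<clip>.wav`; `label_from_path` is a hypothetical name. Passing the path class explicitly makes the separator a choice rather than whatever `os.path.sep` is on the machine running the code.

```python
from pathlib import PurePosixPath, PureWindowsPath


def label_from_path(file_path, path_cls=PurePosixPath):
    """Return the parent folder name, like get_label's parts[-2]."""
    return path_cls(file_path).parent.name
```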

Next, we need to pair each audio file with its correct label. This function does that and returns a tuple that TensorFlow can work with:

```python
# Create a tuple that has the labeled audio files
def get_waveform_and_label(file_path):
    label = get_label(file_path)
    audio_binary = tf.io.read_file(file_path)
    waveform = decode_audio(audio_binary)
    return waveform, label
```

We mentioned earlier that we'd use a convolutional neural network (CNN). This is one way to approach a speech recognition model. CNNs usually work well on image data and can help cut down on preprocessing time.

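To take advantage of that, we convert the audio files into spectrograms. A raw audio file is just a sequence of amplitude values over time; a spectrogram is an image of how the signal's frequency content changes over time. So we'll write a function that converts audio data into images: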

```python
# Convert audio files to images
def get_spectrogram(waveform):
    # Zero padding for clips shorter than 16000 samples (1 second at 16 kHz)
    zero_padding = tf.zeros([16000] - tf.shape(waveform), dtype=tf.float32)
    # Concatenate audio with padding so that all audio clips will be of the same length
    waveform = tf.cast(waveform, tf.float32)
    equal_length = tf.concat([waveform, zero_padding], 0)
    # Short-time Fourier transform; keep only the magnitude
    spectrogram = tf.signal.stft(
        equal_length, frame_length=255, frame_step=128)
    spectrogram = tf.abs(spectrogram)

    return spectrogram
```
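For a feel of what `get_spectrogram` produces, the sketch below redoes the padding and a naive magnitude STFT in plain Python. It assumes `tf.signal.stft`'s defaults: a periodic Hann window, an FFT length equal to the smallest power of two that is at least `frame_length` (256 for 255), and no padding at the end. `pad_to_length`, `stft_output_shape`, and `magnitude_stft` are hypothetical helpers, and the direct DFT is far too slow for real use; it only shows the computation.

```python
import cmath
import math


def pad_to_length(samples, target=16000):
    """Append zeros so the clip has exactly `target` samples."""
    if len(samples) > target:
        raise ValueError("clip is longer than the target length")
    return list(samples) + [0.0] * (target - len(samples))


def stft_output_shape(num_samples, frame_length=255, frame_step=128):
    """(frames, frequency bins) for an STFT without end padding."""
    fft_length = 1 << (frame_length - 1).bit_length()  # smallest power of 2 >= frame_length
    num_frames = 1 + (num_samples - frame_length) // frame_step
    return num_frames, fft_length // 2 + 1


def magnitude_stft(samples, frame_length, frame_step, fft_length):
    """Naive |STFT|: Hann-window each frame, zero-pad to fft_length, run a DFT."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_length)
              for n in range(frame_length)]  # periodic Hann window
    rows = []
    for start in range(0, len(samples) - frame_length + 1, frame_step):
        frame = [samples[start + n] * window[n] for n in range(frame_length)]
        frame += [0.0] * (fft_length - frame_length)
        # Keep only the non-negative frequencies (real input is symmetric)
        row = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / fft_length)
                       for n in range(fft_length)))
               for k in range(fft_length // 2 + 1)]
        rows.append(row)
    return rows
```

For the one-second clips, `stft_output_shape(16000)` returns `(124, 129)`: 124 time frames by 129 frequency bins, before `tf.expand_dims` adds the channel axis.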

Now that the data is formatted as images, we need to attach the correct labels to them. This is similar to what we did for the raw audio files:

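```python
# Label the images created from the audio files and return a tuple
def get_spectrogram_and_label_id(audio, label):
    spectrogram = get_spectrogram(audio)
    # Add a channel axis so the CNN sees a (height, width, 1) image
    spectrogram = tf.expand_dims(spectrogram, -1)
    # `commands` (the array of label names) must be in scope here;
    # the index of the matching entry becomes the integer label
    label_id = tf.argmax(label == commands)
    return spectrogram, label_id
```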
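`tf.argmax(label == commands)` turns a string label into an integer ID by building a boolean mask and taking the position of the `True` entry. A plain-Python sketch of that mapping follows; `label_to_id` and `label_to_id_strict` are hypothetical names. One caveat: if a label is missing from `commands`, the mask is all `False` and an argmax still returns some index (0 in this sketch), so the clip is silently assigned to a real class. The strict version raises instead.

```python
def label_to_id(label, commands):
    """Plain-Python analogue of tf.argmax(label == commands)."""
    mask = [label == c for c in commands]  # boolean mask, like label == commands
    # max() returns the first maximal index, so an all-False mask gives 0
    return max(range(len(mask)), key=lambda i: mask[i])


def label_to_id_strict(label, commands):
    """Same mapping, but fails loudly for labels outside `commands`."""
    return list(commands).index(label)  # raises ValueError if missing
```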

The last helper function we need runs all of the above steps on whatever set of audio files is passed to it:

```python
# Preprocess any audio files
def preprocess_dataset(files, autotune, commands):
    # Creates the dataset
    files_ds = tf.data.Dataset.from_tensor_slices(files)

    # Matches audio files with correct labels
    output_ds = files_ds.map(get_waveform_and_label,
                             num_parallel_calls=autotune)
    # Matches audio file images to the correct labels
    output_ds = output_ds.map(
        get_spectrogram_and_label_id, num_parallel_calls=autotune)
    return output_ds
```

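With all of these helper functions in place, we can split the data.

Splitting the data into datasets

Converting the audio files into images makes the data easier to process with a CNN, which is why we wrote all of those helper functions. Now we'll do a few things to make splitting the data simple.

First, we get the list of possible commands across all the audio files, which we'll use elsewhere in the code: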

```python
# Get all of the commands for the audio files
commands = np.array(tf.io.gfile.listdir(str(data_dir)))
commands = commands[commands != 'README.md']
```
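Removing `README.md` by name works for this particular archive, but keeping only sub-directories is sturdier, and sorting makes each label map to the same integer ID on every run, since directory listings are not guaranteed to come back in a particular order. A minimal stdlib sketch, assuming one folder per command; `list_commands` and `count_clips` are hypothetical helpers:

```python
import os


def list_commands(data_dir):
    """Sorted names of the sub-directories of data_dir (one per label)."""
    with os.scandir(data_dir) as entries:
        return sorted(e.name for e in entries if e.is_dir())


def count_clips(data_dir, commands):
    """Number of .wav files in each label folder, to check for imbalance."""
    return {
        c: sum(1 for f in os.listdir(os.path.join(data_dir, c)) if f.endswith(".wav"))
        for c in commands
    }
```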

Then we get a list of every file in the data directory and shuffle it, so that each dataset we need receives a random selection of files:

```python
# Get a list of all the files in the directory
filenames = tf.io.gfile.glob(str(data_dir) + '/*/*')

# Shuffle the file names so that random bunches can be used as the training, testing, and validation sets
filenames = tf.random.shuffle(filenames)

# Create the list of files for training data
train_files = filenames[:6400]

# Create the list of files for validation data
validation_files = filenames[6400: 6400 + 800]

# Create the list of files for test data (the remaining 800 of the 8,000 clips)
test_files = filenames[6400 + 800:]
```
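The slices above assume exactly 8,000 files (8 commands with 1,000 clips each). A sketch that derives the split sizes from fractions instead, so the three lists always cover the data without overlapping; `split_files` and the 80/10/10 fractions are assumptions chosen to reproduce 6400/800/800. It uses Python's own RNG, so the order differs from `tf.random.shuffle`.

```python
import random


def split_files(filenames, train_frac=0.8, val_frac=0.1, seed=55):
    """Shuffle with a fixed seed, then cut into train/validation/test lists."""
    files = list(filenames)
    random.Random(seed).shuffle(files)  # fixed seed makes the shuffle reproducible
    n_train = round(len(files) * train_frac)
    n_val = round(len(files) * val_frac)
    train = files[:n_train]
    val = files[n_train:n_train + n_val]
    test = files[n_train + n_val:]  # whatever remains; no file is reused or lost
    return train, val, test
```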

Now that the training, validation, and test files are cleanly separated, we can preprocess them so they're ready for building and testing the model. We use `AUTOTUNE` here so that tf.data picks values such as the number of parallel calls dynamically at runtime:

```python
autotune = tf.data.AUTOTUNE
```

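This first example just shows how the preprocessing works, and it also gives us the `spectrogram_ds` values we need later for the input shape and the normalization layer: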

```python
# Get the converted audio files for training the model
files_ds = tf.data.Dataset.from_tensor_slices(train_files)
waveform_ds = files_ds.map(
    get_waveform_and_label, num_parallel_calls=autotune)
spectrogram_ds = waveform_ds.map(
    get_spectrogram_and_label_id, num_parallel_calls=autotune)
```

Now that you've seen the preprocessing steps, we can use the helper function to process all of the datasets:

```python
# Preprocess the training, test, and validation datasets
train_ds = preprocess_dataset(train_files, autotune, commands)
validation_ds = preprocess_dataset(
    validation_files, autotune, commands)
test_ds = preprocess_dataset(test_files, autotune, commands)
```

Training runs through the data in batches, one batch per iteration within each epoch, so we set the batch size:

```python
# Batch datasets for training and validation
batch_size = 64
train_ds = train_ds.batch(batch_size)
validation_ds = validation_ds.batch(batch_size)
```
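Batching is simple arithmetic: with `batch_size = 64`, the 6,400 training files give 100 batches per epoch, and the 800 validation files give 13 batches, the last holding only 32 examples, because `Dataset.batch` keeps the remainder by default. A plain-Python sketch of the same behavior; `batches` and `steps_per_epoch` are hypothetical helpers:

```python
import math


def batches(items, batch_size):
    """Yield consecutive slices; the last one may be short (remainder kept)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def steps_per_epoch(num_examples, batch_size):
    """Number of batches one pass over the data produces."""
    return math.ceil(num_examples / batch_size)
```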

Finally, we can use caching and prefetching to reduce latency while training the model:

```python
# Reduce latency while training
train_ds = train_ds.cache().prefetch(autotune)
validation_ds = validation_ds.cache().prefetch(autotune)
```
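`cache()` keeps the preprocessed elements after the first pass, so later epochs skip decoding and the STFT, while `prefetch()` prepares upcoming batches while the current one is being used for training. The toy sketch below shows only the prefetch idea, with a background thread and a bounded queue; it is an analogy, not how tf.data is implemented, and `prefetch` here is a hypothetical function.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread fetches ahead.

    At most `buffer_size` items wait in the queue, so memory stays bounded.
    """
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream
    errors = []

    def producer():
        try:
            for item in iterable:
                buf.put(item)
        except Exception as exc:  # hand the error over to the consumer
            errors.append(exc)
        finally:
            buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            if errors:
                raise errors[0]
            return
        yield item
```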

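Our datasets are now in a form the model can be trained on.

Building the model

With the datasets defined, we can move on to building the model. Since we're using a CNN, we first take the shape of one spectrogram so the input layer gets the right shape, and then build the model layer by layer as a `Sequential` model: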

```python
# Build model
for spectrogram, _ in spectrogram_ds.take(1):
    input_shape = spectrogram.shape

num_labels = len(commands)

# Learn the mean and variance of the spectrograms for normalization
norm_layer = preprocessing.Normalization()
norm_layer.adapt(spectrogram_ds.map(lambda x, _: x))

model = models.Sequential([
    layers.Input(shape=input_shape),
    # Downsample every spectrogram to 32x32 to keep the model small
    preprocessing.Resizing(32, 32),
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    # One raw score (logit) per command; no softmax here
    layers.Dense(num_labels),
])

model.summary()
```
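To see where the parameter counts in `model.summary()` come from, the sketch below traces the shapes by hand. It assumes 8 commands (the mini Speech Commands labels), `valid` padding (the `Conv2D` default), and the default 2x2 pool of `MaxPooling2D`; `conv2d_valid`, `dense`, and `trace_model` are hypothetical helpers. Because of `Resizing(32, 32)`, nothing after that layer depends on the 124x129 spectrogram size.

```python
def conv2d_valid(shape, filters, kernel):
    """Output shape and parameter count of a Conv2D with 'valid' padding."""
    height, width, channels = shape
    out = (height - kernel + 1, width - kernel + 1, filters)
    params = kernel * kernel * channels * filters + filters  # weights + biases
    return out, params


def dense(in_features, units):
    """Parameter count of a Dense layer (weights + biases)."""
    return in_features * units + units


def trace_model(num_labels=8, resized=(32, 32)):
    """Return (flattened size, trainable parameter count) for the model above."""
    shape = (resized[0], resized[1], 1)  # after Resizing; one channel
    total = 0
    shape, params = conv2d_valid(shape, 32, 3)
    total += params
    shape, params = conv2d_valid(shape, 64, 3)
    total += params
    # MaxPooling2D with the default 2x2 pool halves height and width
    shape = (shape[0] // 2, shape[1] // 2, shape[2])
    flat = shape[0] * shape[1] * shape[2]  # Flatten
    total += dense(flat, 128)
    total += dense(128, num_labels)  # output logits
    # Dropout has no weights; Normalization's statistics are not trainable
    return flat, total
```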

Next we compile the model with an optimizer, a loss function, and the metric to track during training:

```python
# Configure built model with losses and metrics
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
```
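With `from_logits=True`, the loss applies the softmax itself, which is why the last `Dense` layer has no activation. A plain-Python sketch of sparse categorical cross-entropy for one example with an integer label, using the log-sum-exp trick for numerical stability; `sparse_ce_from_logits` and `mean_loss` are hypothetical names:

```python
import math


def sparse_ce_from_logits(logits, label):
    """-log(softmax(logits)[label]) computed stably from raw logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_loss(batch_logits, labels):
    """Average loss over a batch, as Keras reports it by default."""
    losses = [sparse_ce_from_logits(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```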

The model is built; all that's left is to train it.

Training the model

After all the work of preprocessing the data and building the model, training is relatively simple. We decide how many epochs to run using the training and validation datasets:

```python
# Finally train the model and return info about each epoch
EPOCHS = 10
history = model.fit(
    train_ds,
    validation_data=validation_ds,
    epochs=EPOCHS,
    # Stop when val_loss hasn't improved for 2 consecutive epochs
    callbacks=[tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)],
)
```
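`EarlyStopping(patience=2)` watches `val_loss` by default and stops once it has failed to improve for 2 epochs in a row. The function below is a simplified model of that rule (not the Keras source), returning how many epochs actually run; `early_stopping_epochs` is a hypothetical name. Note that with the default `restore_best_weights=False`, the model keeps the weights from the last epoch, not the best one.

```python
def early_stopping_epochs(val_losses, patience=2, min_delta=0.0):
    """How many epochs run before a patience-based stop on val_loss."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # lower val_loss counts as improvement
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch + 1  # training stops after this epoch
    return len(val_losses)  # ran all EPOCHS
```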

With that, the model is trained. Now we need to test it.

Testing the model

Now that we have a model with roughly 83% accuracy, it's time to see how it performs on new data. We take the test dataset and separate the audio from the labels:

```python
# Test the model
test_audio = []
test_labels = []

for audio, label in test_ds:
    test_audio.append(audio.numpy())
    test_labels.append(label.numpy())

test_audio = np.array(test_audio)
test_labels = np.array(test_labels)
```

Then we feed the audio data to the model and check whether it predicts the correct labels:

```python
# See how accurate the model is when making predictions on the test dataset
y_pred = np.argmax(model.predict(test_audio), axis=1)
y_true = test_labels

test_acc = sum(y_pred == y_true) / len(y_true)

print(f'Test set accuracy: {test_acc:.0%}')
```
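A single accuracy number hides which commands get confused with each other. A minimal confusion-matrix sketch in plain Python follows (TensorFlow also provides `tf.math.confusion_matrix`); `confusion_matrix`, `accuracy`, and `recall_per_class` are hypothetical helpers that take integer label IDs, for example `y_true.tolist()` and `y_pred.tolist()` from the step above.

```python
def confusion_matrix(y_true, y_pred, num_labels):
    """Rows are true label IDs, columns are predicted label IDs."""
    matrix = [[0] * num_labels for _ in range(num_labels)]
    for true_id, pred_id in zip(y_true, y_pred):
        matrix[true_id][pred_id] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal (predicted == true)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def recall_per_class(matrix):
    """For each true label, the fraction of its clips predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]
```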

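Finishing the pipeline

It only takes a small piece of code to finish your pipeline and make it shareable with anyone. This defines the image that the Conducto pipeline runs in and handles executing the file: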

```python
###
# Pipeline Helper functions
###
def get_image():
    return co.Image(
        "python:3.8-slim",
        copy_dir=".",
        reqs_py=["conducto", "tensorflow", "keras"],
    )

if __name__ == "__main__":
    co.main(default=main)
```

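Now you can run `python pipeline.py` in your terminal; it should bring up a link to your new Conducto pipeline.

Conclusion

This is one way to approach an audio processing problem, but depending on the data you need to analyze, it can get much more complex. Building it as a pipeline makes it easy to share with colleagues and to get help or feedback when you run into errors.

[Translated by 51CTO. Partner sites reprinting this article must credit the original translator and 51CTO.com as the source.]

Editor: 黃顯東 | Source: hackernoon.com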