A Guide to Automated Machine Learning: Four Maturity Models

By: Chen Jun (translator), 2019-12-31 09:00:00


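[51CTO.com Quick Translation] The concepts of artificial intelligence and machine learning are now widespread in the data science community. With them, many tasks that once had to be done by hand can be replaced by automated methods that are more efficient and more accurate. As technology trends shift, automated machine learning simplifies manual work, saving time and improving efficiency.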

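Automated Machine Learning: Automating the Training Process

Conceptually, machine learning trains a machine on real-world data so that it produces the expected output. In other words, the machine learns from existing data (or experience) and, after a full pass of data processing, generates more accurate output. Automated machine learning (AutoML) aims to automate this entire process.

Maturity Models for Automated Machine Learning

AutoML implementations can be classified into levels according to their maturity. As the figure in the original article shows, the higher the maturity level, the better the support for automating tasks; in turn, a system at that level must perform more tasks and provide more services by training on the data.

1. Hyperparameter Optimization

Once a dataset is submitted, an AutoML system at this level of the maturity model tries to fit various existing models, such as random forest and linear regression (generally on structured data). It also optimizes the hyperparameters of each model it applies to the data, as needed. Optimization techniques of this kind include manual search, random search, and grid search.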
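
To make the difference between grid search and random search concrete, here is a minimal sketch using only the Python standard library. The objective `validation_error` is a hypothetical stand-in for "train a model and measure its validation error", and the hyperparameter names `learning_rate` and `depth` are assumptions chosen for illustration, not taken from any particular library.

```python
import itertools
import random


def validation_error(params):
    """Hypothetical stand-in for training a model and scoring it.

    The toy error surface has its minimum at learning_rate=0.1, depth=6.
    """
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def grid_search(grid, objective):
    """Evaluate every combination in the grid and return the best one."""
    names = sorted(grid)
    candidates = (
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    )
    return min(candidates, key=objective)


def random_search(space, objective, n_trials, seed=0):
    """Sample n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = [
        {
            "learning_rate": rng.uniform(*space["learning_rate"]),
            "depth": rng.randint(*space["depth"]),
        }
        for _ in range(n_trials)
    ]
    return min(trials, key=objective)


# Grid search: a fixed list of values per hyperparameter.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
best_grid = grid_search(grid, validation_error)

# Random search: a (low, high) range per hyperparameter.
space = {"learning_rate": (0.001, 1.0), "depth": (2, 8)}
best_random = random_search(space, validation_error, n_trials=50)

print("grid search  :", best_grid)
print("random search:", best_random)
```

Grid search costs one evaluation per combination, so its cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget and can land on values between the grid points.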

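For example, Auto-sklearn uses a Bayesian model for hyperparameter optimization and can deliver the desired results. At this maturity level, however, AutoML can perform only a limited set of tasks, such as cross-validation, selection of machine learning algorithms, and hyperparameter optimization. As the maturity level rises, AutoML gains more capabilities and delivers better results.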
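
The cross-validation and algorithm-selection tasks mentioned here can be sketched in a few lines of standard-library Python. This is an illustrative toy, not how Auto-sklearn works internally: the two candidate "algorithms" (`fit_mean` and `fit_line`) and the synthetic data are assumptions made up for the example.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-dimensional least-squares linear regression."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def k_fold_mse(fit, xs, ys, k=5):
    """Average mean squared error over k held-out folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        fold_errors.append(mean((model(xs[i]) - ys[i]) ** 2 for i in test_idx))
    return mean(fold_errors)


# Synthetic data (assumed for illustration): y = 2x + 1 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: k_fold_mse(fit, xs, ys) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)

print(scores)
print("selected:", best_name)
```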

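2. Data Preprocessing Beyond Level One

At level one, AutoML requires users to implement data preprocessing themselves. At level two, with more mature models in use, the system itself can carry out various data preprocessing tasks and prepare the data for further processing.

By searching for and identifying column types, the system can convert all data (including some null values) into common numeric types. This does not include advanced transformation and preprocessing of the data, which data scientists still need to handle themselves.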
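
As a rough illustration of what "identify the column type, then convert everything to numbers" can look like, here is a minimal sketch. The missing-value markers, the mean-imputation rule, the `-1.0` code for missing categories, and the column names `age` and `sex` are all assumptions for the example; real AutoML systems use more elaborate rules.

```python
from statistics import mean

# Values treated as missing (an assumption for this sketch).
MISSING = {"", "NA", "null", None}


def is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def encode_column(values):
    """Convert one raw column to floats, inferring its type first.

    Numeric columns: parse as float and fill missing cells with the mean.
    Other columns: map each distinct label to an integer code, -1.0 if missing.
    """
    present = [v for v in values if v not in MISSING]
    if present and all(is_number(v) for v in present):
        fill = mean(float(v) for v in present)
        return [float(v) if v not in MISSING else fill for v in values]
    codes = {label: i for i, label in enumerate(sorted(set(present)))}
    return [float(codes[v]) if v not in MISSING else -1.0 for v in values]


# Hypothetical raw table, as it might arrive from a CSV file.
table = {
    "age": ["22", "", "38", "30"],
    "sex": ["male", "female", None, "female"],
}
encoded = {name: encode_column(column) for name, column in table.items()}
print(encoded)
```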

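For the target task, the system is responsible only for searching for and selecting an appropriate machine learning algorithm. For example, for a mobile app development task at hand, AutoML algorithms and models can be designed to preprocess the data and produce the required budget, timeline, and other accurate results.

Through data preprocessing, an AutoML system can build and apply feature selection, dimensionality reduction, data compression, and similar functions, and then run the training tasks seamlessly.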
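
One simple form of automated feature selection is a filter: drop columns that never vary, then rank the rest by how strongly they correlate with the target. The sketch below assumes that approach; the function names, thresholds, and toy columns are hypothetical, and production systems often use model-based or wrapper methods instead.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / norm if norm else 0.0


def select_features(columns, target, k=2, min_variance=1e-9):
    """Keep the k non-constant columns most correlated with the target."""
    varying = {
        name: values
        for name, values in columns.items()
        if pvariance(values) > min_variance
    }
    ranked = sorted(
        varying,
        key=lambda name: abs(pearson(varying[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Toy numeric columns (assumed for illustration).
columns = {
    "constant": [1, 1, 1, 1, 1, 1],
    "signal": [1, 2, 3, 4, 5, 6],
    "weak": [3, 1, 2, 3, 1, 2],
    "inverse": [6, 5, 4, 3, 2, 1.5],
}
target = [2, 4, 6, 8, 10, 12]

selected = select_features(columns, target, k=2)
print(selected)
```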

3. Finding a Suitable Machine Learning Architecture

The level-one and level-two AutoML systems above cannot proactively discover a suitable machine learning architecture from the nature of the data and run it to ensure good output. At level three, open-source AutoML libraries, typified by AutoKeras, implement neural architecture search (NAS; see https://en.wikipedia.org/wiki/Neural_architecture_search). This popular approach can effectively apply machine learning algorithms to images, speech, or text.

Data scientists can therefore use different neural architecture search algorithms to strengthen their support for, and experience with, AutoML. In practice, fields such as self-driving cars and automated consumer services use level-three AutoML systems.
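
The outer loop of a neural architecture search can be sketched as "sample an architecture from a search space, score it, keep the best". The example below uses plain random search, which is the simplest NAS strategy and not the algorithm AutoKeras itself uses. The search space and `proxy_score` are hypothetical: in a real system the score would come from actually training each candidate network and measuring validation accuracy.

```python
import random

# Hypothetical search space: each architecture is one choice per entry.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """Draw one architecture uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def proxy_score(arch):
    """Made-up stand-in for 'train this network and return validation accuracy'.

    It favors a total capacity near 128 units and gives relu a small bonus.
    """
    capacity = arch["num_layers"] * arch["units"]
    bonus = 0.02 if arch["activation"] == "relu" else 0.0
    return 0.9 - abs(capacity - 128) / 1000 + bonus


def random_architecture_search(n_trials, seed=0):
    """Evaluate n_trials sampled architectures and return the best one."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = proxy_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_architecture_search(n_trials=20)
print(best_arch, round(best_score, 3))
```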

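4. Use of Domain Knowledge

To deliver accurate output from a machine learning system, it is essential to understand the data in depth, particularly its scope and the system that carries it. Sophisticated AI results are achievable only by using relevant domain knowledge and consistently checking against all the criteria that must be considered.

Building up existing domain knowledge and using it in real scenarios therefore improves the accuracy of the final results. Improved accuracy in turn drives stronger predictive capability and gives full support to AutoML tasks. This maturity level thus focuses on improving the accuracy of AutoML systems by adding background domain knowledge, backed by a clearly result-oriented track record.

Examples of Automated Machine Learning

Depending on the needs of the application scenario, data science practitioners can use a variety of tools and libraries to develop automated pipelines and machine learning systems with accurate output.

Open-Source Libraries for Automated Machine Learning

Many open-source libraries are now available that help developers implement automated machine learning in their systems.

1. AutoKeras

The library is freely available to developers on GitHub. Developed by Data Lab, AutoKeras aims to provide access to all deep learning tools and so strengthen the overall capability of deep learning models. The following code is an example of using AutoKeras: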

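import autokeras as ak

# x_train, y_train and x_test are assumed to be image arrays and labels loaded beforehand
clf = ak.ImageClassifier()
clf.fit(x_train, y_train)      # search for and train an image classification model
results = clf.predict(x_test)  # predict labels for unseen images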

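Python source code: https://github.com/jhfjhfj1/autokeras

2. MLBox

MLBox is another open-source library written in Python. It makes developing automated machine learning functions faster and easier, and includes functionality for data preprocessing, cleaning, and formatting. The following code sample shows how to preprocess data after importing it: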

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

paths = ["../input/train.csv", "../input/test.csv"]  # training and test files
target_name = "Survived"  # column to predict

rd = Reader(sep=",")
df = rd.train_test_split(paths, target_name)  # reading and preprocessing (dates, ...)

Python source code: https://www.kaggle.com/axelderomblay/running-mlbox-auto-ml-package-on-titanic

3. Auto-sklearn

Auto-sklearn is another open-source AutoML library. It studies the data's model and requirements by selecting appropriate machine learning algorithms. It removes the need for users to handle hyperparameters and carries out that work itself. The following code is an example of running Auto-sklearn on a dataset:

import autosklearn.classification
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

# load the digits dataset and hold out a test split
X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# let Auto-sklearn choose the algorithm and tune its hyperparameters
automl = autosklearn.classification.AutoSklearnClassifier()
automl.fit(X_train, y_train)
y_hat = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, y_hat))

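Python source code: https://automl.github.io/auto-sklearn/master/

Automated Machine Learning Tools

The following tools are released for commercial use, but they are widely used and ensure the output quality of automated machine learning.

DataRobot

DataRobot is billed as the first tool to support automated machine learning. It provides an advanced platform for implementing AI that helps users solve problems and obtain the results they need without worrying much about how the work is executed. The DataRobot API supports prediction and lets the machine automate processing by choosing appropriate methods and then return the output.

The following code is an example of using the DataRobot API. It uses a dataset to predict the probability that patients at various hospitals will be readmitted within 30 days.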

import time

import datarobot as dr
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from jupyterthemes import jtplot

pd.options.display.max_columns = 1000

# currently installed theme will be used to set plot style if no arguments provided
jtplot.style()
# Jupyter-only: render plots inline in the notebook
get_ipython().run_line_magic('matplotlib', 'inline')

# load input data
df = pd.read_csv('../demo_data/10kDiabetes.csv')

# initialize datarobot client instance
dr.Client(config_path='/Users/benjamin.miller/.config/datarobot/my_drconfig.yaml')

# create 100 samples with replacement from the original 10K diabetes dataset
samples = []
for i in range(100):
    samples.append(df.sample(10000, replace=True))

# loop through each sample dataframe
for i, s in enumerate(samples):
    # initialize project
    project = dr.Project.start(
        project_name='API_Test_{}'.format(i + 20),
        sourcedata=s,
        target='readmitted',
        worker_count=2
    )

# get all projects
projects = []
for project in dr.Project.list():
    if "API_Test" in project.project_name:
        projects.append(project)

# *For each project...*
# Make predictions on the original dataset using the most accurate model

# initialize list of all predictions for consolidating results
bootstrap_predictions = []

# loop through each relevant project to get predictions on original input dataset
for project in projects:
    # get best performing model
    model = dr.Model.get(project=project.id, model_id=project.get_models()[0].id)

    # upload dataset
    new_data = project.upload_dataset(df)

    # start a predict job
    predict_job = model.request_predictions(new_data.id)

    # get job status every 5 seconds and move on once 'inprogress'
    for i in range(100):
        time.sleep(5)
        try:
            job_status = dr.PredictJob.get(
                project_id=project.id,
                predict_job_id=predict_job.id
            ).status
        except Exception:  # normally the job_status would produce an error when it is completed
            break

    # now the predictions are finished
    predictions = dr.PredictJob.get_predictions(
        project_id=project.id,
        predict_job_id=predict_job.id
    )

    # extract row ids and positive probabilities for all records and set to dictionary
    pred_dict = {k: v for k, v in zip(predictions.row_id, predictions.positive_probability)}

    # append prediction dictionary to bootstrap predictions
    bootstrap_predictions.append(pred_dict)

# combine all predictions into single dataframe with keys as ids
# each record is a row, each column is a set of predictions pertaining to
# a model created from a bootstrapped dataset
df_predictions = pd.DataFrame(bootstrap_predictions).T

# add mean predictions for each observation in df_predictions
df_predictions['mean'] = df_predictions.mean(axis=1)

# place each record into equal sized probability groups using the mean
df_predictions['probability_group'] = pd.qcut(df_predictions['mean'], 10)

# aggregate all predictions for each probability group
d = {}  # dictionary to contain {Interval(probability_group): array([predictions])}
for pg in set(df_predictions.probability_group):
    # combine all predictions for a given group
    frame = df_predictions[df_predictions.probability_group == pg].iloc[:, 0:100]
    # to_numpy() replaces as_matrix(), which was removed in pandas 1.0
    d[str(pg)] = frame.to_numpy().flatten()

# create dataframe from all probability group predictions
df_pg = pd.DataFrame(d)

# create boxplots in order of increasing probability ranges
props = dict(boxes='slategray', medians='black', whiskers='slategray')
viz = df_pg.plot.box(color=props, figsize=(15, 7), patch_artist=True, rot=45)
grid = viz.grid(False, axis='x')
ylab = viz.set_ylabel('Readmission Probability')
xlab = viz.set_xlabel('Mean Prediction Probability Ranges')
title = viz.set_title(
    label='Expected Prediction Distributions by Readmission Prediction Range',
    fontsize=18
)

Python source code: https://blog.datarobot.com/estimation-of-prediction-distributions-using-datarobot
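
The example above relies on bootstrapping: resampling the dataset with replacement many times and looking at how much the results vary. The core idea can be sketched without any external service. In the sketch below the `readmitted` labels are made-up data and the statistic is a plain readmission rate rather than a model prediction; both are assumptions chosen to keep the example self-contained.

```python
import random
from statistics import mean, quantiles


def bootstrap_means(values, n_resamples=200, seed=0):
    """Resample with replacement n_resamples times; return each resample's mean."""
    rng = random.Random(seed)
    n = len(values)
    return [mean(rng.choices(values, k=n)) for _ in range(n_resamples)]


# Hypothetical labels: 30 of 100 patients were readmitted.
readmitted = [1] * 30 + [0] * 70

rates = bootstrap_means(readmitted)

# quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = quantiles(rates, n=20)
low, high = cuts[0], cuts[-1]

print("mean bootstrap rate:", round(mean(rates), 3))
print("5th to 95th percentile:", round(low, 3), "to", round(high, 3))
```

The spread between the two percentiles gives a rough sense of how uncertain the estimate is, which is the same quantity the box plots in the DataRobot example visualize per probability group.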

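H2O.ai

H2O is another AI-enabled platform tool. It is used mainly for machine learning tasks such as producing results with Driverless AI.

Summary

Beyond the tools and libraries covered above, the market also offers other commercial solutions such as Google AutoML (https://cloud.google.com/automl/). They implement machine learning concepts while advancing automated training on data, and they can deliver strong results and predictions. As the benefits of AutoML become more visible, AI technology keeps improving, and more and more companies stand to benefit from the output of such systems.

Original title: A Beginner's Guide to Automated Machine Learning: 4 Maturity Models to Understand, by Manoj Rupareliya

[51CTO translation. Partner sites republishing this article must credit the original translator and cite 51CTO.com as the source.]

Editor: Pang Guiyu  Source: 51CTO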
