How to Hill Climb the Test Set for Machine Learning

Author: Anonymous 2021-03-12 11:00:14

Artificial Intelligence, Machine Learning

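In this tutorial, you will discover how to hill climb the test set for machine learning.

Hill climbing the test set is an approach to achieving good or perfect predictions in a machine learning competition without touching the training set or even developing a predictive model. As an approach to machine learning competitions, it is rightly frowned upon, and most competition platforms impose limitations to prevent it, which is important. Nevertheless, hill climbing the test set is something that machine learning practitioners do accidentally when taking part in a competition. Developing an explicit implementation that hill climbs a test set helps to show how easy it is to overfit a test dataset by overusing it to evaluate modeling pipelines.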

After completing this tutorial, you will know:

Perfect predictions can be made by hill climbing the test set without even looking at the training dataset.
How to hill climb the test set for classification and regression tasks.
We implicitly hill climb the test set when we overuse it to evaluate our modeling pipelines.

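Tutorial Overview

This tutorial is divided into five parts. They are:

Hill Climb the Test Set
Hill Climbing Algorithm
How to Implement Hill Climbing
Hill Climb the Diabetes Classification Dataset
Hill Climb the Housing Regression Dataset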

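Hill Climb the Test Set

Machine learning competitions, like those on Kaggle, provide a complete training dataset as well as just the inputs for the test set. The objective of a given competition is to predict target values, such as labels or numerical values, for the test set. Submissions are evaluated against the hidden test set target values and scored accordingly. The submission with the best score on the test set wins the competition.

The challenge of a machine learning competition can be framed as an optimization problem. Traditionally, the competition participant acts as the optimization algorithm, exploring different modeling pipelines that result in different sets of predictions, scoring the predictions, and then changing the pipeline in the hope of a better score.

This process can also be modeled directly with an optimization algorithm, where candidate predictions are generated and evaluated without ever looking at the training set. Generally, this is referred to as hill climbing the test set, because one of the simplest optimization algorithms for this problem is the hill climbing algorithm.

Although hill climbing the test set is rightly frowned upon in real machine learning competitions, implementing the approach can be an interesting exercise for learning about its limitations and the dangers of overfitting the test set. Additionally, the fact that the test set can be predicted perfectly without ever touching the training dataset often shocks beginner machine learning practitioners.

Most importantly, we implicitly hill climb the test set when we repeatedly evaluate different modeling pipelines. The risk is that the score on the test set improves at the cost of increased generalization error, i.e. worse performance on the broader problem.

People who run machine learning competitions are well aware of this problem and impose limitations on the evaluation of predictions to counter it, such as limiting evaluation to one or a few submissions per day and reporting scores on a hidden subset of the test set rather than the entire test set. For more on this, see the papers listed in the further reading section.

Next, let's look at how we can implement the hill climbing algorithm to optimize predictions for a test set.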
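To make the "hidden subset" countermeasure concrete, here is a minimal sketch of what happens when a hill climber can only see the score on part of the test set. Everything in it is an assumption made for illustration: the labels are random toy data, the public and private halves are a hypothetical 50/50 split, and the helper names are not part of the tutorial code.

```python
# Sketch: hill climb a "public" subset of the test labels, then check the
# result on a held-back "private" subset. All names are illustrative.
from random import Random

def accuracy(y_true, y_pred, indices):
    # fraction of matching labels, restricted to the given row indices
    hits = sum(1 for i in indices if y_true[i] == y_pred[i])
    return hits / len(indices)

def climb_public_subset(y_true, public_ix, max_iterations, rng):
    # start from random 0/1 predictions for every test row
    solution = [rng.randint(0, 1) for _ in y_true]
    score = accuracy(y_true, solution, public_ix)
    for _ in range(max_iterations):
        # stop once the public score is perfect
        if score == 1.0:
            break
        # flip one randomly chosen prediction
        candidate = solution.copy()
        ix = rng.randrange(len(candidate))
        candidate[ix] = 1 - candidate[ix]
        # only the public score is visible to the climber
        value = accuracy(y_true, candidate, public_ix)
        if value >= score:
            solution, score = candidate, value
    return solution

rng = Random(1)
n_rows = 200
# hypothetical hidden labels; the first half is scored publicly
y_true = [rng.randint(0, 1) for _ in range(n_rows)]
public_ix = list(range(0, n_rows // 2))
private_ix = list(range(n_rows // 2, n_rows))

yhat = climb_public_subset(y_true, public_ix, 20000, rng)
public_acc = accuracy(y_true, yhat, public_ix)
private_acc = accuracy(y_true, yhat, private_ix)
print('public=%.3f private=%.3f' % (public_acc, private_acc))
```

Under these assumptions the public score climbs to a perfect value, while the private predictions are never guided by any feedback and should stay near chance level. That is the effect the hidden subset is meant to expose.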

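Hill Climbing Algorithm

The hill climbing algorithm is a very simple optimization algorithm. It involves generating a candidate solution and evaluating it. This is then the starting point for incremental improvement until either no further improvement can be achieved or we run out of time, resources, or interest.

New candidate solutions are generated from the existing candidate solution. Typically, this involves making a single change to the candidate solution, evaluating it, and accepting it as the new "current" solution if it is as good as or better than the previous current solution. Otherwise, it is discarded.

We might think that it is a good idea to accept only candidates that have a better score. This is a reasonable approach for many simple problems, although on more complex problems it is desirable to accept different candidates with the same score in order to help the search process traverse flat regions (plateaus) in the search space.

When hill climbing the test set, a candidate solution is a list of predictions. For a binary classification task, this is a list of 0 and 1 values for the two classes. For a regression task, this is a list of numbers in the range of the target variable.

A modification to a candidate solution for classification would be to select one prediction and flip it from 0 to 1 or from 1 to 0. A modification to a candidate solution for regression would be to add Gaussian noise to one value in the list or to replace one value in the list with a new value.

Scoring a solution involves calculating a scoring metric, such as classification accuracy for classification tasks or mean absolute error for regression tasks.

Now that we are familiar with the algorithm, let's implement it.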
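Before applying the idea to a test set, the loop described above can be sketched in a generic form. This is only an illustration under stated assumptions: the hill_climb(), objective(), and step() names are hypothetical, and the toy objective (a single peak at x = 3 over the integers) stands in for a real scoring metric.

```python
# Minimal generic hill climber (illustrative names, toy objective).
from random import Random

def hill_climb(initial, modify, evaluate, max_iterations, rng):
    # evaluate the starting point
    solution, score = initial, evaluate(initial)
    for _ in range(max_iterations):
        # make a single change to the current solution
        candidate = modify(solution, rng)
        value = evaluate(candidate)
        # accept candidates that are as good or better (higher is better)
        if value >= score:
            solution, score = candidate, value
    return solution, score

def objective(x):
    # toy objective with a single peak at x = 3
    return -(x - 3) ** 2

def step(x, rng):
    # move one unit left or right
    return x + rng.choice([-1, 1])

best, best_score = hill_climb(0, step, objective, 1000, Random(1))
print(best, best_score)
```

The comparison uses >= rather than >, which is the "as good or better" acceptance rule from the text. On this toy objective it makes no difference, but it matters on problems with plateaus.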

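How to Implement Hill Climbing

We will develop our hill climbing algorithm on a synthetic classification task. First, let's create a binary classification task with many input variables and 5,000 rows of examples. We can then split the dataset into train and test sets. The complete example is listed below.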

# example of a synthetic dataset.  
from sklearn.datasets import make_classification  
from sklearn.model_selection import train_test_split  
# define dataset  
X, y = make_classification(n_samples=5000, n_features=20, n_informative=15, n_redundant=5, random_state=1)  
print(X.shape, y.shape)  
# split dataset  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1) 
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Running the example first reports the shape of the created dataset, showing 5,000 rows and 20 input variables. The dataset is then split into train and test sets, with 3,350 rows for training and 1,650 rows for testing.

(5000, 20) (5000,)  
(3350, 20) (1650, 20) (3350,) (1650,)

Now we can develop a hill climber. First, we can create a function that will load, or in this case define, the dataset. We can update this function later when we want to change the dataset.

# load or prepare the classification dataset  
def load_dataset():  
 return make_classification(n_samples=5000, n_features=20, n_informative=15, n_redundant=5, random_state=1)

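Next, we need a function to evaluate candidate solutions, that is, lists of predictions. We will use classification accuracy, where scores range between 0 for the worst possible solution and 1 for a perfect set of predictions.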

# evaluate a set of predictions  
def evaluate_predictions(y_test, yhat):  
 return accuracy_score(y_test, yhat)

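Next, we need a function to create an initial candidate solution. That is a list of predictions for the 0 and 1 class labels, long enough to match the number of examples in the test set, in this case 1,650. We can use the randint() function to generate random values of 0 and 1.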

# create a random set of predictions  
def random_predictions(n_examples):  
 return [randint(0, 1) for _ in range(n_examples)]

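Next, we need a function to create a modified version of a candidate solution. In this case, this involves selecting one value in the solution and flipping it from 0 to 1 or from 1 to 0. Typically, we would make a single change for each new candidate solution during hill climbing, but I have parameterized the function so you can explore making more than one change if you want.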

# modify the current set of predictions  
def modify_predictions(current, n_changes=1):  
 # copy current solution  
 updated = current.copy() 
 for i in range(n_changes):  
  # select a point to change  
  ix = randint(0, len(updated)-1)  
  # flip the class label  
  updated[ix] = 1 - updated[ix]  
 return updated
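As a quick sanity check of this step, the sketch below repeats the same logic on a small hand-made list and confirms that the current solution is left untouched while the candidate differs in exactly one position. The example list and the seed are arbitrary choices made for illustration.

```python
from random import randint, seed

def modify_predictions(current, n_changes=1):
    # same logic as the function above: flip randomly chosen labels on a copy
    updated = current.copy()
    for _ in range(n_changes):
        ix = randint(0, len(updated) - 1)
        updated[ix] = 1 - updated[ix]
    return updated

seed(1)
current = [0, 1, 1, 0, 1, 0, 0, 1]
snapshot = current.copy()
candidate = modify_predictions(current)
# positions where the candidate differs from the current solution
changed = [i for i, (a, b) in enumerate(zip(current, candidate)) if a != b]
print(changed, current == snapshot)
```

Note that with n_changes greater than 1 the same index can be drawn more than once, in which case two flips of the same label cancel out.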

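So far, so good. Next, we can develop the function that performs the search. First, an initial solution is created and evaluated by calling the random_predictions() function followed by the evaluate_predictions() function. Then we loop for a fixed number of iterations, generate a new candidate by calling modify_predictions(), evaluate it, and, if its score is as good as or better than that of the current solution, replace the current solution with it. The loop ends when we finish the preset number of iterations (chosen arbitrarily) or when a perfect score is achieved, which in this case we know is an accuracy of 1.0 (100 percent). The function hill_climb_testset() below implements this, taking the test set as input and returning the best set of predictions found during the hill climb together with the list of scores recorded at each iteration.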

# run a hill climb for a set of predictions  
def hill_climb_testset(X_test, y_test, max_iterations):  
 scores = list()  
 # generate the initial solution  
 solution = random_predictions(X_test.shape[0])  
 # evaluate the initial solution  
 score = evaluate_predictions(y_test, solution) 
 scores.append(score)  
 # hill climb to a solution  
 for i in range(max_iterations):  
  # record scores  
  scores.append(score)  
  # stop once we achieve the best score  
  if score == 1.0:  
   break  
  # generate new candidate  
  candidate = modify_predictions(solution)  
  # evaluate candidate  
  value = evaluate_predictions(y_test, candidate)  
  # check if it is as good or better  
  if value >= score:  
   solution, score = candidate, value  
   print('>%d, score=%.3f' % (i, score))  
 return solution, scores

That's all there is to it. The complete example of hill climbing the test set is listed below.

# example of hill climbing the test set for a classification task  
from random import randint  
from sklearn.datasets import make_classification  
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score  
from matplotlib import pyplot   
# load or prepare the classification dataset  
def load_dataset():  
 return make_classification(n_samples=5000, n_features=20, n_informative=15, n_redundant=5, random_state=1)   
# evaluate a set of predictions  
def evaluate_predictions(y_test, yhat):  
 return accuracy_score(y_test, yhat)   
# create a random set of predictions  
def random_predictions(n_examples):  
 return [randint(0, 1) for _ in range(n_examples)] 
# modify the current set of predictions  
def modify_predictions(current, n_changes=1):  
 # copy current solution  
 updated = current.copy()  
 for i in range(n_changes):  
  # select a point to change  
  ix = randint(0, len(updated)-1)  
  # flip the class label  
  updated[ix] = 1 - updated[ix]  
 return updated   
# run a hill climb for a set of predictions  
def hill_climb_testset(X_test, y_test, max_iterations):  
 scores = list()  
 # generate the initial solution  
 solution = random_predictions(X_test.shape[0])  
 # evaluate the initial solution  
 score = evaluate_predictions(y_test, solution)  
 scores.append(score)  
 # hill climb to a solution  
 for i in range(max_iterations):  
  # record scores  
  scores.append(score)  
  # stop once we achieve the best score  
  if score == 1.0:  
   break  
  # generate new candidate  
  candidate = modify_predictions(solution)  
  # evaluate candidate  
  value = evaluate_predictions(y_test, candidate)  
  # check if it is as good or better  
  if value >= score:  
   solution, score = candidate, value  
   print('>%d, score=%.3f' % (i, score))  
 return solution, scores  
# load the dataset  
X, y = load_dataset()  
print(X.shape, y.shape)  
# split dataset into train and test sets  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)  
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)  
# run hill climb  
yhat, scores = hill_climb_testset(X_test, y_test, 20000)  
# plot the scores vs iterations 
pyplot.plot(scores)  
pyplot.show()

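Running the example will run the search for 20,000 iterations, or stop early if a perfect accuracy is achieved.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we found a perfect set of predictions for the test set in about 12,900 iterations. Recall that this was achieved without touching the training dataset and without cheating by looking at the test set target values. Instead, we simply optimized a set of numbers.

The lesson here is that repeatedly evaluating a modeling pipeline against a test set does the same thing, with you acting as the hill climbing optimization algorithm. The solution will be overfit to the test set.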

...  
>8092, score=0.996  
>8886, score=0.997  
>9202, score=0.998  
>9322, score=0.998  
>9521, score=0.999  
>11046, score=0.999  
>12932, score=1.000

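A plot of the progress of the optimization is also created. This can help in understanding how changes to the optimization algorithm, such as the choice of what to change and how it is changed during the hill climb, affect the convergence of the search.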
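The iteration count seen in this run can be sanity-checked with a short back-of-the-envelope calculation. With single flips, a flip of a correct label lowers the accuracy and is rejected, while a flip of a wrong label is accepted, so when k labels are still wrong the chance of improving in one iteration is k / n. The sketch below sums the expected waiting times. It assumes a random start gets about half of the 1,650 labels wrong, and the expected_iterations() helper is a hypothetical name, not part of the tutorial code.

```python
def expected_iterations(n_examples, n_wrong):
    # with k wrong labels left, a random flip helps with probability k / n,
    # so the expected wait for the next improvement is n / k iterations
    return sum(n_examples / k for k in range(1, n_wrong + 1))

# assumption: a random start gets about half of the labels wrong
n_examples = 1650
estimate = expected_iterations(n_examples, n_examples // 2)
print('expected iterations: %.0f' % estimate)
```

The estimate comes out at roughly 12,000 iterations, the same order as the run reported above. It also shows why the last few corrections take the longest: with one wrong label left, the search waits about n iterations on average to pick it.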

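Hill Climb the Diabetes Classification Dataset

We will use the diabetes dataset as the basis for exploring hill climbing the test set for a classification problem. Each record describes the medical details of a woman, and the prediction is the onset of diabetes within the next five years.

Dataset Details: pima-indians-diabetes.names
Dataset: pima-indians-diabetes.csv

The dataset has eight input variables and 768 rows of data. The input variables are all numeric and the target has two class labels, i.e. it is a binary classification task. A sample of the first five rows of the dataset is provided below.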

6,148,72,35,0,33.6,0.627,50,1  
1,85,66,29,0,26.6,0.351,31,0  
8,183,64,0,0,23.3,0.672,32,1  
1,89,66,23,94,28.1,0.167,21,0  
0,137,40,35,168,43.1,2.288,33,1  
...

We can load the dataset directly using Pandas as follows.

from pandas import read_csv
# load or prepare the classification dataset
def load_dataset():
 url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'
 df = read_csv(url, header=None)
 data = df.values
 # all columns but the last are inputs, the last column is the class label
 return data[:, :-1], data[:, -1]

The rest of the code remains unchanged. It was written so that you can drop in your own binary classification task and try it out. The complete example is listed below.

# example of hill climbing the test set for the diabetes dataset  
from random import randint  
from pandas import read_csv  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import accuracy_score  
from matplotlib import pyplot   
# load or prepare the classification dataset  
def load_dataset():  
 url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv'  
 df = read_csv(url, header=None)  
 data = df.values  
 return data[:, :-1], data[:, -1]  
# evaluate a set of predictions  
def evaluate_predictions(y_test, yhat):  
 return accuracy_score(y_test, yhat)   
# create a random set of predictions  
def random_predictions(n_examples):  
 return [randint(0, 1) for _ in range(n_examples)]  
# modify the current set of predictions  
def modify_predictions(current, n_changes=1):  
 # copy current solution  
 updated = current.copy()  
 for i in range(n_changes):  
  # select a point to change  
  ix = randint(0, len(updated)-1)  
  # flip the class label  
  updated[ix] = 1 - updated[ix]  
 return updated   
# run a hill climb for a set of predictions  
def hill_climb_testset(X_test, y_test, max_iterations):  
 scores = list()  
 # generate the initial solution  
 solution = random_predictions(X_test.shape[0])  
 # evaluate the initial solution  
 score = evaluate_predictions(y_test, solution)  
 scores.append(score)  
 # hill climb to a solution  
 for i in range(max_iterations):  
  # record scores  
  scores.append(score)  
  # stop once we achieve the best score  
  if score == 1.0: 
   break  
  # generate new candidate  
  candidate = modify_predictions(solution)  
  # evaluate candidate  
  value = evaluate_predictions(y_test, candidate)  
  # check if it is as good or better  
  if value >= score:  
   solution, score = candidate, value  
   print('>%d, score=%.3f' % (i, score))  
 return solution, scores  
# load the dataset  
X, y = load_dataset()  
print(X.shape, y.shape)  
# split dataset into train and test sets  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)  
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)  
# run hill climb  
yhat, scores = hill_climb_testset(X_test, y_test, 5000)  
# plot the scores vs iterations  
pyplot.plot(scores)  
pyplot.show()

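Running the example reports the iteration number and accuracy each time an improvement is seen during the search.

We use fewer iterations in this case because there are fewer predictions to make, so it is a simpler problem to optimize.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that perfect accuracy was achieved in fewer than 1,600 iterations.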

...  
>617, score=0.961  
>627, score=0.965  
>650, score=0.969  
>683, score=0.972  
>743, score=0.976  
>803, score=0.980  
>817, score=0.984  
>945, score=0.988  
>1350, score=0.992  
>1387, score=0.996  
>1565, score=1.000

A line plot of the search progress is also created, showing that convergence was rapid.

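Hill Climb the Housing Regression Dataset

We will use the housing dataset as the basis for exploring hill climbing the test set for a regression problem. The housing dataset involves predicting a house price in thousands of dollars given details of the house and its neighborhood.

Dataset Details: housing.names
Dataset: housing.csv

It is a regression problem, meaning we are predicting a numerical value. There are 506 observations with 13 input variables and one output variable. A sample of the first five rows is listed below.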

0.00632,18.00,2.310,0,0.5380,6.5750,65.20,4.0900,1,296.0,15.30,396.90,4.98,24.00  
0.02731,0.00,7.070,0,0.4690,6.4210,78.90,4.9671,2,242.0,17.80,396.90,9.14,21.60 
0.02729,0.00,7.070,0,0.4690,7.1850,61.10,4.9671,2,242.0,17.80,392.83,4.03,34.70  
0.03237,0.00,2.180,0,0.4580,6.9980,45.80,6.0622,3,222.0,18.70,394.63,2.94,33.40  
0.06905,0.00,2.180,0,0.4580,7.1470,54.20,6.0622,3,222.0,18.70,396.90,5.33,36.20 
...

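First, we can update the load_dataset() function to load the housing dataset. As part of loading the dataset, we will normalize the target value. This makes hill climbing the predictions simpler, because we can limit the floating-point values to the range 0 to 1. This is not required in general; it is just the approach taken here to simplify the search algorithm.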

from pandas import read_csv
from sklearn.preprocessing import MinMaxScaler
# load or prepare the regression dataset
def load_dataset():
 url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
 df = read_csv(url, header=None)
 data = df.values
 X, y = data[:, :-1], data[:, -1]
 # normalize the target
 scaler = MinMaxScaler()
 # the scaler expects a 2D array, so reshape the target into one column
 y = y.reshape((len(y), 1))
 y = scaler.fit_transform(y)
 return X, y
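To show what the normalization does to the target, here is a plain Python sketch of min-max scaling and its inverse. It is an illustration rather than the scikit-learn implementation: the minmax_* helper names are hypothetical, and the five prices are simply the target values from the sample rows shown earlier.

```python
def minmax_fit(values):
    # remember the range of the raw target values
    return min(values), max(values)

def minmax_transform(values, lo, hi):
    # map raw values into [0, 1]
    return [(v - lo) / (hi - lo) for v in values]

def minmax_inverse(scaled, lo, hi):
    # map normalized values back to the original units
    return [s * (hi - lo) + lo for s in scaled]

# target values from the five sample rows (thousands of dollars)
prices = [24.0, 21.6, 34.7, 33.4, 36.2]
lo, hi = minmax_fit(prices)
scaled = minmax_transform(prices, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
print(scaled)
```

Because the transform is linear, an error measured on the normalized target can be converted back to thousands of dollars by multiplying it by (hi - lo).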

Next, we can update the scoring function to use the mean absolute error between the expected and predicted values.

# evaluate a set of predictions  
def evaluate_predictions(y_test, yhat):  
 return mean_absolute_error(y_test, yhat)

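We must also update the representation of the solution from 0 and 1 labels to floating-point values between 0 and 1. The generation of the initial candidate solution must be changed to create a list of random floats.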

# create a random set of predictions  
def random_predictions(n_examples):  
 return [random() for _ in range(n_examples)]

The single change made to a solution to create a new candidate solution in this case involves simply replacing a randomly chosen prediction in the list with a new random float. I chose this because it was simple.

# modify the current set of predictions  
def modify_predictions(current, n_changes=1):  
 # copy current solution  
 updated = current.copy()  
 for i in range(n_changes):  
  # select a point to change  
  ix = randint(0, len(updated)-1)  
  # replace the prediction with a new random value
  updated[ix] = random()  
 return updated

A better approach would be to add Gaussian noise to an existing value, and I leave this to you as an extension. If you try it, let me know in the comments below. For example:

# add gaussian noise  
updated[ix] += gauss(0, 0.1)
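One possible way to turn that one-line hint into a full modification function is sketched below. The function name modify_predictions_gauss(), the sigma parameter, and the clipping to the range 0 to 1 are all assumptions added for illustration. Clipping is one reasonable choice because the target was normalized, but it is not part of the original code.

```python
from random import gauss, randint, seed

def modify_predictions_gauss(current, n_changes=1, sigma=0.1):
    # copy current solution
    updated = current.copy()
    for _ in range(n_changes):
        # select a point to change
        ix = randint(0, len(updated) - 1)
        # perturb the existing value with zero-mean Gaussian noise
        value = updated[ix] + gauss(0, sigma)
        # assumption: keep predictions inside the normalized range [0, 1]
        updated[ix] = min(1.0, max(0.0, value))
    return updated

seed(1)
current = [0.2, 0.5, 0.9]
candidate = modify_predictions_gauss(current)
print(candidate)
```

A small sigma makes local refinements, so a value that is already close to its target tends to stay close, whereas a full random replacement discards that progress whenever it is drawn.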

Finally, the search must be updated. The best value is now an error of 0.0, which is used to stop the search if it is found.

# stop once we achieve the best score  
if score == 0.0:  
 break

We also need to change the search from maximizing the score to minimizing it.

# check if it is as good or better  
if value <= score:  
 solution, score = candidate, value  
 print('>%d, score=%.3f' % (i, score))

The updated search function with these two changes is listed below.

# run a hill climb for a set of predictions  
def hill_climb_testset(X_test, y_test, max_iterations):  
 scores = list()  
 # generate the initial solution  
 solution = random_predictions(X_test.shape[0])  
 # evaluate the initial solution  
 score = evaluate_predictions(y_test, solution)  
 print('>%.3f' % score)  
 # hill climb to a solution  
 for i in range(max_iterations):  
  # record scores  
  scores.append(score)  
  # stop once we achieve the best score  
  if score == 0.0:  
   break  
  # generate new candidate  
  candidate = modify_predictions(solution)  
  # evaluate candidate  
  value = evaluate_predictions(y_test, candidate)  
  # check if it is as good or better  
  if value <= score:  
   solution, score = candidate, value  
   print('>%d, score=%.3f' % (i, score))  
 return solution, scores

Tying this together, the complete example of hill climbing the test set for a regression task is listed below.

# example of hill climbing the test set for the housing dataset  
from random import random  
from random import randint  
from pandas import read_csv  
from sklearn.model_selection import train_test_split  
from sklearn.metrics import mean_absolute_error  
from sklearn.preprocessing import MinMaxScaler  
from matplotlib import pyplot   
# load or prepare the regression dataset
def load_dataset():
 url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv'
 df = read_csv(url, header=None)
 data = df.values
 X, y = data[:, :-1], data[:, -1]
 # normalize the target
 scaler = MinMaxScaler()
 # the scaler expects a 2D array, so reshape the target into one column
 y = y.reshape((len(y), 1))
 y = scaler.fit_transform(y)
 return X, y
# evaluate a set of predictions
def evaluate_predictions(y_test, yhat):
 return mean_absolute_error(y_test, yhat)
# create a random set of predictions
def random_predictions(n_examples):
 return [random() for _ in range(n_examples)]
# modify the current set of predictions
def modify_predictions(current, n_changes=1):
 # copy current solution
 updated = current.copy()
 for i in range(n_changes):
  # select a point to change
  ix = randint(0, len(updated)-1)
  # replace the prediction with a new random value
  updated[ix] = random()
 return updated
# run a hill climb for a set of predictions  
def hill_climb_testset(X_test, y_test, max_iterations):  
 scores = list()  
 # generate the initial solution  
 solution = random_predictions(X_test.shape[0])  
 # evaluate the initial solution  
 score = evaluate_predictions(y_test, solution)  
 print('>%.3f' % score)  
 # hill climb to a solution  
 for i in range(max_iterations):  
  # record scores  
  scores.append(score)  
  # stop once we achieve the best score  
  if score == 0.0:  
   break  
  # generate new candidate  
  candidate = modify_predictions(solution)  
  # evaluate candidate  
  value = evaluate_predictions(y_test, candidate)  
  # check if it is as good or better  
  if value <= score:  
   solution, score = candidate, value  
   print('>%d, score=%.3f' % (i, score))  
 return solution, scores  
# load the dataset  
X, y = load_dataset()  
print(X.shape, y.shape)  
# split dataset into train and test sets  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)  
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)  
# run hill climb  
yhat, scores = hill_climb_testset(X_test, y_test, 100000)  
# plot the scores vs iterations  
pyplot.plot(scores)  
pyplot.show()

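Running the example reports the iteration number and MAE each time an improvement is seen during the search.

We use many more iterations in this case because it is a more complex problem to optimize. The method chosen for creating candidate solutions also makes the search slower and less likely to achieve a perfect error. In fact, we would not achieve a perfect error; instead, it would be better to stop once the error falls below a minimum value (e.g. 1e-7) or a value that is meaningful in the target domain. This, too, is left as an exercise for the reader. For example: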

# stop once we achieve a good enough score
if score <= 1e-7:  
 break
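As a rough sketch of how such a stopping rule could sit inside the search, the example below hill climbs a short list of made-up normalized targets and stops as soon as the MAE drops to a tolerance. The function hill_climb_until(), the plain Python mae() helper, the targets, and the tolerance of 0.01 are all assumptions chosen for illustration, not values from the tutorial.

```python
from random import Random

def mae(y_true, y_pred):
    # mean absolute error between expected and predicted values
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def hill_climb_until(y_true, max_iterations, tolerance, rng):
    # random floats in [0, 1) as the initial solution
    solution = [rng.random() for _ in y_true]
    score = mae(y_true, solution)
    n_iterations = 0
    # stop on the iteration budget or once the error is good enough
    while n_iterations < max_iterations and score > tolerance:
        # replace one randomly chosen prediction with a new random float
        candidate = solution.copy()
        ix = rng.randrange(len(candidate))
        candidate[ix] = rng.random()
        value = mae(y_true, candidate)
        # lower error is better
        if value <= score:
            solution, score = candidate, value
        n_iterations += 1
    return solution, score, n_iterations

rng = Random(1)
# hypothetical normalized targets, for illustration only
y_true = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
solution, score, n_iterations = hill_climb_until(y_true, 50000, 0.01, rng)
print('stopped after %d iterations, MAE=%.4f' % (n_iterations, score))
```

Comparing against a tolerance avoids relying on an exact floating-point match with 0.0, which random replacement is very unlikely to produce.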

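Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that a good error, an MAE of about 0.001 on the normalized target, was achieved by the end of the run.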

>95991, score=0.001  
>96011, score=0.001  
>96295, score=0.001  
>96366, score=0.001  
>96585, score=0.001  
>97575, score=0.001  
>98828, score=0.001  
>98947, score=0.001  
>99712, score=0.001  
>99913, score=0.001

A line plot of the search progress is also created, showing that convergence was rapid and that the score stays nearly flat for most of the iterations.

Editor in charge: 龐桂玉 Source: Python中文社區 (ID: python-china)
