
Stable Output, Faster Development: 7 Setups to Include at the Start of a Data Science Project

Author: 讀芯術 2020-08-12 14:56:48


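When starting a data science project, we usually need some setup or configuration to make sure the required dependencies are in place, keep the output stable, and prepare common functions. This article introduces some of the most helpful project setups in Jupyter Notebook.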

This article is reposted from the WeChat public account "讀芯術" (ID: AI_Discovery).


Figure: an example of a project setup (from Handson-ML2)


1. Ensure the Python version

Check the version of the Python interpreter in Jupyter Notebook:

import sys
sys.version
# Output: '3.7.6 (default, Jan 8 2020, 13:42:34) \n[Clang 4.0.1 (tags/RELEASE_401/final)]'

To make sure the project is run by a Python interpreter at or above the minimum required version, add the following to the project setup:

# Python ≥3.7 is required 
import sys 
assert sys.version_info >= (3, 7)

Python must be version 3.7 or above; otherwise an AssertionError is raised.
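
One caveat with the assertion above is that assert statements are skipped when Python runs with the -O flag. A minimal sketch of a stricter check follows; the helper name require_python and its current parameter are hypothetical and are not part of the original article.

```python
import sys

def require_python(minimum=(3, 7), current=None):
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `current` is a hypothetical override so the check can be tried
    against a version other than the running interpreter.
    """
    if current is None:
        current = tuple(sys.version_info[:3])
    if tuple(current) < tuple(minimum):
        wanted = ".".join(str(part) for part in minimum)
        found = ".".join(str(part) for part in current)
        raise RuntimeError(f"Python >= {wanted} is required, found {found}")
    return tuple(current)

# Check the running interpreter and report its version
running = require_python((3, 7))
print("Running on Python", ".".join(str(part) for part in running))
```

Raising an explicit exception also lets the message name both the required and the detected version, which a bare assert does not.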

2. Ensure package versions

Check the version of an installed package, such as TensorFlow:

import tensorflow as tf
tf.__version__
# Output: '2.0.0'

To make sure the project runs on TensorFlow 2.0 or above, add the following assertion, which raises an AssertionError otherwise:

# TensorFlow ≥2.0 is required 
import tensorflow as tf 
assert tf.__version__ >= "2.0"
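
Note that the check above compares strings, and strings are compared character by character, so a hypothetical future version "10.0" would fail it. A minimal sketch of a numeric comparison using only the standard library follows; the helper name version_tuple is an assumption for illustration and not part of TensorFlow or the original article.

```python
import re

def version_tuple(version_string):
    """Return the leading numeric components of a version string as ints.

    Parsing stops at the first component that does not start with a digit,
    and suffixes such as "rc1" are ignored.
    """
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)

# String comparison is lexicographic, tuple comparison is numeric
print("10.0" >= "2.0")                                # False
print(version_tuple("10.0") >= version_tuple("2.0"))  # True
print(version_tuple("2.0.0") >= (2, 0))               # True
```

With this helper, the check could be written as assert version_tuple(tf.__version__) >= (2, 0).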

3. Avoid blurry plots

Default plots in Jupyter Notebook look somewhat blurry. Take, for example, a simple heatmap for finding missing values.

(https://towardsdatascience.com/using-pandas-pipe-function-to-improve-code-readability-96d66abfaf8)

import seaborn as sns 
import matplotlib.pyplot as plt 
%matplotlib inline
# Default figure format png
sns.heatmap(df.isnull(), 
            yticklabels=False, 
            cbar=False, 
            cmap='viridis')


Figure: the default image looks blurry

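As the figure above shows, the text is blurry, the missing values in the Cabin column are too crowded, and the missing values in the Embarked column cannot be made out.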

To fix this, use %config InlineBackend.figure_format = 'retina' or %config InlineBackend.figure_format = 'svg' after %matplotlib inline, that is:

%matplotlib inline 
%config InlineBackend.figure_format = 'retina'         # or 'svg'
sns.heatmap(df.isnull(),
            yticklabels=False, 
            cbar=False, 
            cmap='viridis')


Figure: figure format set to retina or svg

Compared with the previous figure, this one is much clearer, and the missing values in the Embarked column can now be identified.

4. Keep output stable across runs

Random numbers are used in many places in a data science project. For example:

train_test_split() from Scikit-Learn
np.random.rand() for initializing weights

If the random seed is not reset, every call produces different numbers:

>>> np.random.rand(4) 
array([0.83209492, 0.10917076, 0.15798519, 0.99356723]) 
>>> np.random.rand(4) 
array([0.46183001, 0.7523687 , 0.96599624, 0.32349079])

np.random.seed(0) makes the random numbers predictable:

>>> np.random.seed(0) 
>>> np.random.rand(4) 
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318]) 
>>> np.random.seed(0) 
>>> np.random.rand(4) 
array([0.5488135 , 0.71518937, 0.60276338, 0.54488318])

If the random seed is reset (every time), the same set of numbers appears every time. The project can therefore keep its output stable across runs.
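
The same idea can be wrapped in a small helper that is called once at the top of a notebook. This is only a sketch: the name reset_seeds and the default seed of 42 are assumptions, and it seeds Python's built-in random module in addition to NumPy (NumPy is seeded only if it is installed).

```python
import random

def reset_seeds(seed=42):
    """Seed Python's built-in generator and, if it is installed, NumPy's."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy is not installed, so only the built-in generator was seeded
    np.random.seed(seed)

# Drawing after the same seed yields the same sequence
reset_seeds(0)
first = [random.random() for _ in range(3)]
reset_seeds(0)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Functions that accept their own seed, such as Scikit-Learn's train_test_split() with its random_state parameter, can be given a fixed value as well.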

5. Multiple outputs per cell

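By default, a Jupyter Notebook cell displays only the result of its last expression, so several results in the same cell are not all shown. To display them all, reconfigure the shell through IPython.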

from IPython.core.interactiveshell import InteractiveShell 
InteractiveShell.ast_node_interactivity = "all"


6. Save figures to files

Matplotlib can save figures with the savefig() method, but it raises an error if the given path does not exist.

plt.savefig('./figures/my_plot.png')
FileNotFoundError: [Errno 2] No such file or directory: './figures/my_plot.png'

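The best practice is to keep all figures in one place, such as a figures folder in the workspace. You can create the figures folder manually through the OS GUI or by running a command in Jupyter Notebook, but it is better to write a small function that does it.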

This approach is especially useful when you need custom figure settings or additional sub-folders to group figures. Here is a function that saves figures to files:

import os
%matplotlib inline
import matplotlib.pyplot as plt

# Where to save the figures
PROJECT_ROOT_DIR = "."
SUB_FOLDER = "sub_folder"    # a sub-folder
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", SUB_FOLDER)

def save_fig(name, images_path=IMAGES_PATH, tight_layout=True, extension="png", resolution=300):
    if not os.path.isdir(images_path):
        os.makedirs(images_path)
    path = os.path.join(images_path, name + "." + extension)
    print("Saving figure:", name)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=extension, dpi=resolution)

Now calling save_fig('figure_name') creates an images/sub_folder directory in the workspace and saves the figure there as "figure_name.png". The function also exposes the three most commonly used settings:

tight_layout automatically adjusts subplot padding
extension saves the figure in one of several formats
resolution sets the figure resolution
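
The directory handling in save_fig can also be sketched on its own, without Matplotlib. The version below uses pathlib instead of os.path, which is a substitution and not the article's code; the name figure_path and the temporary workspace are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def figure_path(name, images_dir="images/sub_folder", extension="png"):
    """Create the target folder if it is missing and return the full file path."""
    directory = Path(images_dir)
    # parents=True builds intermediate folders; exist_ok=True tolerates repeat calls
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{name}.{extension}"

# Try it in a throwaway workspace so nothing is left behind
with tempfile.TemporaryDirectory() as workspace:
    target = figure_path("my_plot", images_dir=Path(workspace) / "images" / "sub_folder")
    created = target.parent.is_dir()

print(target.name, created)  # my_plot.png True
```

The returned path could then be passed to plt.savefig(), which avoids the FileNotFoundError shown earlier because the folder already exists.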


7. Download data (and extract it)

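Working with data from the web is routine for data scientists. You can download the data with a browser and run commands to extract the files, but it is better to create a small function that does this. This matters especially when the data changes regularly.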

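Write a small script and run it whenever you need the latest data (you can also set up a scheduled job that runs it automatically at regular intervals). Automating the data-fetching process is also useful if you need to install the dataset on multiple machines.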

Here is a function that downloads and extracts data:

import os
import tarfile
import zipfile
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded

# Where to save the data
PROJECT_ROOT_DIR = "."
SUB_FOLDER = "group_name"
LOCAL_PATH = os.path.join(PROJECT_ROOT_DIR, "datasets", SUB_FOLDER)

def download(file_url, local_path=LOCAL_PATH):
    if not os.path.isdir(local_path):
        os.makedirs(local_path)

    # Download file
    print(">>> downloading")
    filename = os.path.basename(file_url)
    file_local_path = os.path.join(local_path, filename)
    urllib.request.urlretrieve(file_url, file_local_path)

    # untar/unzip file
    if filename.endswith("tgz") or filename.endswith("tar.gz"):
        print(">>> unpacking file:", filename)
        tar = tarfile.open(file_local_path, "r:gz")
        tar.extractall(path=local_path)
        tar.close()
    elif filename.endswith("tar"):
        print(">>> unpacking file:", filename)
        tar = tarfile.open(file_local_path, "r:")
        tar.extractall(path=local_path)
        tar.close()
    elif filename.endswith("zip"):
        print(">>> unpacking file:", filename)
        zip_file = zipfile.ZipFile(file_local_path)
        zip_file.extractall(path=local_path)
        zip_file.close()
    print("Done")

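Now calling download("http://a_valid_url/housing.tgz") creates a datasets/group_name directory in the workspace, downloads housing.tgz, and extracts housing.csv from it into that directory. This small function also works for CSV and text files, since a file without a recognized archive extension is simply downloaded and left as it is.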
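
The extraction step can be tried offline. The sketch below is a variation on the function above, not the article's code: it detects archives by content with tarfile.is_tarfile() and zipfile.is_zipfile() instead of by file extension, and it builds a tiny housing.tgz locally in place of a real download. The name extract_archive and the sample CSV contents are assumptions for illustration.

```python
import os
import tarfile
import tempfile
import zipfile

def extract_archive(archive_path, target_dir):
    """Unpack a tar (optionally compressed) or zip file; return True if unpacked."""
    if tarfile.is_tarfile(archive_path):
        # "r:*" lets tarfile detect the compression on its own
        with tarfile.open(archive_path, "r:*") as tar:
            tar.extractall(path=target_dir)
        return True
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zip_file:
            zip_file.extractall(path=target_dir)
        return True
    return False  # plain files such as CSV or text are left as they are

with tempfile.TemporaryDirectory() as workspace:
    # Build a small housing.tgz that stands in for a downloaded file
    csv_path = os.path.join(workspace, "housing.csv")
    with open(csv_path, "w") as handle:
        handle.write("longitude,latitude\n-122.23,37.88\n")
    archive_path = os.path.join(workspace, "housing.tgz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(csv_path, arcname="housing.csv")

    target_dir = os.path.join(workspace, "datasets", "group_name")
    os.makedirs(target_dir)
    unpacked = extract_archive(archive_path, target_dir)
    extracted = os.path.isfile(os.path.join(target_dir, "housing.csv"))
    skipped = extract_archive(csv_path, target_dir)

print(unpacked, extracted, skipped)  # True True False
```

Using with blocks closes the archive even if extraction fails partway, which the explicit close() calls do not guarantee. Archives from untrusted sources should be inspected before extraction, since member paths can point outside the target folder.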


See the source code in the author's GitHub repository:

https://github.com/BindiChen/machine-learning/blob/master/data-analysis/004-7-setups-for-a-data-science-project/7-setups.ipynb

Editor: 趙寧寧 Source: 今日頭條
