12 Efficient NumPy and Pandas Tricks That Make Data Analysis Easier

Author: Anonymous, 2021-07-07 09:50:23


This article shares 12 NumPy and Pandas functions that make data analysis easier and more convenient. A Jupyter Notebook with the code used in this article is available in the GitHub project linked below.

Project repository: https://github.com/kunaldhariwal/12-Amazing-Pandas-NumPy-Functions

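6 Efficient NumPy Functions

Let's start with NumPy. NumPy is a Python extension package for scientific computing. It provides a powerful N-dimensional array object, sophisticated functions, tools for integrating C/C++ and Fortran code, and useful linear algebra, Fourier transform, and random number capabilities.

Beyond these obvious uses, NumPy can also serve as an efficient multi-dimensional container for generic data, with arbitrary data types defined. This lets NumPy integrate quickly and seamlessly with a wide variety of databases.

The 6 NumPy functions are covered one by one below.

1. argpartition()

With argpartition(), NumPy can find the indices of the N largest values and return them. The values at those indices can then be sorted as needed.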

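import numpy as np

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
# Indices of the 4 largest values; their order within the slice is not guaranteed
index_val = np.argpartition(x, -4)[-4:]
index_val
array([1, 8, 2, 0], dtype=int64)
np.sort(x[index_val])
array([10, 12, 12, 16])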
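As a minimal sketch of the "sort as needed" step, the snippet below orders the selected indices so that the largest value comes first. The names `k`, `top_idx_sorted`, and `top_values` are illustrative and not part of the original article.

```python
import numpy as np

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
k = 4  # illustrative choice: how many of the largest values to keep

# argpartition only guarantees that the last k positions hold the
# indices of the k largest values, in no particular order
top_idx = np.argpartition(x, -k)[-k:]

# Order those k indices by their values, largest first
top_idx_sorted = top_idx[np.argsort(x[top_idx])[::-1]]
top_values = x[top_idx_sorted]
print(top_values)
```

Only the k selected elements are sorted here, which is the usual reason to prefer argpartition() over a full argsort() on a large array.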

2. allclose()

allclose() compares two arrays element by element and returns a single Boolean. If the arrays are not equal within a tolerance, allclose() returns False. The function is very useful for checking whether two arrays are approximately the same.

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])
# The third argument of allclose() is rtol, the relative tolerance.
# With a relative tolerance of 0.1, it returns False:
np.allclose(array1, array2, rtol=0.1)
False
# With a relative tolerance of 0.2, it returns True:
np.allclose(array1, array2, rtol=0.2)
True
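The tolerance in allclose() combines a relative part (`rtol`) and an absolute part (`atol`): two elements count as close when `|a - b| <= atol + rtol * |b|`. The sketch below, using the same two arrays, compares a relative check, an absolute-only check, and a manual evaluation of that formula. The variable names are illustrative.

```python
import numpy as np

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])

# Relative check: the allowed difference scales with |array2|
rel_close = np.allclose(array1, array2, rtol=0.1)

# Absolute-only check: every difference must be at most 0.05
abs_close = np.allclose(array1, array2, rtol=0.0, atol=0.05)

# The same relative test written out by hand (1e-8 is the default atol)
manual = bool(np.all(np.abs(array1 - array2) <= 1e-8 + 0.1 * np.abs(array2)))

print(rel_close, abs_close, manual)
```

Passing `rtol` and `atol` by keyword makes it clear which kind of tolerance is meant.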

3. clip()

clip() keeps the values of an array within an interval. Sometimes values need to stay between a lower and an upper limit, and NumPy's clip() function does exactly that. Given an interval, values outside it are clipped to the interval edges.

x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])
np.clip(x, 2, 5)
array([3, 5, 5, 5, 2, 2, 5, 5, 2, 2, 5, 2])
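To make the clipping rule concrete, the following sketch shows that clip() with both bounds matches an element-wise maximum followed by a minimum, and that passing `None` for one bound clips on one side only. The names `manual` and `lower_only` are illustrative.

```python
import numpy as np

x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])

clipped = np.clip(x, 2, 5)

# Same result written as "raise to the lower bound, then cap at the upper bound"
manual = np.minimum(np.maximum(x, 2), 5)

# One-sided clipping: enforce only the lower bound
lower_only = np.clip(x, 2, None)

print(clipped)
print(lower_only)
```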

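4. extract()

As the name suggests, extract() pulls specific elements out of an array based on a condition. Conditions can also be combined with "and" and "or" logic, written with the element-wise operators & and |.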

# Random integers
array = np.random.randint(20, size=12)
array
array([ 0,  1,  8, 19, 16, 18, 10, 11,  2, 13, 14,  3])
# Divide by 2 and check if the remainder is 1
cond = np.mod(array, 2) == 1
cond
array([False,  True, False,  True, False, False, False,  True, False,  True, False,  True])
# Use extract to get the values
np.extract(cond, array)
array([ 1, 19, 11, 13,  3])
# Apply the condition on extract directly
np.extract(((array < 3) | (array > 15)), array)
array([ 0,  1, 19, 16, 18,  2])
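The sketch below reuses the sample values printed above as a fixed array, so the result is reproducible. It shows that extract() gives the same result as Boolean-mask indexing on a 1-D array, and combines two conditions with `&`. The variable names are illustrative.

```python
import numpy as np

# Fixed copy of the sample values shown above, so the output is reproducible
array = np.array([0, 1, 8, 19, 16, 18, 10, 11, 2, 13, 14, 3])

# "or" condition: values below 3 or above 15
cond = (array < 3) | (array > 15)
via_extract = np.extract(cond, array)
via_mask = array[cond]  # equivalent Boolean-mask indexing

# "and" condition: odd values that are also greater than 10
odd_and_large = np.extract((array % 2 == 1) & (array > 10), array)

print(via_extract)
print(odd_and_large)
```

Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `<` and `>` in Python.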

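5. where()

where() selects the elements of an array that satisfy a condition. Called with only a condition, it returns the index positions of the matching values. Called with two extra arguments, it picks a value from the first where the condition holds and from the second where it does not. It is similar to the WHERE clause in SQL, as the following example shows: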

y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
# Where y is greater than 5, returns the index positions
np.where(y > 5)
(array([2, 3, 5, 7, 8], dtype=int64),)
# The first value replaces the elements that match the condition,
# the second replaces the elements that do not
np.where(y > 5, "Hit", "Miss")
array(['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Hit', 'Miss', 'Hit', 'Hit'], dtype='<U4')
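A small sketch of both call forms follows. The one-argument form returns a tuple with one index array per dimension, so a 1-D result can be unpacked directly. The three-argument form can also mix a scalar with the original array, for example to cap values. The names `idx` and `capped` are illustrative.

```python
import numpy as np

y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])

# One-argument form: a tuple of index arrays, here with a single entry
(idx,) = np.where(y > 5)

# Three-argument form: replace values above 5 with 5, keep the rest
capped = np.where(y > 5, 5, y)

print(idx)
print(capped)
```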

6. percentile()

percentile() computes the n-th percentile of the array elements along a given axis.

a = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
print("50th Percentile of a, axis = 0 : ", np.percentile(a, 50, axis=0))
50th Percentile of a, axis = 0 :  6.0
b = np.array([[10, 7, 4], [3, 2, 1]])
print("30th Percentile of b, axis = 0 : ", np.percentile(b, 30, axis=0))
30th Percentile of b, axis = 0 :  [5.1 3.5 1.9]
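The fractional results such as 5.1 come from linear interpolation between neighboring sorted values, which is NumPy's default percentile method. The helper `linear_percentile` below is a hypothetical illustration of that rule, not a NumPy function. It is checked against np.percentile() for each column of `b`.

```python
import numpy as np

def linear_percentile(values, q):
    """Illustrative re-implementation of the default (linear) percentile rule."""
    s = np.sort(np.asarray(values, dtype=float))
    pos = q / 100 * (len(s) - 1)      # fractional position in the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

b = np.array([[10, 7, 4], [3, 2, 1]])

# Column [10, 3]: sorted is [3, 10], position 0.3, so 3 + 0.3 * 7
by_hand = [linear_percentile(b[:, j], 30) for j in range(b.shape[1])]
by_numpy = np.percentile(b, 30, axis=0)

print(by_hand)
print(by_numpy)
```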

Those are 6 efficient functions from the NumPy package. Next, let's look at 6 functions from the Pandas data analysis library.

6 Efficient Pandas Functions

Pandas is also a Python package. It provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multi-dimensional, heterogeneous) and time series data both simple and intuitive.

Pandas is well suited to the following kinds of data:

Tabular data with heterogeneously typed columns, such as an SQL table or an Excel spreadsheet;
Ordered and unordered (not necessarily fixed-frequency) time series data;
Arbitrary matrix data (homogeneously or heterogeneously typed) with row and column labels;
Any other form of statistical data set. In fact, the data does not need to be labeled at all to be placed into a Pandas data structure.

1. read_csv(nrows=n)

A common mistake is to read an entire .csv file when only part of it is needed. If an unfamiliar .csv file is 10 GB, reading all of it is unwise: it takes up a lot of memory and a lot of time. It is enough to import just a few rows from the file first, and then import more as needed.

import io
import pandas as pd
import requests

# An online data set is used here to keep the example easy to follow
url = "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/AirPassengers.csv"
s = requests.get(url).content
# Read only the first 10 rows
df = pd.read_csv(io.StringIO(s.decode("utf-8")), nrows=10, index_col=0)
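For the "import more as needed" step, one option is the `chunksize` parameter of read_csv(), which yields the file piece by piece instead of loading it all at once. The sketch below is offline and uses a small in-memory CSV. The `csv_text` contents are made-up sample data, not the AirPassengers file.

```python
import io
import pandas as pd

# Made-up sample data: a header plus 30 rows, built in memory
csv_text = "time,value\n" + "\n".join(
    f"{1949 + i / 12:.6f},{100 + i}" for i in range(30)
)

# Peek at the first 10 rows only
head = pd.read_csv(io.StringIO(csv_text), nrows=10)

# Then walk through the whole file 12 rows at a time
chunk_sizes = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=12)
]

print(head.shape)
print(chunk_sizes)
```

With a real file, the `io.StringIO(...)` argument would simply be the file path.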

2. map()

map() maps the values of a Series according to a given input. It replaces each value in a Series with another value, which may come from a function, a dict, or another Series.

# Create a dataframe
dframe = pd.DataFrame(np.random.randn(4, 3), columns=list("bde"), index=["India", "USA", "China", "Russia"])
# Compute a formatted string from each floating point value in the frame
changefn = lambda x: "%.2f" % x
# Make changes element-wise
dframe["d"].map(changefn)
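The example above maps with a function. The sketch below shows the dict form as well, using a small illustrative Series and lookup table that are not in the original article. Values with no matching key in the dict come back as missing (NaN).

```python
import pandas as pd

countries = pd.Series(["India", "USA", "China", "Russia"])

# Illustrative lookup table; "Russia" is left out on purpose
capitals = {"India": "New Delhi", "USA": "Washington, D.C.", "China": "Beijing"}

# Dict form: each value is looked up as a key, missing keys become NaN
mapped = countries.map(capitals)

# Function form: the function is called once per value
name_lengths = countries.map(len)

print(mapped)
print(name_lengths)
```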

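3. apply()

apply() lets the user pass a function and apply it to the data in a Pandas object. Called on a DataFrame, as in the example below, it passes each column to the function as a Series.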

# Max minus min lambda fn
fn = lambda x: x.max() - x.min()
# Apply this on the dframe that we've just created above
dframe.apply(fn)
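Because the frame above holds random numbers, here is a small deterministic sketch of the same max-minus-min function. It also shows the `axis=1` option, which applies the function to each row instead of each column. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up values so that the results can be checked by hand
dframe = pd.DataFrame(
    {"b": [1.0, 4.0, 2.0], "d": [10.0, 30.0, 20.0], "e": [5.0, 5.0, 9.0]},
    index=["India", "USA", "China"],
)

def spread(x):
    """Range of a Series: largest value minus smallest value."""
    return x.max() - x.min()

per_column = dframe.apply(spread)       # one result per column (default axis=0)
per_row = dframe.apply(spread, axis=1)  # one result per row

print(per_column)
print(per_row)
```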

4. isin()

isin() is used to filter DataFrames. It helps select the rows that have a particular value (or one of several values) in a given column.

# Using the dataframe we created for read_csv
filter1 = df["value"].isin([112])
filter2 = df["time"].isin([1949.000000])
df[filter1 & filter2]
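The sketch below runs without the download by using a small hand-made frame with the same two column names. The rows are illustrative sample data. It shows isin() with several values and its negation with `~`.

```python
import pandas as pd

# Hand-made sample rows with the same column names as above
df = pd.DataFrame(
    {
        "time": [1949.0, 1949.083333, 1949.166667, 1949.25],
        "value": [112, 118, 132, 129],
    }
)

wanted = [112, 132]

# Rows whose value is one of the wanted values
selected = df[df["value"].isin(wanted)]

# Rows whose value is NOT one of the wanted values
excluded = df[~df["value"].isin(wanted)]

print(selected)
print(excluded)
```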

5. copy()

copy() is used to copy a Pandas object. When one DataFrame is assigned to another name, both names refer to the same object, so a change made through either name also shows up in the other. The copy() function prevents this kind of problem.

# Creating a sample series
data = pd.Series(["India", "Pakistan", "China", "Mongolia"])
# The assignment issue that we face
data1 = data
# Change a value
data1[0] = "USA"
# Also changes the value in the old series
data
# To prevent that, create a copy of the series
new = data.copy()
# Assigning new values
new[1] = "Changed value"
# Printing data
print(new)
print(data)
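The difference can be checked directly with Python's `is` operator, as in this minimal sketch. The names `alias` and `independent` are illustrative.

```python
import pandas as pd

data = pd.Series(["India", "Pakistan", "China", "Mongolia"])

alias = data               # plain assignment: a second name for the same object
independent = data.copy()  # a separate object with its own data

alias.iloc[0] = "USA"                    # visible through `data` as well
independent.iloc[1] = "Changed value"    # does not touch `data`

print(alias is data)         # same object
print(independent is data)   # different object
print(data)
```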

6. select_dtypes()

select_dtypes() returns the subset of a DataFrame's columns that match the given dtypes. Its parameters can be set to include all columns of a specific data type, or to exclude columns of a specific data type.

# We'll use the same dataframe that we used for read_csv
framex = df.select_dtypes(include="float64")
# Returns only the time column
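The following offline sketch shows both `include` and `exclude` on a small hand-made frame. The `label` column and the sample values are assumptions added for illustration.

```python
import pandas as pd

# Hand-made frame with a float, an integer, and a text column
df = pd.DataFrame(
    {
        "time": [1949.0, 1949.083333],
        "value": [112, 118],
        "label": ["a", "b"],
    }
)

floats = df.select_dtypes(include="float64")      # only the float column
numeric = df.select_dtypes(include="number")      # float and integer columns
non_numeric = df.select_dtypes(exclude="number")  # everything else

print(list(floats.columns))
print(list(numeric.columns))
print(list(non_numeric.columns))
```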

Finally, pivot_table() is another very useful function in Pandas. Anyone familiar with pivot tables in Excel will find it easy to pick up.

# Create a sample dataframe
school = pd.DataFrame({"A": ["Jay", "Usher", "Nicky", "Romero", "Will"],
                       "B": ["Masters", "Graduate", "Graduate", "Masters", "Graduate"],
                       "C": [26, 22, 20, 23, 24]})
# Let's create a pivot table to segregate students based on age and course
table = pd.pivot_table(school, values="A", index=["B", "C"],
                       columns=["B"], aggfunc=np.sum, fill_value="Not Available")
table
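As a simpler numeric variant on the same sample data, the sketch below computes the average age per course. This is a different aggregation from the one in the article, swapped in because its result is easy to verify by hand.

```python
import pandas as pd

school = pd.DataFrame(
    {
        "A": ["Jay", "Usher", "Nicky", "Romero", "Will"],
        "B": ["Masters", "Graduate", "Graduate", "Masters", "Graduate"],
        "C": [26, 22, 20, 23, 24],
    }
)

# One row per course (column B), holding the mean of the ages (column C)
mean_age = pd.pivot_table(school, values="C", index="B", aggfunc="mean")

print(mean_age)
```

Masters averages 26 and 23, and Graduate averages 22, 20, and 24.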

Editor: 龐桂玉. Source: Linux公社

Numpy Pandas Python
