Ten Handy Pandas Tricks

Author: Shaun Enslin, 2022-10-19 15:20:58


pandas is an essential data-processing library for data scientists. This article collects 10 tricks that come up regularly in practical work.


1. Select from table where f1='a' and f2='b'

Use AND or OR to select a subset. For AND:

dfb = df.loc[(df.Week == week) & (df.Day == day)]

For OR:

dfb = df.loc[(df.Week == week)|(df.Day == day)]
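As a minimal sketch of both filters on made-up data (the sample rows and the Customer column are assumptions for illustration; only Week and Day come from the snippets above):

```python
import pandas as pd

# Hypothetical sample data; Week and Day mirror the column names used above.
df = pd.DataFrame({
    "Week": [1, 1, 2, 2],
    "Day": [0, 1, 0, 1],
    "Customer": ["a", "b", "c", "d"],
})
week, day = 1, 0

# AND: both conditions must hold. Each comparison needs its own parentheses
# because & and | bind more tightly than == in Python.
both = df.loc[(df.Week == week) & (df.Day == day)]

# OR: at least one condition must hold.
either = df.loc[(df.Week == week) | (df.Day == day)]
```

Note that Python's `and` / `or` keywords do not work here; element-wise filtering needs the `&` and `|` operators.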

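2. Select where in

Select the rows of one df whose values appear in another df, as in the following SQL:

select * from table1 where field1 in (select field1 from table2)

Suppose the values to match, here called "days", are 0, 1 and 2, and a second df holds the rows to filter. The matching rows can then be selected directly with isin:

days = [0,1,2]
# The column name Day is an assumption carried over from tip 1; the source snippet omits it.
df[df.Day.isin(days)]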
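The SQL subquery form can be sketched the same way, since isin also accepts a Series. The frames table1 and table2 and their contents below are hypothetical stand-ins for the tables in the SQL statement:

```python
import pandas as pd

# Hypothetical stand-ins for table1 and table2 from the SQL above.
table1 = pd.DataFrame({"field1": [0, 1, 2, 3, 4], "value": list("abcde")})
table2 = pd.DataFrame({"field1": [0, 1, 2]})

# isin accepts a list or a Series, so the subquery becomes a column lookup.
selected = table1[table1["field1"].isin(table2["field1"])]
```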

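3. Select where not in

As with IN, we also need NOT IN. It is arguably the more common requirement, yet few articles cover it. Using the same data as above:

days = [0,1,2]
# Day is an assumed column name, as in the previous tip.
df[~df.Day.isin(days)]

The ~ operator inverts the boolean mask, so only the rows that are not in the list remain.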
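A small sketch of the inversion, assuming a hypothetical frame with a Day column:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Day": [0, 1, 2, 3, 4], "Customer": list("abcde")})
days = [0, 1, 2]

# isin builds a boolean mask; ~ flips every True/False in it.
mask = df["Day"].isin(days)
not_in = df[~mask]
```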

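4. select sum(*) from table group by

Grouping and summing is another common operation, though the syntax takes some getting used to.

df.groupby(by=['RepID','Week','CallCycleDay']).sum()

To save the result, or to use it later and refer to the grouping fields, add as_index=False:

df.groupby(by=['RepID','Week','CallCycleDay'], as_index=False).sum()

With as_index=False, the grouping keys stay as ordinary columns instead of being moved into the index.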
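The difference between the two calls can be seen in a short sketch. The Visits column and the sample rows are assumptions added for illustration:

```python
import pandas as pd

# Hypothetical sample data; Visits is an invented numeric column to sum.
df = pd.DataFrame({
    "RepID": [1, 1, 1, 2],
    "Week": [1, 1, 2, 1],
    "CallCycleDay": [0, 0, 1, 0],
    "Visits": [3, 4, 5, 6],
})
keys = ["RepID", "Week", "CallCycleDay"]

# Default: the grouping keys become a MultiIndex on the result.
indexed = df.groupby(by=keys).sum()

# as_index=False: the grouping keys remain regular columns.
flat = df.groupby(by=keys, as_index=False).sum()
```

With the flat form, an expression such as `flat["RepID"]` works directly, whereas the indexed form would first need `reset_index()`.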

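5. Update fields in one table from another table

After changing some values in one df, we may want to write them back to another df. This is where update is useful.

dfb = dfa[dfa.field1 == 'somevalue'].copy()
dfb['field2'] = 'somevalue'
dfa.update(dfb)

The update is matched by index and modifies dfa in place.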
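A minimal sketch of the index matching, using invented values for field1 and field2:

```python
import pandas as pd

# Hypothetical sample data for illustration.
dfa = pd.DataFrame({
    "field1": ["somevalue", "other", "somevalue"],
    "field2": ["x", "y", "z"],
})

# Take a copy of the matching rows; they keep their original index (0 and 2).
dfb = dfa[dfa.field1 == "somevalue"].copy()
dfb["field2"] = "changed"

# update aligns on index and column labels and overwrites dfa in place.
dfa.update(dfb)
```

Only the rows with index 0 and 2 are overwritten; the row with index 1 has no counterpart in dfb and is left alone.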

6. Create a new field with apply/lambda

Here we create a new field named address by concatenating several fields.

dfa['address'] = dfa.apply(lambda row: row['StreetName'] + ', ' +
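The line above is cut off in the source. As a sketch of the same idea, assuming two hypothetical extra columns, Suburb and PostalCode (only StreetName appears in the original), a complete version might look like this:

```python
import pandas as pd

# Hypothetical sample data; Suburb and PostalCode are assumed column names.
dfa = pd.DataFrame({
    "StreetName": ["Main Rd", "High St"],
    "Suburb": ["Newlands", "Claremont"],
    "PostalCode": [7700, 7708],
})

# axis=1 passes each row to the lambda; non-string fields need str() before joining.
dfa["address"] = dfa.apply(
    lambda row: row["StreetName"] + ", " + row["Suburb"] + ", " + str(row["PostalCode"]),
    axis=1,
)
```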

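7. Insert a new row

A good way to insert new data is concat. pd.DataFrame.from_records can be used to turn the new row into a df first.

# row and x come from the surrounding code (not shown)
newRow = row.copy()
newRow.CustomerID = str(newRow.CustomerID)+'-'+str(x)
newRow.duplicate = True
df = pd.concat([df,pd.DataFrame.from_records([newRow])])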
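A self-contained sketch of the same pattern follows. The sample frame is invented, and the row is handled as a plain dict rather than a Series, which is a simplification of the snippet above rather than the author's exact approach:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"CustomerID": ["C1", "C2"], "duplicate": [False, False]})

# Copy the first row as a dict and modify it.
new_row = df.iloc[0].to_dict()
new_row["CustomerID"] = str(new_row["CustomerID"]) + "-" + str(1)
new_row["duplicate"] = True

# from_records turns the list of records into a one-row frame;
# ignore_index=True renumbers the combined index from 0.
df = pd.concat([df, pd.DataFrame.from_records([new_row])], ignore_index=True)
```

Without `ignore_index=True` the appended row keeps index 0, which leaves duplicate index labels in the result.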

8. Change the type of a column

The astype function quickly changes the data type of a column.

df = pd.read_excel('customers_.xlsx')
df['Longitude'] = df['Longitude'].astype(str)
df['Latitude'] = df['Latitude'].astype(str)
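The conversion can be tried without an Excel file. The coordinates below are made-up values for illustration:

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
df = pd.DataFrame({"Longitude": [18.42, 18.47], "Latitude": [-33.92, -33.98]})

# astype returns a new Series, so the result is assigned back to the column.
df["Longitude"] = df["Longitude"].astype(str)
df["Latitude"] = df["Latitude"].astype(str)
```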

9. Delete columns

Columns can be removed with drop.

def cleanColumns(df):
  for col in df.columns:
  
  
  return df
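The loop body is missing from the function above, so the condition for dropping a column is unknown. As a sketch under one assumption, namely that the goal is to discard the "Unnamed: N" columns that spreadsheet imports often produce, a version using drop could look like this (the helper name clean_columns and the sample data are hypothetical):

```python
import pandas as pd

def clean_columns(df):
    # Assumption: unwanted columns are those whose name starts with "Unnamed".
    unwanted = [col for col in df.columns if str(col).startswith("Unnamed")]
    # drop returns a new frame; the input frame is left unchanged.
    return df.drop(columns=unwanted)

# Hypothetical sample data for illustration.
raw = pd.DataFrame({"Unnamed: 0": [0, 1], "CustomerID": ["C1", "C2"]})
cleaned = clean_columns(raw)
```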

10. Plot points on a map

This may be the least useful trick, but it is fun.

Here we have some latitude and longitude data.

Now we mark the points on a map according to their coordinates:

import folium
import numpy as np
import pandas as pd

# dfm (the customer points) and centroidFile are defined elsewhere (not shown)
df_clustercentroids = pd.read_csv(centroidFile)
lst_elements = sorted(list(dfm.cluster2.unique()))
# One random hex color per cluster
lst_colors = ['#%06X' % np.random.randint(0, 0xFFFFFF) for i in range(len(lst_elements))]
dfm["color"] = dfm["cluster2"]
dfm["color"] = dfm["color"].apply(lambda x:lst_colors[lst_elements.index(x)])

m = folium.Map(location=[dfm.iloc[0].Latitude,dfm.iloc[0].Longitude], zoom_start = 9)

# One circle per customer, colored by cluster
for index, row in dfm.iterrows():
  folium.CircleMarker(location=[float(row['Latitude']), float(row['Longitude'])],radius=4,popup=str(row['RepID']) + '|' +str(row.CustomerID),color=row['color'],fill=True,fill_color=row['color']).add_to(m)

# One marker per cluster centroid, labeled with the customer count in that cluster
for index, row in df_clustercentroids.iterrows():
  folium.Marker(location=[float(row['Latitude']), float(row['Longitude'])],popup=str(index) + '|#=' + str(dfm.loc[dfm.cluster2==index].groupby(['cluster2'])['CustomerID'].count().iloc[0]),icon=folium.Icon(color='black',icon_color=lst_colors[index]),tooltip=str(index) + '|#=' + str(dfm.loc[dfm.cluster2==index].groupby(['cluster2'])['CustomerID'].count().iloc[0])).add_to(m)

m

In a notebook, the final line m renders the resulting map.
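The color assignment step can be tried on its own, without folium. The sketch below is an alternative formulation rather than the author's code: it uses a dict with Series.map instead of a list lookup, and the standard library's random module with a fixed seed so the colors are repeatable. The sample cluster labels are invented.

```python
import random

import pandas as pd

# Hypothetical cluster labels for illustration.
dfm = pd.DataFrame({"cluster2": [2, 0, 1, 0]})

clusters = sorted(dfm["cluster2"].unique())
rng = random.Random(0)  # fixed seed, an assumption made for repeatability

# One random hex color per cluster, e.g. '#A1B2C3'.
color_by_cluster = {c: "#%06X" % rng.randint(0, 0xFFFFFF) for c in clusters}

# map looks up each row's cluster label in the dict.
dfm["color"] = dfm["cluster2"].map(color_by_cluster)
```

Using a dict avoids the repeated `list.index` search per row, which matters little at this size but scales better with many clusters.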

Editor: Hua Xuan. Source: DeepHub IMBA
