Five Common Data Output Formats: A Hands-On Guide with Pandas

Author: Li Qinghui 2022-04-24 10:33:56

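Once data in any source format has been loaded into a DataFrame, it can be written out to a file or target system in the corresponding format with a method such as DataFrame.to_csv(). This article covers several commonly used output formats.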

01 CSV

The DataFrame.to_csv method exports a DataFrame to a CSV file; pass it the name of the CSV file to write.

df.to_csv('done.csv')
df.to_csv('data/done.csv') # a directory path can be included
df.to_csv('done.csv', index=False) # omit the index

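In addition, the sep parameter sets the delimiter, columns takes a sequence naming the columns to write, and encoding sets the file encoding. If no header row is needed, set header to False. For large files, compression can be used to compress the output: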

# Create a compressed file out.zip that contains out.csv
compression_opts = dict(method='zip',
                        archive_name='out.csv')  
df.to_csv('out.zip', index=False,
          compression=compression_opts)
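
As a rough illustration of the sep, columns, header, and compression options described above, the sketch below builds a small DataFrame (the sample data and variable names are assumptions made for this example, not part of the original text), renders it to CSV text, and then round-trips it through a zip archive in a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [1, 4], "y": [2, 5], "z": [3, 6]}, index=["a", "b"])

# With no path given, to_csv returns the CSV text instead of writing a file.
# Here: semicolon delimiter, only columns x and z, no header row, no index.
text = df.to_csv(sep=";", columns=["x", "z"], header=False, index=False)
rows = text.splitlines()
print(rows)

# Write a zip archive containing out.csv, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.zip")
    df.to_csv(path, index=False,
              compression=dict(method="zip", archive_name="out.csv"))
    # read_csv infers zip compression from the file extension
    restored = pd.read_csv(path)

print(restored)
```

Returning the text rather than writing a file is convenient for quickly checking what a given combination of options produces before committing it to disk.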

02 Excel

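Exporting a DataFrame to Excel is just as convenient: use the DataFrame.to_excel method. To export a DataFrame you first specify a file name, which must end in .xlsx or .xls (whether .xls can be written depends on the installed pandas version and engine). The name of the generated worksheet can be set with sheet_name.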

To export several DataFrames into a single Excel file, use an ExcelWriter object.

# Export; a file path can be given
df.to_excel('path_to_file.xlsx')
# Set the sheet name and omit the index
df.to_excel('path_to_file.xlsx', sheet_name='Sheet1', index=False)
# Set the index label and do not merge cells
df.to_excel('path_to_file.xlsx', index_label='label', merge_cells=False)

Several DataFrames can be exported as follows:

# Write several DataFrames to separate sheets of one Excel file
with pd.ExcelWriter('path_to_file.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1')
    df2.to_excel(writer, sheet_name='Sheet2')

A specific Excel export engine can be selected as follows:

# Specify the engine directly
df.to_excel('path_to_file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
# Or set the engine on the ExcelWriter via the 'engine' parameter;
# leaving the with block saves and closes the file
with pd.ExcelWriter('path_to_file.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer)

# Set the default engine for .xlsx files globally
from pandas import options  # noqa: E402
options.io.excel.xlsx.writer = 'xlsxwriter'
df.to_excel('path_to_file.xlsx', sheet_name='Sheet1')

03 HTML

DataFrame.to_html wraps the data in a DataFrame in an HTML table tag and returns the result as a string. This HTML can be placed in a web page for display or used as the body of an email.

print(df.to_html())
print(df.to_html(columns=[0])) # output only the listed columns
print(df.to_html(bold_rows=False)) # do not bold the row labels
# Set CSS classes on the table; several can be given
print(df.to_html(classes=['class1', 'class2']))
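
To make the "string of HTML" idea concrete, here is a minimal sketch that embeds the table returned by to_html in a small page. The sample DataFrame, the page template, and the variable names are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [1, 4], "y": [2, 5]}, index=["a", "b"])

# to_html returns a string; keep only column x and attach two CSS classes
html = df.to_html(columns=["x"], classes=["class1", "class2"])

# The string can be dropped into any larger HTML document or email body
page = f"<html><body><h1>Report</h1>{html}</body></html>"
print(page)
```

The extra classes are added alongside the default dataframe class, so a stylesheet can target the table without post-processing the markup.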

04 Database (SQL)

To save the data in a DataFrame to the corresponding table in a database:

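# Requires the SQLAlchemy library
from sqlalchemy import create_engine
# Create the database engine, using SQLite in in-memory mode
engine = create_engine('sqlite:///:memory:')
# Read the table named data (assumes the table already exists)
with engine.connect() as conn, conn.begin():
    data = pd.read_sql_table('data', conn)

# data
# Write the data; by default this raises an error if the table
# already exists (pass if_exists='replace' or 'append' to change that)
data.to_sql('data', engine)
# Write a large amount of data in chunks
data.to_sql('data_chunked', engine, chunksize=1000)
# Query with SQL
pd.read_sql_query('SELECT * FROM data', engine)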
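
For a self-contained variant that runs without a pre-existing table, the sketch below uses a connection from the standard-library sqlite3 module instead of a SQLAlchemy engine, which pandas also accepts for SQLite. This is a swapped-in setup for illustration only; the sample data and the table and column names are assumptions.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

# In-memory SQLite database via the standard library
conn = sqlite3.connect(":memory:")
try:
    # First write creates the table
    df.to_sql("data", conn, index=False)
    # Second write appends to it, two rows per batch
    df.to_sql("data", conn, index=False, if_exists="append", chunksize=2)
    # Read back with a SQL query
    result = pd.read_sql_query(
        "SELECT name, score FROM data WHERE score >= 2", conn
    )
finally:
    conn.close()

print(result)
```

The if_exists argument decides what happens when the target table is already present: the default raises an error, while 'append' and 'replace' add to or overwrite the table.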

05 Markdown

Markdown is a widely used language for writing technical documentation. Pandas can output a Markdown-formatted string, as follows:

print(cdf.to_markdown())

'''
|    |   x |   y |   z |
|:---|----:|----:|----:|
| a  |   1 |   2 |   3 |
| b  |   4 |   5 |   6 |
| c  |   7 |   8 |   9 |
'''

Summary

This article showed how to output the data in a DataFrame. Once exported and persisted, the data becomes a fixed data asset that can be archived and analyzed.

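About the author: Li Qinghui is a data product expert who leads the data product team at an e-commerce company. He specializes in raising a company's level of data use through data governance, data analysis, and data-driven operations. He is proficient in Python data science and Python web development, independently built his company's automated data analysis platform, and took part in reviewing the Ministry of Education's "1+X" Data Analysis (Python) vocational skill level standard. He is a member of the Chinese Association for Artificial Intelligence and a lecturer on enterprise digitalization, data products, and data analysis. The technical and product tutorials on his personal website "蓋若" are widely read.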

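This article is excerpted from 《深入淺出Pandas：利用Python進行數據處理與分析》 (Pandas Made Simple: Data Processing and Analysis with Python), published in 2021 by Huazhang, China Machine Press. Please contact us for permission before reprinting.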

Editor: Pang Guiyu. Source: 大數據DT

