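Top 5 Python Data Visualization Techniques
Mastering these five advanced chart types makes it easier to visualize data. The libraries used here complement one another, and together they cover a wide range of ways to present data.
Chord Diagram
A chord diagram shows complex relationships between data points in a compact way. Nodes are arranged around a circle and connected by arcs. The length of an arc reflects the connection value, and its thickness indicates the importance of the relationship. Colors categorize the data, which makes comparison easy. Chord diagrams are used in many fields, especially for visualizing genetic data.
The following example uses HoloViews and Bokeh to create a chord diagram of the trade relationships among five countries.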
import holoviews as hv
from holoviews import opts
import pandas as pd
import numpy as np

hv.extension('bokeh')

# Sample matrix of export volumes between 5 countries (row = source, column = target)
export_data = np.array([[0, 50, 30, 20, 10],
                        [10, 0, 40, 30, 20],
                        [20, 10, 0, 35, 25],
                        [30, 20, 10, 0, 40],
                        [25, 15, 30, 20, 0]])
labels = ['USA', 'China', 'Germany', 'Japan', 'India']

# Reshape the matrix into a long-format DataFrame of (source, target, value) links
df = pd.DataFrame(export_data, index=labels, columns=labels)
df = df.stack().reset_index()
df.columns = ['source', 'target', 'value']

# Keep only links with a value of at least 5, which drops the zero-valued diagonal
# (the original applied this filter with .select() but discarded the result)
df = df[df['value'] >= 5]

# Create the Chord object; the nodes (dimension 'index') are derived from the links
chord = hv.Chord(df)

# Style the chord diagram; labels must name a node dimension, hence 'index'
chord.opts(
    opts.Chord(
        cmap='Category20', edge_cmap='Category20',
        labels='index', label_text_font_size='10pt',
        edge_color='source', node_color='index',
        width=700, height=700
    )
)

# Display the plot (in a notebook, the last expression is rendered)
chord
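To make the link and node structure behind the diagram more concrete, here is a minimal sketch in plain Python that flattens the same sample matrix into (source, target, value) links and sums the traffic touching each node. The helper names `matrix_to_links` and `node_totals` are hypothetical and are not part of HoloViews; the sketch only illustrates the reshaping idea, not how the library lays out the circle.

```python
labels = ["USA", "China", "Germany", "Japan", "India"]
matrix = [
    [0, 50, 30, 20, 10],
    [10, 0, 40, 30, 20],
    [20, 10, 0, 35, 25],
    [30, 20, 10, 0, 40],
    [25, 15, 30, 20, 0],
]


def matrix_to_links(matrix, labels, min_value=1):
    """Flatten a square matrix into (source, target, value) tuples.

    Entries below min_value (here: the zero diagonal) are skipped.
    """
    links = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value >= min_value:
                links.append((labels[i], labels[j], value))
    return links


def node_totals(links):
    """Sum outgoing plus incoming values for every node."""
    totals = {}
    for source, target, value in links:
        totals[source] = totals.get(source, 0) + value
        totals[target] = totals.get(target, 0) + value
    return totals


links = matrix_to_links(matrix, labels)
totals = node_totals(links)
```

A node with a larger total takes part in more trade overall, which is the kind of quantity a chord layout typically uses when sizing each node's share of the circle.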
Reference links:
- https://holoviews.org/reference/elements/matplotlib/Chord.html
- https://github.com/moshi4/pyCirclize
Sunburst Chart
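A sunburst chart goes beyond traditional pie and ring charts by showing hierarchical data clearly. It uses concentric rings, each representing one level of the hierarchy. The root sits at the center, and sectors represent nodes. The size of each sector reflects its value, which makes the relative importance of the data easy to grasp. It is very useful for visualizing file system hierarchies, user navigation paths, market segmentation, and genetic data. The following example uses the Plotly library to create a sunburst chart.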
import plotly.express as px
import numpy as np
df = px.data.gapminder().query("year == 2007")
fig = px.sunburst(df, path=['continent', 'country'],
                  values='pop',
                  color='lifeExp',
                  hover_data=['iso_alpha'],
                  color_continuous_scale='RdBu',
                  color_continuous_midpoint=np.average(df['lifeExp'], weights=df['pop']))
fig.show()
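The statement that each sector's size reflects its value can be illustrated with a small sketch. It assumes a two-level hierarchy in which a parent's value is the sum of its children and each sector's angle is its share of 360 degrees. The rows below are made-up numbers, not gapminder data, and `sector_angles` is a hypothetical helper rather than a Plotly function.

```python
from collections import defaultdict

# Hypothetical (continent, country, value) rows, invented for illustration
rows = [
    ("Europe", "France", 60),
    ("Europe", "Spain", 40),
    ("Asia", "Japan", 120),
    ("Asia", "Vietnam", 80),
]


def sector_angles(rows):
    """Return parent totals and the angle in degrees of every sector.

    Inner-ring sectors are keyed by (continent,), outer-ring sectors
    by (continent, country).
    """
    totals = defaultdict(int)
    for continent, _country, value in rows:
        totals[continent] += value
    grand_total = sum(totals.values())

    angles = {}
    for continent, total in totals.items():
        angles[(continent,)] = total * 360 / grand_total
    for continent, country, value in rows:
        angles[(continent, country)] = value * 360 / grand_total
    return dict(totals), angles


totals, angles = sector_angles(rows)
```

Because each parent total is the sum of its children, the outer-ring sectors under a parent span exactly the same angle as the parent sector in the ring inside them.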
Reference link: https://plotly.com/python/sunburst-charts/
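Hexbin Plot
A hexbin plot, or hexagonal binning, is very effective for visualizing the distribution of two-dimensional data, especially when the data points are dense. It divides the data space into hexagonal bins and uses color to show the number of points in each bin, which gives a clear picture of the distribution.
The following example uses Python and Matplotlib to create a hexbin plot of the relationship between the Air Quality Index (AQI) and hospital visits, based on simulated data.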
import numpy as np
import matplotlib.pyplot as plt

# Simulated data
np.random.seed(0)  # Ensure reproducibility
n_points = 10000
x = np.random.rand(n_points) * 100  # Air Quality Index (AQI), from 0 to 100
# Simulated hospital visits: a sinusoidal dependence on AQI plus Gaussian noise
y = 5 * np.sin(x * np.pi / 50) + np.random.randn(n_points) * 15

# Create a new figure
fig, ax = plt.subplots(figsize=(10, 8))

# Draw the hexagonal bin plot with Matplotlib's built-in Axes.hexbin:
# gridsize sets the number of hexagons in the x direction, extent sets the
# binned range, and mincnt=1 hides bins that contain no points
hb = ax.hexbin(x, y, gridsize=20, cmap='viridis',
               extent=[0, 100, -30, 50], mincnt=1)

# Add title and axis labels
ax.set_title('Relationship between Air Quality Index (AQI) and Hospital Visits')
ax.set_xlabel('Air Quality Index (AQI)')
ax.set_ylabel('Hospital Visits')

# Add a color bar for the bin counts and show the figure
fig.colorbar(hb, ax=ax, label='Number of Data Points')
plt.show()
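For readers curious how points end up in hexagonal bins, the following is a minimal sketch in plain Python. It assumes a regular hexagonal grid described by two offset rectangular lattices of cell centers and assigns each point to the nearest center. The function names and the `size` parameter are hypothetical, and this is an illustration of the idea rather than Matplotlib's actual implementation.

```python
import math
from collections import Counter


def hex_center(x, y, size=1.0):
    """Return the center of the hexagonal cell that contains (x, y).

    Centers sit on two rectangular lattices: lattice A at (i*w, j*h)
    and lattice B shifted by half a step in both directions.
    """
    w = size
    h = size * math.sqrt(3)

    # Nearest center on lattice A
    ax, ay = round(x / w) * w, round(y / h) * h
    # Nearest center on lattice B
    bx = (math.floor(x / w) + 0.5) * w
    by = (math.floor(y / h) + 0.5) * h

    dist_a = (x - ax) ** 2 + (y - ay) ** 2
    dist_b = (x - bx) ** 2 + (y - by) ** 2
    return (ax, ay) if dist_a <= dist_b else (bx, by)


def hexbin_counts(points, size=1.0):
    """Count how many points fall into each hexagonal cell."""
    return Counter(hex_center(x, y, size) for x, y in points)


points = [(0.0, 0.0), (0.1, 0.1), (0.5, 0.8), (0.45, 0.9)]
counts = hexbin_counts(points)
```

The resulting counts per cell are what a hexbin plot maps to color.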
Reference link: https://matplotlib.org/stable/gallery/statistics/hexbin_demo.html
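Sankey Diagram
Sankey diagrams visualize flows and are well suited to energy, material, and financial data. Named after Matthew Henry Phineas Riall Sankey, they show the flow between the stages or parts of a system. The width of each link is proportional to the flow quantity, which makes the scale and direction of the data easy to understand.
The following example uses Python and Plotly to create a Sankey diagram of the energy flow from production sources to consumers in a small city.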
import plotly.graph_objects as go

# Node labels; source and target below refer to positions in this list
labels = ["Coal", "Solar", "Wind", "Nuclear", "Residential", "Industrial", "Commercial"]
# Each link goes from source[i] to target[i] and carries value[i]
# Note: "Commercial" (index 6) is listed as a node, but no link in this sample reaches it
source = [0, 1, 2, 3, 0, 1, 2, 3]
target = [4, 4, 4, 4, 5, 5, 5, 5]
value = [25, 10, 40, 20, 30, 15, 25, 35]

# Create the Sankey diagram object
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels
    ),
    link=dict(
        source=source,
        target=target,
        value=value
    ))])

fig.update_layout(title_text="Energy Flow in Model City", font_size=12)
fig.show()
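A quick way to sanity-check the link lists before plotting is to total the flow leaving and entering each node. The sketch below does this in plain Python with the same sample lists; `node_flows` is a hypothetical helper, not part of Plotly.

```python
from collections import defaultdict

labels = ["Coal", "Solar", "Wind", "Nuclear", "Residential", "Industrial", "Commercial"]
source = [0, 1, 2, 3, 0, 1, 2, 3]
target = [4, 4, 4, 4, 5, 5, 5, 5]
value = [25, 10, 40, 20, 30, 15, 25, 35]


def node_flows(labels, source, target, value):
    """Return (outflow, inflow) dictionaries keyed by node label."""
    outflow = defaultdict(int)
    inflow = defaultdict(int)
    for s, t, v in zip(source, target, value):
        outflow[labels[s]] += v
        inflow[labels[t]] += v
    return dict(outflow), dict(inflow)


outflow, inflow = node_flows(labels, source, target, value)

# Nodes that appear in labels but take part in no link
unlinked = [name for name in labels if name not in outflow and name not in inflow]
```

With this sample data the total leaving the four sources equals the total arriving at the consumers, and the check also shows that the "Commercial" node has no links attached.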
Reference link: https://plotly.com/python/sankey-diagram/
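Stream Graph (ThemeRiver)
A stream graph resembles a river and depicts change over time. Colors distinguish the categories, and the width of each "stream" represents that category's value. It shows trends and relationships intuitively, which makes the dynamics of the data easy to follow.
The following example uses the Altair library to create a stream graph.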
import altair as alt
from vega_datasets import data

source = data.unemployment_across_industries.url

alt.Chart(source).mark_area().encode(
    alt.X('yearmonth(date):T',
          axis=alt.Axis(format='%Y', domain=False, tickSize=0)
    ),
    alt.Y('sum(count):Q', stack='center', axis=None),
    alt.Color('series:N',
              scale=alt.Scale(scheme='category20b')
    )
).interactive()
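The `stack='center'` setting is what turns a stacked area chart into a stream graph. As a rough illustration of the idea, the sketch below stacks the categories at each time step around a zero centerline, so the combined band runs from minus half the total to plus half the total. The data is made up and `center_stack` is a hypothetical helper; it is a simplified model of centered stacking, not Altair's or Vega-Lite's code.

```python
def center_stack(series):
    """Compute (lower, upper) bounds per category and time step.

    series maps a category name to a list of values, one per time step.
    At each step the stacked bands are centered on zero.
    """
    names = list(series)
    n_steps = len(series[names[0]])
    bands = {name: [] for name in names}
    for t in range(n_steps):
        total = sum(series[name][t] for name in names)
        lower = -total / 2  # start half the total below the centerline
        for name in names:
            upper = lower + series[name][t]
            bands[name].append((lower, upper))
            lower = upper
    return bands


# Hypothetical values for three categories over two time steps
series = {"a": [2, 1], "b": [4, 1], "c": [6, 2]}
bands = center_stack(series)
```

Each band's height still equals the category's value, so only the baseline moves; that is why the width of a stream can be read as its value.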
Reference link: https://altair-viz.github.io/gallery/streamgraph.html