SQL項(xiàng)目實(shí)戰(zhàn):銀行客戶分析
在本文中,將與大家分享一個(gè)SQL項(xiàng)目,即根據(jù)從數(shù)據(jù)集收集到的信息分析銀行客戶流失的可能性。這些洞察來自個(gè)人信息,如年齡、性別、收入和人口統(tǒng)計(jì)信息、銀行卡類型、產(chǎn)品、客戶信用評分以及客戶在銀行的服務(wù)時(shí)間長短等。對于銀行而言,了解如何留住客戶比尋找其他客戶更有利。
客戶流失是指客戶或顧客的流失。公司通常將其作為關(guān)鍵業(yè)務(wù)指標(biāo)之一,因?yàn)榛謴?fù)的長期客戶對公司的價(jià)值遠(yuǎn)遠(yuǎn)高于新招募的客戶。
客戶流失有兩種類型:自愿流失和非自愿流失。自愿流失是由于客戶決定轉(zhuǎn)向其他公司或服務(wù)提供商,而非自愿流失則是由于客戶搬遷到長期護(hù)理機(jī)構(gòu)、死亡或搬遷到較遠(yuǎn)的地方等情況造成的。
在本項(xiàng)目中,本文將集中討論自愿流失,因?yàn)樗赡苁怯捎诠九c客戶關(guān)系中公司可以控制的因素造成的,例如如何處理賬單互動(dòng)或如何提供售后幫助。
【來源】:
https://en.wikipedia.org/wiki/Customer_attrition
數(shù)據(jù)集解釋
【網(wǎng)址】:
https://www.kaggle.com/datasets/radheshyamkollipara/bank-customer-churn
圖片
本文使用的是customer_churn_records
表,該表包含多列,customerid
是表的主鍵。
- RowNumber:對應(yīng)記錄(行)編號
- CustomerId:客戶的ID編號
- Surname:客戶的姓氏
- CreditScore:客戶信用行為預(yù)測值
- Geography:客戶所在地
- Gender:客戶的性別信息
- Age:客戶的年齡信息
- Tenure:客戶在銀行的使用年限
- Balance:客戶賬戶中的余額信息
- NumOfProducts:客戶購買的產(chǎn)品數(shù)量
- HasCrCard:客戶是否擁有信用卡
- IsActiveMember:客戶是否處于活躍狀態(tài)
- EstimatedSalary(估計(jì)工資):客戶的估計(jì)工資金額
- Exited:客戶是否離開銀行
- Complain:客戶是否有投訴
- Satisfaction Score:客戶對銀行的滿意度評分
- Card Type:客戶持有的銀行卡類型
- Points Earned:客戶使用信用卡獲得的積分
# 顯示表中的列 = customer_churn_records
q='''
SELECT * FROM customer_churn_records
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
查詢客戶流失率
導(dǎo)入軟件包
import psycopg2 # PostgreSQL數(shù)據(jù)庫適配器
import pandas as pd # 用于分析數(shù)據(jù)
from sqlalchemy import create_engine # 促進(jìn)Python程序與數(shù)據(jù)庫之間的通信
首先,本文根據(jù)已退出的列計(jì)算有多少客戶流失。
# 統(tǒng)計(jì)是否流失/退出的客戶總數(shù)
q='''
WITH temp_churn AS(
SELECT exited,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(exited) as Total
FROM temp_churn
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
在10000名客戶中,有近20%從銀行退出或流失。盡管這個(gè)數(shù)字并不算很大,但如果現(xiàn)在還不能解決這個(gè)問題,它可能會(huì)增長得更多。
現(xiàn)在,本文將從活躍客戶、性別、人口統(tǒng)計(jì)、年齡、臨時(shí)工齡、信用分?jǐn)?shù)、產(chǎn)品數(shù)量、滿意度分?jǐn)?shù)、投訴、是否有信用卡、卡類型、已獲積分、預(yù)估薪資和余額等多個(gè)方面來檢查客戶流失狀況的類型。
# 統(tǒng)計(jì)有多少活躍客戶流失
q='''
WITH temp_isactivemember AS(
SELECT exited,
CASE
WHEN isactivemember = 1 THEN 'Active'
ELSE 'Not Active'
END AS isactivemember
from customer_churn_records
)
SELECT isactivemember,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM temp_isactivemember
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)性別計(jì)算是否流失/退出的客戶總數(shù)
q='''
SELECT gender,
COUNT(gender) as Total,
COUNT(case when exited = 1 then 1 end) as Churn,
COUNT(case when exited = 0 then 1 end) as Not_churn
FROM customer_churn_records
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)人口統(tǒng)計(jì)數(shù)據(jù)計(jì)算流失客戶的數(shù)量
q='''
SELECT geography,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM customer_churn_records
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
#根據(jù)年齡組計(jì)算流失客戶的數(shù)量
q='''
SELECT
CASE
WHEN age <= 20 THEN 'Group <= 20'
WHEN age >= 21 AND age <= 40 THEN 'Group 21-40'
WHEN age >= 41 AND age <= 60 THEN 'Group 41-60'
ELSE 'Group > 60'
END AS age_category,
COUNT(CASE WHEN exited = 1 then 1 end) as Churn,
COUNT(CASE WHEN exited = 0 then 1 end) as Not_Churn
FROM customer_churn_records
GROUP BY 1
ORDER BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)他們成為客戶的時(shí)間計(jì)算是否流失/退出的客戶總數(shù)
q='''
WITH temp_tenure AS(
SELECT tenure,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(tenure) as Average_tenure
FROM temp_tenure
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)信用評分計(jì)算有多少客戶流失
#(平均值、最高值和最低值)
q='''
WITH temp_creditscore AS(
SELECT creditscore,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(creditscore) as avg_creditscore,
MAX(creditscore) as max_creditscore,
MIN(creditscore) as min_creditscore
FROM temp_creditscore
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶使用銀行產(chǎn)品的數(shù)量,計(jì)算有多少客戶流失
q='''
WITH temp_bankprod AS(
SELECT numofproducts,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(numofproducts) as avg_numofproducts
FROM temp_bankprod
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶 對銀行的滿意度得分的平均得分,計(jì)算有多少客戶流失
q='''
WITH temp_satisfaction AS(
SELECT satisfaction_score,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(satisfaction_score) as satisfaction_level
FROM temp_satisfaction
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的投訴量,統(tǒng)計(jì)有多少客戶流失
q='''
WITH temp_complain AS(
SELECT complain,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(complain) as complain
FROM temp_complain
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶是否擁有信用卡計(jì)算有多少客戶流失
q='''
WITH temp_hascrcard AS(
SELECT hascrcard,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT (CASE WHEN hascrcard = 1 THEN 1 END) AS has_creditcard,
COUNT (CASE WHEN hascrcard = 0 THEN 1 END) AS no_creditcard
FROM temp_hascrcard
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df
圖片
# 根據(jù)卡的類型計(jì)算流失客戶的數(shù)量
q='''
WITH temp_card AS(
SELECT card_type,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
COUNT(CASE WHEN card_type = 'PLATINUM' THEN 1 END) as PLATINUM,
COUNT(CASE WHEN card_type = 'DIAMOND' THEN 1 END) as DIAMOND,
COUNT(CASE WHEN card_type = 'GOLD' THEN 1 END) as GOLD,
COUNT(CASE WHEN card_type = 'SILVER' THEN 1 END) as SILVER
FROM temp_card
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶獲得的積分計(jì)算有多少客戶流失
q='''
WITH temp_point AS(
SELECT point_earned,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(point_earned) as avg_point,
MAX(point_earned) as max_point,
MIN(point_earned) as min_point
FROM temp_point
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的預(yù)估工資計(jì)算有多少客戶流失
q='''
WITH temp_salary AS(
SELECT estimatedsalary,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(estimatedsalary) as avg_salary,
MAX(estimatedsalary) as max_salary,
MIN(estimatedsalary) as min_salary
FROM temp_salary
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
# 根據(jù)客戶的銀行存款余額計(jì)算有多少客戶流失
#(平均、最高、最低)
q='''
WITH temp_balance AS(
SELECT balance,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)
SELECT STATUS,
AVG(balance) as avg_balance,
MAX(balance) as max_balance,
MIN(balance) as min_balance
FROM temp_balance
GROUP BY 1
'''
df = pd.read_sql(q,engine_postgresql)
df.head()
圖片
結(jié)論
根據(jù)上述問題,有一些類別可以幫助確定哪些方面會(huì)真正影響客戶流失。不管客戶在銀行停留了多長時(shí)間,他們?nèi)匀挥锌赡芰魇?,或者說,客戶的銀行賬戶上有相當(dāng)數(shù)量的存款,他們?nèi)匀挥锌赡芰魇А?/p>
41至60歲年齡段的客戶比其他年齡段的客戶更容易流失。為了解決這個(gè)問題,銀行可以集中精力創(chuàng)造或提升產(chǎn)品和服務(wù),以幫助吸引和維護(hù)特定年齡段的客戶,比如為年齡較大的客戶提供更流暢的服務(wù)和最短的排隊(duì)時(shí)間。
持有信用卡的客戶往往不會(huì)流失,而是會(huì)繼續(xù)留在銀行。銀行最好通過各種促銷活動(dòng)說服更多的客戶申請信用卡,這取決于客戶細(xì)分,可根據(jù)客戶的卡種(鉆石卡、白金卡、金卡、銀卡)、性別、年齡、支出和人口分布進(jìn)行細(xì)分。
留存客戶和流失客戶的滿意度得分有點(diǎn)令人擔(dān)憂 [ 3.017960 / 2.997547 ]。銀行需要進(jìn)行評估,以保持流失客戶和留存客戶之間的滿意度得分差距,并保持活躍客戶,因?yàn)榛钴S客戶流失的可能性較低。






