SQL案例分析:移動(dòng)平均值與累計(jì)求和
許多常見(jiàn)的聚合函數(shù)也可以作為窗口函數(shù)使用,包括AVG()、SUM()、COUNT()、MAX()以及MIN()函數(shù)等。
關(guān)于聚合函數(shù)的語(yǔ)法可以參考這篇文章。
示例表sales_monthly中存儲(chǔ)了不同產(chǎn)品(蘋(píng)果、香蕉以及桔子)每個(gè)月份的銷量情況,以下是該表的創(chuàng)建腳本和數(shù)據(jù):
-- 創(chuàng)建銷量表sales_monthly
-- product表示產(chǎn)品名稱,ym表示年月,amount表示銷售金額(元)
CREATE TABLE sales_monthly(product VARCHAR(20), ym VARCHAR(10), amount NUMERIC(10, 2));
-- 生成測(cè)試數(shù)據(jù)
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201801',10159.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201802',10211.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201803',10247.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201804',10376.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201805',10400.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201806',10565.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201807',10613.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201808',10696.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201809',10751.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201810',10842.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201811',10900.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201812',10972.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201901',11155.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201902',11202.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201903',11260.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201904',11341.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201905',11459.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('蘋(píng)果','201906',11560.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201801',10138.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201802',10194.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201803',10328.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201804',10322.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201805',10481.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201806',10502.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201807',10589.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201808',10681.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201809',10798.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201810',10829.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201811',10913.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201812',11056.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201901',11161.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201902',11173.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201903',11288.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201904',11408.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201905',11469.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('香蕉','201906',11528.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201801',10154.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201802',10183.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201803',10245.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201804',10325.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201805',10465.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201806',10505.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201807',10578.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201808',10680.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201809',10788.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201810',10838.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201811',10942.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201812',10988.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201901',11099.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201902',11181.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201903',11302.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201904',11327.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201905',11423.00);
INSERT INTO sales_monthly (product,ym,amount) VALUES ('桔子','201906',11524.00);
移動(dòng)平均值
AVG()函數(shù)作為窗口函數(shù)使用時(shí),可以用于計(jì)算隨著當(dāng)前行移動(dòng)的窗口內(nèi)數(shù)據(jù)行的平均值。例如,以下語(yǔ)句用于查找不同產(chǎn)品截止到每個(gè)月份為止、最近3個(gè)月的平均銷量:
SELECT product AS "產(chǎn)品", ym "年月", amount "銷量",
AVG(amount) OVER (
PARTITION BY product
ORDER BY ym
ROWS BETWEEN 2PRECEDING AND CURRENT ROW
) AS "最近平均銷量"
FROM sales_monthly
ORDER BY product, ym;
AVG()函數(shù)OVER子句中的PARTITION BY選項(xiàng)表示按照產(chǎn)品進(jìn)行分區(qū),ORDERBY選項(xiàng)表示按照月份進(jìn)行排序,ROWS BETWEEN 2 PRECEDING AND CURRENT ROW表示窗口從當(dāng)前行的前2行開(kāi)始直到當(dāng)前行結(jié)束。
該查詢返回的結(jié)果如下:
產(chǎn)品|年月 |銷量 |最近平均銷量
---|------|--------|------------
桔子|201801|10154.00|10154.000000
桔子|201802|10183.00|10168.500000
桔子|201803|10245.00|10194.000000
桔子|201804|10325.00|10251.000000
桔子|201805|10465.00|10345.000000
桔子|201806|10505.00|10431.666667
...
對(duì)于“桔子”,第一個(gè)月份的分析窗口只有1行數(shù)據(jù),因此平均銷量為10154。第二個(gè)月份的分析窗口為第1行和第2行數(shù)據(jù),因此平均銷量為10168.5((10154+10183)/2)。第三個(gè)月份的分析窗口為第1行到第3行數(shù)據(jù),因此平均銷量為10194((10154+10183+10245)/3)。
依此類推,直到計(jì)算完“桔子”所有月份的平均銷量,然后開(kāi)始計(jì)算其他產(chǎn)品的平均銷量。
累計(jì)求和
SUM()函數(shù)作為窗口函數(shù)時(shí),可以用于統(tǒng)計(jì)指定窗口內(nèi)的累計(jì)值。例如,以下語(yǔ)句用于查找不同產(chǎn)品截止到當(dāng)前月份為止的累計(jì)銷量:
SELECT product AS "產(chǎn)品", ym "年月", amount "銷量",
SUM(amount) OVER (
PARTITION BY product
ORDER BY ym
ROWS BETWEENUNBOUNDED PRECEDING AND CURRENT ROW
) AS "累計(jì)銷量"
FROM sales_monthly
ORDER BY product, ym;
SUM()函數(shù)OVER子句中的PARTITION BY選項(xiàng)表示按照產(chǎn)品進(jìn)行分區(qū),ORDERBY選項(xiàng)表示按照月份進(jìn)行排序,ROWS BETWEENUNBOUNDED PRECEDING AND CURRENT ROW表示窗口從當(dāng)前分區(qū)第1行開(kāi)始直到當(dāng)前行結(jié)束。
該查詢返回的結(jié)果如下:
產(chǎn)品|年月 |銷量 |累計(jì)銷量
---|------|--------|---------
桔子|201801|10154.00| 10154.00
桔子|201802|10183.00| 20337.00
桔子|201803|10245.00| 30582.00
桔子|201804|10325.00| 40907.00
桔子|201805|10465.00| 51372.00
桔子|201806|10505.00| 61877.00
...
對(duì)于“桔子”,第一個(gè)月份的分析窗口只有1行數(shù)據(jù),因此累計(jì)銷量為10154。第二個(gè)月份的分析窗口為第1行和第2行數(shù)據(jù),因此累計(jì)銷量為20337(10154+10183)。第三個(gè)月份的分析窗口為第1行到第3行數(shù)據(jù),因此累計(jì)銷量為30582(10154+10183+10245)。
依此類推,直到計(jì)算完“桔子”所有月份的累計(jì)銷量,然后開(kāi)始計(jì)算其他產(chǎn)品的累計(jì)銷量。
提示:對(duì)于聚合窗口函數(shù),如果我們沒(méi)有指定ORDER BY選項(xiàng),默認(rèn)的窗口大小就是整個(gè)分區(qū)。如果我們指定了ORDERBY選項(xiàng),默認(rèn)的窗口大小就是分區(qū)的第一行直到當(dāng)前行。因此,以上示例語(yǔ)句的中ROWSBETWEEN UNBOUNDED PRECEDING AND CURRENT ROW選項(xiàng)可以省略。
除了使用ROWS關(guān)鍵字以數(shù)據(jù)行為單位指定窗口的偏移量之外,我們也可以使用RANGE關(guān)鍵字以數(shù)值為單位指定窗口的偏移量。
示例表transfer_log中記錄了一些銀行賬號(hào)的交易日志,以下是該表創(chuàng)建腳本:
-- 創(chuàng)建銀行交易日志表transfer_log
-- Oracle、MySQL、PostgreSQL以及SQLite
CREATE TABLE transfer_log
( log_id INTEGER NOT NULL PRIMARY KEY, -- 交易日志編號(hào)
log_ts TIMESTAMP NOT NULL, -- 交易時(shí)間
from_user VARCHAR(50) NOT NULL, -- 交易發(fā)起賬號(hào)
to_user VARCHAR(50), -- 交易接收賬號(hào)
type VARCHAR(10) NOT NULL, -- 交易類型
amount NUMERIC(10) NOT NULL -- 交易金額(元)
);
-- SQL Server
CREATE TABLE transfer_log
( log_id INTEGER NOT NULL PRIMARY KEY, -- 交易日志編號(hào)
log_ts DATETIME2 NOT NULL, -- 交易時(shí)間
from_user VARCHAR(50) NOT NULL, -- 交易發(fā)起賬號(hào)
to_user VARCHAR(50), -- 交易接收賬號(hào)
type VARCHAR(10) NOT NULL, -- 交易類型
amount NUMERIC(10) NOT NULL -- 交易金額(元)
);
-- 生成測(cè)試數(shù)據(jù)
-- Oracle 需要執(zhí)行以下ALTER語(yǔ)句
-- ALTER SESSION SET nls_timestamp_format = 'YYYY-MM-DD HH24:MI:SS';
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (1,'2021-01-02 10:31:40','62221234567890',NULL,'存款',50000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (2,'2021-01-02 10:32:15','62221234567890',NULL,'存款',100000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (3,'2021-01-03 08:14:29','62221234567890','62226666666666','轉(zhuǎn)賬',200000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (4,'2021-01-05 13:55:38','62221234567890','62226666666666','轉(zhuǎn)賬',150000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (5,'2021-01-07 20:00:31','62221234567890','62227777777777','轉(zhuǎn)賬',300000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (6,'2021-01-09 17:28:07','62221234567890','62227777777777','轉(zhuǎn)賬',500000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (7,'2021-01-10 07:46:02','62221234567890','62227777777777','轉(zhuǎn)賬',100000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (8,'2021-01-11 09:36:53','62221234567890',NULL,'存款',40000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (9,'2021-01-12 07:10:01','62221234567890','62228888888881','轉(zhuǎn)賬',10000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (10,'2021-01-12 07:11:12','62221234567890','62228888888882','轉(zhuǎn)賬',8000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (11,'2021-01-12 07:12:36','62221234567890','62228888888883','轉(zhuǎn)賬',5000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (12,'2021-01-12 07:13:55','62221234567890','62228888888884','轉(zhuǎn)賬',6000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (13,'2021-01-12 07:14:24','62221234567890','62228888888885','轉(zhuǎn)賬',7000);
INSERT INTO transfer_log (log_id,log_ts,from_user,to_user,type,amount) VALUES (14,'2021-01-21 12:11:16','62221234567890','62228888888885','轉(zhuǎn)賬',70000);
以下語(yǔ)句用于查找短期之內(nèi)(5天)累計(jì)轉(zhuǎn)賬超過(guò)一百萬(wàn)元的賬號(hào):
-- Oracle、MySQL以及PostgreSQL
SELECT log_ts, from_user,total_amount
FROM (
SELECT log_ts, from_user,
SUM(amount) OVER (
PARTITION BY from_user
ORDER BY log_ts
RANGE INTERVAL '5' DAYPRECEDING
) AS total_amount
FROM transfer_log
WHERE TYPE = '轉(zhuǎn)賬'
) t
WHERE total_amount >= 1000000;
其中,SUM()函數(shù)OVER子句中的RANGE選項(xiàng)指定了一個(gè)5天之內(nèi)的時(shí)間窗口。該查詢返回的結(jié)果如下。
log_ts |from_user |total_amount
-------------------|--------------|------------
2021-01-10 07:46:02|62221234567890| 1050000
賬號(hào)“62221234567890”截止2021年01月10日07點(diǎn)46份02秒在最近5天之內(nèi)累計(jì)轉(zhuǎn)賬105萬(wàn)。
SQLite不支持INTERVAL時(shí)間常量,我們可以將時(shí)間戳數(shù)據(jù)轉(zhuǎn)換為整數(shù)后使用。例如:
-- SQLite
WITH tl(log_ts, unix, from_user,amount) AS (
SELECT log_ts, CAST(STRFTIME('%s',log_ts) AS INT), from_user, amount
FROM transfer_log
WHERE type = '轉(zhuǎn)賬'
)
SELECT log_ts, from_user, total_amount
FROM (
SELECT log_ts, from_user,
SUM(amount) OVER (
PARTITION BY from_user
ORDER BY unix
RANGE 5 * 86400PRECEDING
) AS total_amount
FROM tl
) t
WHERE total_amount >= 1000000;
我們首先定義了一個(gè)CTE,字段unix是將log_ts轉(zhuǎn)換為1970年1月1日以來(lái)的整數(shù)秒。然后我們?cè)赟UM()函數(shù)中通過(guò)RANGE選項(xiàng)指定了一個(gè)5天(5*86400秒)之內(nèi)的時(shí)間窗口。
Microsoft SQL Server中的RANGE窗口大小選項(xiàng)只能指定UNBOUNDED PRECEDING、UNBOUNDED FOLLOWING或者CURRENT ROW,不能指定一個(gè)具體的數(shù)值,因此無(wú)法實(shí)現(xiàn)以上查詢。