PostgreSQL的最佳特性 你用了嗎?
SQL語句通常不是很容易理解,特別是你閱讀別人已經(jīng)寫好的語句。因此,很多人指出我們應該遵循在其他語言中遵循的原則,像加上注釋和功能模塊化。 我***注意到一個很多人都沒有使用的Postgres關鍵特性,也就是 @timonk在AWS Re:Invent 大會關于數(shù)據(jù)倉庫服務Redshift主題演講時指出的一個特性。這個特性實際上使得SQL兼具了可讀性和模塊性。在以前,我回頭閱讀自己的幾個月前的 SQL語句,通常很難理解,而現(xiàn)在我可以做到這一點。
這個特性就是CTEs,也就是公用表表達式,你有可能稱做它為WITH
語句。和數(shù)據(jù)庫中視圖一樣,它的主要好處就是,它允許你在當前事務中創(chuàng)建臨時表。你可以大量使用它,因為它允許你思路清晰的構建模塊,別人很容易就理解你在做什么。
讓我們舉個簡單的例子
- WITH users_tasks AS (
- SELECT
- users.email,
- array_agg(tasks.name) as task_list,
- projects.title
- FROM
- users,
- tasks,
- project
- WHERE
- users.id = tasks.user_id
- projects.title = tasks.project_id
- GROUP BY
- users.email,
- projects.title
- )
通過這樣定義臨時表users_tasks,我就可以在后面加上對users_tasks基本查詢語句,像:
- SELECT *
- FROM users_tasks;
有趣的是你可以將它們連在一起。當我知道分配給每個用戶的任務量時,也許我想知道在一個指定的任務上,誰因為對這個任務負責超過了50%而因此造成瓶頸。為了簡化,我們可以使用多種方式,先計算每個任務的總量,然后是每人針對每個任務的負責總量。
- total_tasks_per_project AS (
- SELECT
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY project_id
- ),
- tasks_per_project_per_user AS (
- SELECT
- user_id,
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY user_id, project_id
- ),
現(xiàn)在我們將組合一下然后發(fā)現(xiàn)超過50%的用戶
- overloaded_users AS (
- SELECT tasks_per_project_per_user.user_id,
- FROM tasks_per_project_per_user,
- total_tasks_per_project
- WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
- )
最終目標,我想獲得超負荷工作這的用戶和任務的逗號分隔列表。我們只要簡單地對overloaded_users和 users_tasks的初始列表進行join操作。放在一起可能有點長,但是可讀性強。作為額外幫助,我又在每一層加了注釋。
- --- Created by Craig Kerstiens 11/18/2013
- --- Query highlights users that have over 50% of tasks on a given project
- --- Gives comma separated list of their tasks and the project
- --- Initial query to grab project title and tasks per user
- WITH users_tasks AS (
- SELECT
- users.id as user_id,
- users.email,
- array_agg(tasks.name) as task_list,
- projects.title
- FROM
- users,
- tasks,
- project
- WHERE
- users.id = tasks.user_id
- projects.title = tasks.project_id
- GROUP BY
- users.email,
- projects.title
- ),
- --- Calculates the total tasks per each project
- total_tasks_per_project AS (
- SELECT
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY project_id
- ),
- --- Calculates the projects per each user
- tasks_per_project_per_user AS (
- SELECT
- user_id,
- project_id,
- count(*) as task_count
- FROM tasks
- GROUP BY user_id, project_id
- ),
- --- Gets user ids that have over 50% of tasks assigned
- overloaded_users AS (
- SELECT tasks_per_project_per_user.user_id,
- FROM tasks_per_project_per_user,
- total_tasks_per_project
- WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2)
- )
- SELECT
- email,
- task_list,
- title
- FROM
- users_tasks,
- overloaded_users
- WHERE
- users_tasks.user_id = overloaded_users.user_id
CTEs通常不如經(jīng)過精簡優(yōu)化過的SQL語句性能高。大多數(shù)差距小于一倍差距。對我而言,這種為了可讀性作出的折中是毋庸置疑的。Postgres優(yōu)化器以后肯定會針對這點變的更好。
多說一句,是的我可以用大約10-15行簡短的SQL語句做同樣的事情,但是你也許不能很快的理解它。當你碰到需要保證SQL做正確的事情時,可讀性的優(yōu)勢就出來了。SQL語句總是有個結果,你對此毫無疑問。確保你SQL語句容易推理是保證正確性的關鍵。
原文鏈接:http://www.craigkerstiens.com/2013/11/18/best-postgres-feature-youre-not-using/