層次查詢SQL性能故障不斷?給你份可靠的避坑指南!
近期頻頻遇到層次查詢SQL的性能問題,結(jié)合歷史故障案例,匯總了一些場景connect by常見的性能故障類型,在本文中做個分享。
一、結(jié)果中過濾or生成樹中過濾
過濾條件放置于where后,為在結(jié)果樹生成完成后裁剪葉子節(jié)點;放置于connect by后,為在生成樹的過程中裁剪子樹。
頻繁發(fā)生的現(xiàn)象是業(yè)務(wù)邏輯上其實并不需要先生成結(jié)果樹再去過濾,由于開發(fā)人員對過濾條件放置于不同的位置(where 后,connect by后)產(chǎn)生的過濾效果混淆,導(dǎo)致了低效的性能。
下面這個SQL就是典型案例。用戶反饋,zzzz.SYS_RC_ROUTE_DETAIL表上生產(chǎn)環(huán)境就3000+條數(shù)據(jù),但SQL語句運行時卻跑不出來結(jié)果:
- select xxxxx
- from zzzz.SYS_RC_ROUTE_DETAIL t
- where t.route_id = (select a.route_id
- from xxx.sys_rc_route a, xxx.g_wo_base b
- where a.route_id = b.route_id
- and b.work_order = 'yyyyyyyyy')
- start with t.node_type = '0'
- connect by nocycle prior next_node_id = node_id
讓客戶運行了SQL一分鐘后cancel掉,抓取了監(jiān)視報告如下:
問題點很明顯,表中nextnodeid = node_id的重復(fù)值很多,導(dǎo)致了海量的結(jié)果集。SQL運行的一分鐘內(nèi),connect by尚未把完整的樹生產(chǎn)完成,就已經(jīng)有了3000W+數(shù)據(jù),于是我們開始思考,在邏輯上是否有必要在構(gòu)建完整的樹后再過濾。
與業(yè)務(wù)部門溝通后,發(fā)現(xiàn)果然不需要。
以下數(shù)據(jù)可以測試下,3000行數(shù)據(jù)量,但是count(*) 會非常慢。
- SQL> create table test1 as
- select
- mod(rownum,2) id,
- mod(rownum +1 ,2) id2
- from
- dual
- connect by level <= 3000
- ; 2 3 4 5 6 7 8
- Table created.
- SQL> set timing on
- SQL> select count(*) from test1 where id =0 start with id =0 connect by nocycle prior id = id2 ;
- COUNT(*)
- ----------
- 1500
- Elapsed: 00:09:26.88
- SQL>
結(jié)果中過濾如上所示,用了9分鐘;而生成樹中過濾則只用0.3s:
- SQL> select count(*) from test1 start with id =0 connect by nocycle prior id = id2 and id = 0 ;
- COUNT(*)
- ----------
- 1500
- Elapsed: 00:00:00.31
很多情況下,兩種寫法的結(jié)果集可能是相同的,如下:
- create table test2 as
- select
- rownum id,
- rownum +1 id2,
- rownum + 2 id3
- from
- dual
- connect by level <= 3000;
- SQL> select id from test2 where id3 < 10 start with id = 3 connect by nocycle prior id2 = id;
- ID
- ----------
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 7 rows selected.
- SQL> select id from test2 start with id = 1 connect by nocycle prior id2 = id and id3 <10;
- ID
- ----------
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 7 rows selected.
但其實這兩種寫法在語義上差別很大,結(jié)果集也可能不相同,如下:
- SQL> select id from test2 where id3 = 10 start with id = 3 connect by nocycle prior id2 = id;
- ID
- ----------
- 8
- Elapsed: 00:00:00.13
- SQL> select id from test2 start with id = 3 connect by nocycle prior id2 = id and id3=10;
- ID
- ----------
- 3
- Elapsed: 00:00:00.00
二、CBO估算不準確
層次查詢的SQL語句頻繁出現(xiàn)的問題,就是CBO估算返回結(jié)果集偏差,引起執(zhí)行計劃不準確。雖然表上收集過統(tǒng)計信息,但是CBO對于結(jié)果集的估算跟實際值偏差非常大(幾百上千的倍的差距),但是這個也不能全怪CBO,畢竟遞歸查詢有多少層、有多少數(shù)據(jù)要裁剪,結(jié)合起來考慮,結(jié)果確實難以估量。
對于CBO估算不準的問題,我們考慮了對結(jié)果集相對特殊的參數(shù),在SQL文本上做區(qū)分,應(yīng)用識別特殊參數(shù)運行帶hint地改造SQL,通過hint來指定返回結(jié)果集。這種情況不同于普通的數(shù)據(jù)傾斜,無法通過baseline給出一個不涉及應(yīng)用改造的方案。
三、并行處理
層次查詢的SQL直接使用parallel的hint,會遭遇并行串行化的問題,也就是不能真正并行。對于一些重要且耗時長的層次查詢,可以考慮PIPELINED TABLE FUNCTION改寫SQL的方式來實現(xiàn)。
以下腳本測試參考了陳煥生童鞋的blog以及oracle相關(guān)文檔(Doc ID 2168864.1):
- drop table t1;
- -- t1 with 100,000 rows
- create table t1
- as
- select
- rownum id,
- lpad(rownum, 10, '0') v1,
- trunc((rownum - 1)/100) n1,
- rpad(rownum, 100) padding
- from
- dual
- connect by level <= 100000
- ;
- begin
- dbms_stats.gather_table_stats(user,'T1');
- end;
- /
- select /*+ monitor */
- count(*)
- from
- (
- select
- CONNECT_BY_ROOT ltrim(id) root_id,
- CONNECT_BY_ISLEAF is_leaf,
- level as t1_level,
- a.v1
- from t1 a
- start with a.id <=1000
- connect by NOCYCLE id = prior id + 1000
- );
- create or replace package refcur_pkg
- AS
- TYPE R_REC IS RECORD (row_id ROWID);
- TYPE refcur_t IS REF CURSOR RETURN R_REC;
- END;
- /
- create or replace package connect_by_parallel
- as
- /* Naviagates a shallow hiearchy in parallel, where we do a tree walk for each root */
- CURSOR C1 (p_rowid ROWID) IS -- Cursor done for each subtree. This select is provided by the customer
- select CONNECT_BY_ROOT ltrim(id) root_id, CONNECT_BY_ISLEAF is_leaf, level as t1_level, a.v1
- from t1 a
- start with rowid = p_rowid
- connect by NOCYCLE id = prior id + 1000;
- TYPE T1_TAB is TABLE OF C1%ROWTYPE;
- FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB
- PIPELINED
- PARALLEL_ENABLE(PARTITION p_ref BY ANY);
- END connect_by_parallel;
- /
- create or replace package body connect_by_parallel
- as
- FUNCTION treeWalk (p_ref refcur_pkg.refcur_t) RETURN T1_TAB
- PIPELINED PARALLEL_ENABLE(PARTITION p_ref BY ANY)
- IS
- in_rec p_ref%ROWTYPE;
- BEGIN
- execute immediate 'alter session set "_old_connect_by_enabled"=true';
- LOOP -- for each root
- FETCH p_ref INTO in_rec;
- EXIT WHEN p_ref%NOTFOUND;
- FOR c1rec IN c1(in_rec.row_id) LOOP -- retrieve rows of subtree
- PIPE ROW(c1rec);
- END LOOP;
- END LOOP;
- execute immediate 'alter session set "_old_connect_by_enabled"=false';
- RETURN;
- END treeWalk;
- END connect_by_parallel;
- /
- SELECT
- /*+ monitor */
- COUNT(*)
- FROM TABLE(connect_by_parallel.treeWalk (CURSOR
- (SELECT /*+ parallel (a 100) */
- rowid FROM t1 a WHERE id <= 100))) b;
層次查詢的SQL在整個SQL優(yōu)化場景中占比相對較小,但這種類型的SQL優(yōu)化卻往往比較麻煩,本文分享的三個案例均為實戰(zhàn)中總結(jié),對于Oracle層次查詢的SQL優(yōu)化有極大的借鑒意義,特別是陳煥生提供的做并行的案例,含金量很高,感興趣的童鞋可以測試下。
作者介紹
蔣健,云趣網(wǎng)絡(luò)科技聯(lián)合創(chuàng)始人,Oracle ACE,11g OCM,多年Oracle設(shè)計、管理及實施經(jīng)驗,精通數(shù)據(jù)庫優(yōu)化,Oracle CBO及并行原理。云趣鷹眼監(jiān)控核心設(shè)計和開發(fā)者,資深Python Web開發(fā)者。