Is LMSn Not Running in RT (Real-Time) Mode on Oracle 19c RAC?
This article is reposted from the WeChat official account 數據和云, written by 張維照 (Zhang Weizhao). To repost it, please contact the 數據和云 official account.
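When CPU on the database host is close to exhausted, Oracle wants a handful of core background processes to get CPU at the highest possible priority. Very high CPU usage also lengthens I/O response times and increases network latency, which indirectly hurts overall database performance.
Since Oracle 10g, the hidden parameter _high_priority_processes has controlled which processes run at high priority. 19c keeps _high_priority_processes and adds _highest_priority_processes. In 10.2, the default value of _high_priority_processes gives high priority to the core RAC process LMS*. In 11g the default is LMS*|VKTM. In 19c, _highest_priority_processes gives VKTM the highest priority, and high priority is extended to a longer list: LMS*|LM*|LCK0|GCR*|CKPT|DBRM|RMS0|LGWR|CR*|RMV*.
I recall a bug before 10.2.0.3 that made these processes consume excessive CPU. Recently a customer's 19c RAC showed pronounced GC (global cache) problems whenever CPU usage exceeded 90%. While checking LMS, I noticed that it did not appear to be running in RT mode. LMS has changed somewhat in 19c, and the notes below record what I found.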
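On Linux, a process runs under one of three kernel scheduling policies:
- TS: SCHED_OTHER (SCHED_NORMAL), the time-sharing policy and the general-purpose default;
- FF: SCHED_FIFO, a real-time policy, first in first out;
- RR: SCHED_RR, a real-time policy, round-robin with time slices;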
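The class of any process can be read straight from `ps`. The following is a minimal sketch, assuming procps `ps` on Linux; `sched_class_name` is a hypothetical helper that only maps the CLS abbreviations listed above to their policy names.

```shell
# Map the CLS abbreviation printed by `ps -c` to the kernel policy name.
# sched_class_name is a hypothetical helper, not part of any tool.
sched_class_name() {
  case "$1" in
    TS) echo "SCHED_OTHER" ;;
    FF) echo "SCHED_FIFO" ;;
    RR) echo "SCHED_RR" ;;
    *)  echo "unknown" ;;
  esac
}

# Show the scheduling class of the current shell.
cls=$(ps -o cls= -p $$ | tr -d ' ')
echo "shell $$ runs as ${cls} ($(sched_class_name "$cls"))"
```

An ordinary login shell should report TS; only processes that were explicitly given a real-time policy show FF or RR.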
First, a healthy environment: Oracle 19c RAC, 2 nodes, on RHEL 7.8
db alert log
- Starting background process CLMN
- CLMN started with pid=3, OS id=28714
- Starting background process PSP0
- PSP0 started with pid=4, OS id=28731
- Starting background process IPC0
- 2021-03-23 10:07:32.440000 +08:00
- IPC0 started with pid=5, OS id=29420
- Starting background process VKTM
- Starting background process GEN0
- VKTM started with pid=6, OS id=29445 at elevated (RT) priority
- VKTM running at (1)millisec precision with DBRM quantum (100)ms
- Starting background process MMAN
- Starting background process LMD1
- LMD0 started with pid=23, OS id=29631
- * Load Monitor used for high load check
- * New Low - High Load Threshold Range = [130560 - 174080]
- LMS1 started with pid=26, OS id=29640_29663 at elevated (RT) priority
- LMS0 started with pid=24, OS id=29635_29662 at elevated (RT) priority
- LMS2 started with pid=28, OS id=29646_29666 at elevated (RT) priority
- Starting background process LMD2
- LMD1 started with pid=36, OS id=29659
- LMS3 started with pid=30, OS id=29649_29667 at elevated (RT) priority
- LMS4 started with pid=32, OS id=29651_29672 at elevated (RT) priority
- LMS5 started with pid=34, OS id=29653_29677 at elevated (RT) priority
- Starting background process LMD3
- LMD2 started with pid=37, OS id=29681
- LMD3 started with pid=38, OS id=29686
- Starting background process RMS0
- RMS0 started with pid=39, OS id=29689
- oracle@anbob_com:/home/oracle> ps -efc|grep vktm
- grid 34874 1 RR 41 Jun03 ? 00:06:20 asm_vktm_+ASM1
- oracle 42358 1 RR 41 Jun03 ? 00:05:24 ora_vktm_anbob1
- grid 58462 1 RR 41 Jun03 ? 00:06:18 mdb_vktm_-MGMTDB
Note:
With the ps -c option, which shows the scheduling class and priority, vktm is in RR mode.
- oracle@anbob_com:/home/oracle> ps -efc|grep lms
- oracle 35148 90946 TS 19 16:02 pts/3 00:00:00 grep --color=auto lms
- oracle 66573 1 TS 19 May21 ? 04:32:32 ora_lms0_anbob1
- oracle 66576 1 TS 19 May21 ? 04:29:41 ora_lms1_anbob1
- oracle 66578 1 TS 19 May21 ? 04:26:33 ora_lms2_anbob1
- oracle 66581 1 TS 19 May21 ? 04:26:51 ora_lms3_anbob1
- oracle 66586 1 TS 19 May21 ? 04:25:38 ora_lms4_anbob1
- oracle 66589 1 TS 19 May21 ? 04:28:44 ora_lms5_anbob1
- oracle 66596 1 TS 19 May21 ? 04:25:44 ora_lms6_anbob1
- oracle 66599 1 TS 19 May21 ? 04:50:02 ora_lms7_anbob1
- oracle 66603 1 TS 19 May21 ? 04:22:42 ora_lms8_anbob1
- oracle 66609 1 TS 19 May21 ? 04:21:31 ora_lms9_anbob1
- oracle 66615 1 TS 19 May21 ? 04:25:41 ora_lmsa_anbob1
- oracle 66620 1 TS 19 May21 ? 04:29:43 ora_lmsb_anbob1
- grid 129022 1 TS 19 May14 ? 00:36:49 asm_lms0_+ASM1
Note:
With the ps -c option, lms still shows TS mode. In 12c and earlier, ps showed RR mode for lms as well, as in this example:
- # sqlplus -V
- SQL*Plus: Release 12.2.0.1.0 Production
- # ps -eLfc |head -n 1;ps -eLfc|grep lms
- UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
- grid 14661 1 14661 1 RR 41 2019 ? 1-08:14:40 asm_lms0_+ASM1
- oracle 62106 1 62106 1 RR 41 2019 ? 17-22:45:22 ora_lms0_weejar1
- oracle 62109 1 62109 1 RR 41 2019 ? 18-10:30:26 ora_lms1_weejar1
- oracle 62111 1 62111 1 RR 41 2019 ? 18-00:13:16 ora_lms2_weejar1
- oracle 62113 1 62113 1 RR 41 2019 ? 17-22:02:20 ora_lms3_weejar1
- oracle 62115 1 62115 1 RR 41 2019 ? 17-22:07:53 ora_lms4_weejar1
Check the oradism file
- oracle@anbob_com:/home/oracle> ls -l $ORACLE_HOME/bin/oradism
- -rwsr-x--- 1 root oinstall 147848 Apr 17 2019 /oracle/app/oracle/product/19c/db_1/bin/oradism
Normal.
Note:
For 10gR2 and 11gR1 installations, verify that the oradism executable matches the following ownership and permissions “-rwsr-sr-x 1 root dba oradism” and make sure the lms is running in Real Time mode.
Check the mount point of the ORACLE_HOME filesystem
- oracle@anbob_com:/home/oracle> cat /proc/mounts|grep oracle
- /dev/mapper/fusioncube-oracle /oracle ext4 rw,relatime,stripe=16,data=ordered 0 0
Normal.
LMS in the AWR report
- RAC Statistics
- Begin End
- Number of Instances: 2 2
- Number of LMS’s: 12 12
- Number of realtime LMS’s: 12 12 (0 priority changes)
Check the background processes
- SQL> select 'LMS', INST_ID,PRIORITY,COUNT(*) TOTAL FROM GV$BGPROCESS where name like 'LMS%' GROUP BY INST_ID,PRIORITY ;
- 'LMS' INST_ID PRIORITY TOTAL
- ------ ---------- ---------------- ----------
- LMS 1 RT 12
- LMS 2 RT 12
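Everything indicates that the LMS processes are currently in RT mode, yet ps still shows them as TS. Is this a display problem, or has a new Oracle feature changed the behavior?
The answer is that something did change: starting with 18c, LMS processes run in threaded mode. Listing threads with ps -eLfc shows that one thread of each LMS process is RR while the others remain TS.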
- oracle@anbob_com:/home/oracle> ps -eLfc |head -n 1;ps -eLfc|grep lms
- UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
- oracle 66573 1 66573 4 TS 19 May21 ? 00:00:08 ora_lms0_anbob1
- oracle 66573 1 66580 4 RR 41 May21 ? 03:15:29 ora_lms0_anbob1
- oracle 66573 1 67219 4 TS 19 May21 ? 00:23:08 ora_lms0_anbob1
- oracle 66573 1 67240 4 TS 19 May21 ? 00:53:41 ora_lms0_anbob1
- oracle 66576 1 66576 4 TS 19 May21 ? 00:00:08 ora_lms1_anbob1
- oracle 66576 1 66582 4 RR 41 May21 ? 03:12:36 ora_lms1_anbob1
- oracle 66576 1 67270 4 TS 19 May21 ? 00:23:09 ora_lms1_anbob1
- oracle 66576 1 67301 4 TS 19 May21 ? 00:53:43 ora_lms1_anbob1
- oracle 66578 1 66578 4 TS 19 May21 ? 00:00:08 ora_lms2_anbob1
- oracle 66578 1 66591 4 RR 41 May21 ? 03:10:10 ora_lms2_anbob1
- oracle 66578 1 67339 4 TS 19 May21 ? 00:22:52 ora_lms2_anbob1
- ...
OK.
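Since only one thread per LMS process carries the RR class, a per-thread summary is easier to read than the raw listing. This is a sketch that assumes the `ps -eLfc` column layout shown above (CLS in column 6, CMD in the last column); `count_rr_threads` is a hypothetical helper that reads that output on stdin.

```shell
# Summarize, per LMS process name, how many threads are RR and how many exist.
# Assumes `ps -eLfc` columns: UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD.
count_rr_threads() {
  awk '$NF ~ /_lms[0-9a-z]+_/ {
         total[$NF]++
         if ($6 == "RR") rr[$NF]++
       }
       END {
         for (p in total) printf "%s rr=%d total=%d\n", p, rr[p], total[p]
       }' | sort
}

# Usage on a live host:
#   ps -eLfc | count_rr_threads
```

A process name that reports rr=0 is worth checking against the alert log.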
Now a problem environment: Oracle 19.4 RAC, 2 nodes, on RHEL 7.5
- RAC Statistics
- Begin End
- Number of Instances: 2 2
- Number of LMS’s: 40 40
- Number of realtime LMS’s: 0 0 (0 priority changes)
- SQL> select * from v$bgprocess where name like 'LMS%';
- PADDR PSERIAL# NAME DESCRIPTION PRIORITY CON_ID
- ---------------- ---------- ----- -------------------------------- -------- ----------
- 0000001E01B628A0 1 LMS0 global cache service process TS 0
- 0000001E01B65360 1 LMS7 global cache service process TS 0
- 0000001E01B67E20 1 LMSE global cache service process TS 0
- 0000001E01B6A8E0 1 LMSL global cache service process TS 0
- 0000001E01B6D3A0 1 LMSS global cache service process TS 0
- 0000001E01B6FE60 1 LMSZ global cache service process TS 0
- 0000001E21AC8498 1 LMS3 global cache service process TS 0
- 0000001E21ACAF58 1 LMSA global cache service process TS 0
- 0000001E21ACDA18 1 LMSH global cache service process TS 0
- 0000001E21AD04D8 1 LMSO global cache service process TS 0
- 0000001E21AD2F98 1 LMSV global cache service process TS 0
- 0000001E41A66B58 1 LMS6 global cache service process TS 0
- ...
db alert log
- 2021-06-03T10:50:19.500768+08:00
- LMON started with pid=22, OS id=98747
- Starting background process LMD0
- 2021-06-03T10:50:19.527437+08:00
- LMD0 started with pid=23, OS id=98749
- Starting background process LMD1
- 2021-06-03T10:50:19.528918+08:00
- * Load Monitor used for high load check
- * New Low - High Load Threshold Range = [230400 - 307200]
- 2021-06-03T10:50:19.703222+08:00
- Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lms0_98751_98758.trc (incident=873064):
- ORA-00800: soft external error, arguments: [Set Priority Failed], [LMS0], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
- Incident details in: /u01/oracle/diag/rdbms/anbob1/anbob11/incident/incdir_873064/anbob11_lms0_98751_98758_i873064.trc
- 2021-06-03T10:50:19.711460+08:00
- Error attempting to elevate LMS0's priority: no further priority changes will be attempted for this process
- LMS0 started with pid=24, OS id=98751_98758
- 2021-06-03T10:50:19.800751+08:00
- Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsd_98808_98825.trc (incident=873065):
- ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSD], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
- 2021-06-03T10:50:19.815049+08:00
- Error attempting to elevate LMSD's priority: no further priority changes will be attempted for this process
- LMSD started with pid=50, OS id=98808_98825
- 2021-06-03T10:50:19.924836+08:00
- LMD1 started with pid=104, OS id=98950
- 2021-06-03T10:50:19.924929+08:00
- Starting background process LMD2
- 2021-06-03T10:50:19.944617+08:00
- Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsb_98797_98815.trc (incident=873066):
- ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSB], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
- 2021-06-03T10:50:19.945838+08:00
- Error attempting to elevate LMSB's priority: no further priority changes will be attempted for this process
- Starting background process LMD3
- 2021-06-03T10:50:19.949748+08:00
Note:
The LMS processes in this environment run in TS mode because the attempt to raise their priority failed at instance startup with ORA-800 [Set Priority Failed].
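To see at a glance which processes hit this error, the alert log can be filtered for the ORA-00800 lines. This is a sketch that assumes the message format shown in the excerpt above; `failed_priority_procs` is a hypothetical helper, and the alert log path is site-specific.

```shell
# Print the name of every process reported in an
# ORA-00800 [Set Priority Failed] line, one per line, de-duplicated.
failed_priority_procs() {
  sed -n 's/.*ORA-00800:.*\[Set Priority Failed\], \[\([A-Z0-9]*\)\].*/\1/p' | sort -u
}

# Usage (path is an assumption; substitute the local alert log):
#   failed_priority_procs < /path/to/alert_SID.log
```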
Check oradism
- oracle@anbob1a:/home/oracle/scripts_oracle$ ls -l $ORACLE_HOME/bin/oradism
- -rwxr-x--- 1 oracle oinstall 147848 Apr 17 2019 /u01/oracle/product/bin/oradism
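In this environment both the owner and the permissions of oradism are wrong: it is owned by oracle with no setuid bit, whereas the healthy environment shows root ownership with -rwsr-x---. Correcting them and restarting the instance resolves the problem.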
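The comparison between the two listings can be turned into a quick check. This is a sketch, not an Oracle-supplied procedure: `oradism_mode_ok` is a hypothetical helper, the `stat -c` usage assumes GNU coreutils, and the suggested `chown`/`chmod` values are an assumption derived from the healthy listing in this article (root:oinstall, -rwsr-x---).

```shell
# Return 0 when the owner is root and the setuid bit is set
# (4th character of the symbolic mode is "s"), otherwise 1.
# $1 = owner name, $2 = mode string as printed by `ls -l` or `stat -c %A`.
oradism_mode_ok() {
  [ "$1" = "root" ] || return 1
  case "$2" in
    ???s*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Usage on a live host (GNU stat):
#   f="$ORACLE_HOME/bin/oradism"
#   oradism_mode_ok "$(stat -c %U "$f")" "$(stat -c %A "$f")" || echo "oradism needs fixing"
# Assumed fix, run as root; the group must match the local installation:
#   chown root:oinstall "$f" && chmod 4750 "$f"
```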
Alternatively, the root user can use chrt to switch a running process to RR mode online.
- # chrt -r -p 1 [lms pid]
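When many LMS processes are affected, it helps to generate the commands first and review them before running anything. This is a sketch: `chrt_commands` is a hypothetical helper that only prints commands, and priority 1 is simply the value used in the command above. A change made with `chrt` applies to the running task only, so a restarted instance starts again from whatever oradism allows.

```shell
# Read task IDs from stdin, one per line, and print (not run)
# the chrt command that would switch each one to SCHED_RR priority 1.
chrt_commands() {
  while read -r tid; do
    if [ -n "$tid" ]; then
      echo "chrt -r -p 1 $tid"
    fi
  done
}

# Usage (as root); review the printed commands before executing them:
#   pgrep -f '^ora_lms' | chrt_commands
# Check the policy currently in effect for one task:
#   chrt -p <tid>
```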
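About the author
張維照 (Zhang Weizhao) is a technical director at 云和恩墨 and an Oracle ACE-A. He has worked in database administration since 2006 and as an Oracle DBA since 2009. For more than ten years he has focused on database technology and architecture, with a strong interest in learning and sharing Oracle fault diagnosis, performance tuning, internals, and new features, and he has published many study notes and case write-ups on his blog. He has worked on multiple TB-scale, provincial-level database projects in sectors including industry and commerce administration, healthcare, transportation, human resources and social security, government, and telecom operators.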