【博文推薦】Oracle DataGuard 學(xué)習(xí)之 DataGuard FailOver案例
本博文出自51CTO博客客居天涯博主,有任何問(wèn)題請(qǐng)進(jìn)入博主頁(yè)面互動(dòng)討論!
博文地址:http://tiany.blog.51cto.com/513694/1617646
Oracle DG(Dataguard)是目前比較常見(jiàn)的數(shù)據(jù)庫(kù)HA配置策略。通過(guò)實(shí)現(xiàn)Physical Standby和Logical Standby,可以實(shí)現(xiàn)數(shù)據(jù)冗余容錯(cuò)機(jī)制。防止在主庫(kù)出現(xiàn)嚴(yán)重故障,不能支持服務(wù)的時(shí)候,沒(méi)有快速的后備支持環(huán)境。
在DG中,switchover和failover是兩個(gè)重要的概念,也是DG實(shí)現(xiàn)的核心。兩者共同點(diǎn)都是Primary和Standby角色切換,差異在于Planned和UnPlanned之分。Switchover關(guān)鍵點(diǎn)在于Planned,這個(gè)切換動(dòng)作是在運(yùn)維機(jī)構(gòu)規(guī)劃范圍內(nèi)的動(dòng)作。比如,進(jìn)行定期系統(tǒng)軟硬件升級(jí)、設(shè)備維修等動(dòng)作。而Failover是真正出現(xiàn)嚴(yán)重系統(tǒng)故障,如數(shù)據(jù)庫(kù)宕機(jī)、軟硬件故障導(dǎo)致的Primary不能支持服務(wù),從而進(jìn)行的切換動(dòng)作。
根據(jù)不同的DG配置,switchover和failover也是有差異的。理論上,Switchover是不會(huì)造成數(shù)據(jù)丟失的,Primary在切換之后也是在DG配置環(huán)境中,作為Standby存在的。但是Failover則不同,除了運(yùn)行在***保護(hù)(Maximum Protection)模式下,Primary突發(fā)的故障可能引起一部分Redo Log不能及時(shí)的傳遞到Standby端,切換之后很可能有數(shù)據(jù)損失的情況。更重要的是,Primary端在發(fā)生Failover之后,是不能夠直接加入回DG配置的!也就是說(shuō),F(xiàn)ailover之后,Primary實(shí)際上就是被“拋出”了DG環(huán)境。
那么,有什么方法實(shí)現(xiàn)Primary回到原有的環(huán)境呢?這個(gè)問(wèn)題的困難在于保持Primary和Standby一致。在正常情況下,Primary和Standby之間是關(guān)聯(lián)同步的,即使發(fā)生了Switchover,也在可控情況下。Failover過(guò)程中有數(shù)據(jù)的缺失,還有Primary修復(fù)問(wèn)題。在目前流行版本(11g)中,有三個(gè)方法:
- 環(huán)境重建:一種最簡(jiǎn)單的方法就是直接刪除原來(lái)的Primary庫(kù),引用DG重建方法,重新搭建Standby端;
- RMAN備份恢復(fù):如果Primary端保留過(guò)一份Failover之前的備份,則可以強(qiáng)制原來(lái)的Primary端恢復(fù)到進(jìn)行Failover的時(shí)間點(diǎn),之后作為Standby接收當(dāng)前Primary的redo log傳遞,應(yīng)用后可以跟上進(jìn)度;
- Flashback Database恢復(fù):Flashback技術(shù)是作為傳統(tǒng)備份還原技術(shù)的補(bǔ)充,提供了更加便捷的恢復(fù)策略。使用flashback,可以將數(shù)據(jù)庫(kù)恢復(fù)到failover之前的時(shí)間點(diǎn)。之后的過(guò)程和RMAN備份恢復(fù)策略相同;
案例分析:
一、在主庫(kù)端模擬數(shù)據(jù)庫(kù)意外宕機(jī)
- 7scott@bjdb>conn /as sysdba
- Connected.
- sys@bjdb>alter system switch logfile;
- System altered.
- sys@bjdb>shutdown abort
- ORACLE instance shut down.
二、在備庫(kù)端
1、查看切換信息
- 5sys@shdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PHYSICAL STANDBY NOT ALLOWED
- 可以看到此時(shí)備庫(kù)處于無(wú)法切換狀態(tài)
2、直接切換
- sys@shdb>alter database commit to switchover to primary;
- alert_log:(告警日志)
- Fatal NI connect error 12514, connecting to:
- (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=shsrv)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=shdb)(CID=(PROGRAM=oracle)(HOST=bjsrv)(USER=oracle))))
- VERSION INFORMATION:
- TNS for Linux: Version 11.2.0.3.0 - Production
- TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.3.0 - Production
- Time: 04-MAR-2015 21:25:13
- Tracing not turned on.
- Tns error struct:
- ns main err code: 12564
- TNS-12564: TNS:connection refused
- ns secondary err code: 0
- nt main err code: 0
- nt secondary err code: 0
- nt OS err code: 0
- Error 12514 received logging on to the standby
- FAL[client, MRP0]: Error 12514 connecting to shdb for fetching gap sequence
- Wed Mar 04 21:26:00 2015
- alter database commit to switchover to primary
- ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
- Maximum wait for role transition is 15 minutes.
- Switchover: Media recovery is still active
- Database not available for switchover
- End-Of-REDO archived log file has not been recovered
- Database not available for switchover
- End-Of-REDO archived log file has not been recovered
- Database not available for switchover
3、關(guān)閉standby MPR進(jìn)程
- 35sys@shdb>ALTER DATABASE RECOVER managed standby database finish;
- ALTER DATABASE RECOVER managed standby database finish
- Terminal Recovery: request posted (TestDB12)
- Wed Mar 04 21:34:34 2015
- Begin: Standby Redo Logfile archival
- End: Standby Redo Logfile archival
- Terminal Recovery timestamp is '03/04/2015 21:34:34'
- Terminal Recovery: applying standby redo logs.
- Terminal Recovery: thread 1 seq# 34 redo required
- Media Recovery Waiting for thread 1 sequence 34
- Terminal Recovery: End-Of-Redo log allocation
- Terminal Recovery: standby redo logfile 4 created '/dsk4/arch_bj/arch_1_0_820054583.log'
- This standby redo logfile is being created as part of the
- failover operation. This standby redo logfile should be
- deleted after the switchover to primary operation completes.
- Media Recovery Log /dsk4/arch_bj/arch_1_0_820054583.log
- Terminal Recovery: log 4 reserved for thread 1 sequence 34
- Recovery of Online Redo Log: Thread 1 Group 4 Seq 34 Reading mem 0
- Mem# 0: /dsk4/arch_bj/arch_1_0_820054583.log
- Identified End-Of-Redo (failover) for thread 1 sequence 34 at SCN 0xffff.ffffffff
- Incomplete Recovery applied until change 1234252 time 03/04/2015 21:23:43
- MRP0: Media Recovery Complete (TestDB12)
- Terminal Recovery: successful completion
- Wed Mar 04 21:34:35 2015
- ARCH: Archival stopped, error occurred. Will continue retrying
- ORACLE Instance TestDB12 - Archival Error
- ORA-16014: log 4 sequence# 34 not archived, no available destinations
- ORA-00312: online log 4 thread 1: '/dsk4/arch_bj/arch_1_0_820054583.log'
- Forcing ARSCN to IRSCN for TR 0:1234252
- Attempt to set limbo arscn 0:1234252 irscn 0:1234252
- Resetting standby activation ID 2865247982 (0xaac836ee)
- MRP0: Background Media Recovery process shutdown (TestDB12)
- Terminal Recovery: completion detected (TestDB12)
- Completed: ALTER DATABASE RECOVER managed standby database finish
4、切換數(shù)據(jù)庫(kù)到Primary
- sys@shdb>select status from v$instance;
- STATUS
- ------------
- OPEN
- sys@shdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PHYSICAL STANDBY TO PRIMARY
- sys@shdb>alter database commit to switchover to primary;
- Database altered.
- sys@shdb>alter database open;
- Database altered.
- 告警日志:
- alter database commit to switchover to primary
- ALTER DATABASE SWITCHOVER TO PRIMARY (TestDB12)
- Maximum wait for role transition is 15 minutes.
- All dispatchers and shared servers shutdown
- CLOSE: killing server sessions.
- CLOSE: all sessions shutdown successfully.
- Wed Mar 04 21:35:47 2015
- SMON: disabling cache recovery
- Backup controlfile written to trace file /u01/app/oracle/diag/rdbms/bjdb/TestDB12/trace/TestDB12_ora_3146.trc
- Standby terminal recovery start SCN: 1234251
- RESETLOGS after incomplete recovery UNTIL CHANGE 1234252
- Online log /dsk2/oradata/bjdb/redo01b.log: Thread 1 Group 1 was previously cleared
- Online log /dsk1/oradata/bjdb/redo01a.log: Thread 1 Group 1 was previously cleared
- Online log /dsk2/oradata/bjdb/redo02b.log: Thread 1 Group 2 was previously cleared
- Online log /dsk1/oradata/bjdb/redo02a.log: Thread 1 Group 2 was previously cleared
- Online log /dsk2/oradata/bjdb/redo03b.log: Thread 1 Group 3 was previously cleared
- Online log /dsk1/oradata/bjdb/redo03a.log: Thread 1 Group 3 was previously cleared
- Standby became primary SCN: 1234250
- Wed Mar 04 21:35:47 2015
- Setting recovery target incarnation to 3
- AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
- Switchover: Complete - Database mounted as primary
- Completed: alter database commit to switchover to primary
三、原主庫(kù)修復(fù)后,開(kāi)機(jī)
- sys@bjdb>startup
- ORACLE instance started.
- Total System Global Area 442601472 bytes
- Fixed Size 2229184 bytes
- Variable Size 281021504 bytes
- Database Buffers 155189248 bytes
- Redo Buffers 4161536 bytes
- Database mounted.
- Database opened.
- sys@bjdb>select name,database_role,switchover_status from v$database;
- NAME DATABASE_ROLE SWITCHOVER_STATUS
- --------- ---------------- --------------------
- TESTDB12 PRIMARY FAILED DESTINATION
現(xiàn)在原來(lái)的主庫(kù)被修復(fù)后,整個(gè)DataGuara架構(gòu)已經(jīng)被破壞了,所以必須把原來(lái)的主庫(kù)構(gòu)建成新的備庫(kù),重新恢復(fù)DataGuard的環(huán)境。
四、重新構(gòu)建DataGuard
- 1sys@bjdb>select name,database_role from v$database;
NAME DATABASE_ROLE
-------------------------------------------------- ----------------
TESTDB12 PHYSICAL STANDBY