實(shí)戰(zhàn)經(jīng)驗(yàn):Oracle DG 的歸檔缺失修復(fù)
本文轉(zhuǎn)載自微信公眾號(hào)「數(shù)據(jù)和云」,作者AIQ。轉(zhuǎn)載本文請(qǐng)聯(lián)系數(shù)據(jù)和云公眾號(hào)。
故障檢查
檢查歸檔同步情況
一、查看數(shù)據(jù)庫(kù)的情
- select database_role,flashback_on,open_mode,current_scn from v$database
- DATABASE_ROLE FLASHBACK_ON OPEN_MODE CURRENT_SCN
- ---------------- ------------------ -------------------- ---------------
- PHYSICAL STANDBY NO READ ONLY WITH APPLY 16657544972059
二、查看歸檔的最大線程與最大接收的歸檔情況。
- select thread#,max(sequence#) from v$archived_log group by thread#;
- 生產(chǎn)庫(kù):
- SQL> select thread#,max(sequence#) from v$archived_log group by thread#;
- THREAD# MAX(SEQUENCE#)
- ---------- --------------
- 1 136973
- 2 132693
- 4 149599
- 3 133277
- --DG庫(kù)
- SYS@hisnewdb> select thread#,max(sequence#) from v$archived_log group by thread#;
- THREAD# MAX(SEQUENCE#)
- ---------- --------------
- 1 136973
- 2 132693
- 4 149598
- 3 133277
- 可見4個(gè)節(jié)點(diǎn)歸檔是都有會(huì)過來的,sequence都能對(duì)得上。
三、查是否存在GAP
- select * from v$archived_gap;
日志應(yīng)用情況
查看延時(shí)的應(yīng)用情況
- select name ,value,time_computed from v$dataguard_stats where rownum<33;
- NAME VALUE TIME_COMPUTED
- -------------------------------- ---------------------------------------------------------------- ------------------------------
- transport lag +11 06:41:27 03/04/2021 16:41:20
- apply lag +11 06:41:27 03/04/2021 16:41:20
- apply finish time +00 04:23:39.868 03/04/2021 16:41:20
- estimated startup time 37 03/04/2021 16:41:20
- 可看到apply lag的應(yīng)用已經(jīng)延時(shí)11天6小時(shí)了。
- apply finish time應(yīng)用最快的恢復(fù)時(shí)長(zhǎng)為4小時(shí)。
恢復(fù)思路
應(yīng)用日志
- alter database recover managed standby databse cancel; --取消應(yīng)用日志
- alter database open read only; --打開只讀庫(kù)
- alter database recover managed standby ;
- alter database recover managed standby disconnect from session; -- 后臺(tái)應(yīng)用,建議上面命令,放前臺(tái)應(yīng)用。
歸檔還保留或者GAP較少的情況
1)歸檔還在主庫(kù)
方法一:
首先通過備庫(kù)sql查出相應(yīng)的 node[thread#] 和歸檔位置 name:
- select name from v$archived_log where sequence# between &1 and &2 and thread# = &3;
傳輸上面文件到備庫(kù)歸檔位置 archive log list
- #1.asmcd命令 本地環(huán)境與asm存儲(chǔ)cp自由。
- cp arch*.pdf /home/oracle/1.dbf
備庫(kù)上注冊(cè)歸檔文件alter database register logfile '歸檔文件絕對(duì)路徑' 或rman注冊(cè)日志catalog start with '';
應(yīng)用日志,查看select * from V$ARCHIVE_GAP;,監(jiān)視是否還存在其它的GAP出現(xiàn)。如出現(xiàn),如上面步驟循環(huán)操作。
方法二:
在配置 fal_client=${備庫(kù)的監(jiān)聽} 和 fal_server=${主庫(kù)的監(jiān)聽}
直接應(yīng)用日志,由備庫(kù)尋找日志。
2)歸檔已在備庫(kù)
應(yīng)用日志
歸檔已經(jīng)被刪除或GAP較多的情況
查看歸檔所在的位置
alert.log日志:
- 提供等thread 線程4的序列為148164的歸檔,獲取的序號(hào)有148164-148165
- control_keep_record_keep_time是控制文件的重用記錄數(shù)據(jù)。提示在這個(gè)記錄天數(shù)內(nèi)沒找到歸檔文件,建議設(shè)置更長(zhǎng)些天數(shù)。以便GAP找到缺失的日志。
- 默認(rèn)7天,1-365天范圍。
- 記錄的是歸檔日志,各種備份記錄。
- 不記錄數(shù)據(jù)文件,表空間,redo thread記錄。除非被drop,否則不會(huì)重用這部分記錄
- started logmerger process
- Thu Mar 04 16:19:53 2021
- Managed Standby Recovery not using Real Time Apply
- Parallel Media Recovery started with 16 slaves
- Waiting for all non-current ORLs to be archived...
- All non-current ORLs have been archived.
- Media Recovery Waiting for thread 4 sequence 148164
- Fetching gap sequence in thread 4, gap sequence 148164-148165
- Thu Mar 04 16:19:57 2021
- Completed: alter database recover managed standby database disconnect from session
- ----------
- Thu Mar 04 16:21:50 2021
- FAL[client]: Failed to request gap sequence
- GAP - thread 4 sequence 148164-148165
- DBID 3828421454 branch 984679630
- FAL[client]: All defined FAL servers have been attempted.
- ------------------------------------------------------------
- Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
- parameter is defined to a value that's sufficiently large
- enough to maintain adequate log switch information to resolve
- archivelog gaps.
- ------------------------------------------------------------
- Thu Mar 04 16:22:25 2021
- RFS[18]: Selected log 29 for thread 4 sequence 149600 dbid -466545842 branch 984679630
- Thu Mar 04 16:22:25 2021
1)找到當(dāng)前的最小SCN
對(duì)比數(shù)據(jù)文件最后檢查點(diǎn)的scn,數(shù)據(jù)文件頭部檢查點(diǎn)的scn,缺失歸檔的對(duì)應(yīng)scn(下個(gè)日志文件第一個(gè)更改號(hào)),當(dāng)前數(shù)據(jù)庫(kù)的scn:
- select thread#,low_sequence#,high_sequence# from v$archive_gap;
- col datafile_scn for 999999999999999
- col DATAFILE_HEADER_SCN for 999999999999999
- col current_scn for 999999999999999
- col next_change# for 999999999999999
- select ( select min(d.checkpoint_change#) from v$datafile d ) datafile_scn ,
- ( select min(d.checkpoint_change#) from v$datafile_header d where rownum=1) datafile_header_scn,
- (select current_scn from v$database) current_scn,
- (select next_change# from v$archived_log where sequence#=148164 and resetlogs_change# = (select d.resetlogs_change# from v$database d ) and rownum=1 ) next_change#
- from dual;
- DATAFILE_SCN DATAFILE_HEADER_SCN CURRENT_SCN NEXT_CHANGE#
- ---------------- ------------------- ---------------- ----------------
- 16657544969028 16657544972060 16657544972059
取上面最小的scn作為增量備份的SCN
2)主庫(kù)做SCN增量備份
停用備庫(kù)的日志應(yīng)用
- alter database recover managed standby database cancel;
rman備份
- 切換日志
- 切記備份當(dāng)前控制文件
- 增量scn備份
- run {
- allocate channel c1 device type disk;
- allocate channel c2 device type disk;
- allocate channel c3 device type disk;
- allocate channel c4 device type disk;
- allocate channel c5 device type disk;
- allocate channel c6 device type disk;
- CONFIGURE DEVICE TYPE DISK PARALLELISM 6 BACKUP TYPE TO BACKUPSET;
- backup as compressed backupset current controlfile for standby format '/home/oracle/backup/backup_ctl_%U.rman';
- backup as compressed backupset incremental from scn 16657544969028 database format '/home/oracle/backup/backup_%d_%s_%c_%U_%T.rman' include
- current controlfile for standby filesperset 10 tag 'forsdb_16657544969028_0304';
- release channel c1 ;
- release channel c2 ;
- release channel c3 ;
- release channel c4 ;
- release channel c5 ;
- release channel c6 ;
傳輸備份文件到備庫(kù)
- scp -rp /home/oracle/backup/backup host2:/home/oracle
介質(zhì)恢復(fù)備庫(kù)
- 查出控制文件的絕對(duì)目錄位置,后停備庫(kù)
- 啟動(dòng)到nomount
- 恢復(fù)控制文件
- 啟動(dòng)到mount
- 恢復(fù)數(shù)據(jù)文件
- 檢查rman進(jìn)展
- select name from v$controlfile;
- shu immediate;
- startup nomount;
- rman target / <<eof
- restore standby controlfile from '/home/oracle/backup/backup_ctl_%U.rman';
- alter database mount;
- eof
- 如果沒有單獨(dú)備份standby controlfile,就一個(gè)一個(gè)文件來測(cè)試恢復(fù)standby controflie
- restore standby controlfile to '/oradata/hisnewdb/control01.ctl' from '/home/oracle/backup/某個(gè)文件';
- 如果文件太多,可以先rman注冊(cè)文件后,再恢復(fù)控制文件。
- 要找開備庫(kù)mount狀態(tài)才能注冊(cè)
- rmant target / <<eof
- startup mount;
- catalog start with '/home/oracle/backup/';
- list backup of controlfile;
- restore standby controlfile automatic;
- eof
- #大概是這樣。restore standby controlfile automatic;如果不通,就采用上面list的信息,找到具體含有standby controflile的備份文件,再通過restore standby controfile from '';來恢復(fù) 。
- catalog start with '/home/oracle/backup/';
- recover database noredo;
查看rman的恢復(fù)進(jìn)展:
- set line 9999
- select sid,serial#,opname,round(sofar/totalwork*100) completed,trunc(elapsed_seconds/60) elapsed ,trunc(time_remaining/60) remaining,context ,target,sofar,totalwork
- from v$session_longops
- where opname like 'RMAN%' and opname not like '%aggregate%' and totalwork!=0 and sofar<>totalwork;
應(yīng)用日志
檢查standby redo files是否存在:
- select * from v$standby_log;
注冊(cè)standby redolog files
- -- 添加單個(gè)文件:
- alter database add standby logfile group {組號(hào)} 'standby redo logs files 絕對(duì)目錄文件';
- -- 添加多個(gè)standby redologs file
- alter database add standby logfile group {組號(hào)} ('standby redo logs file 1','logfiles2');
應(yīng)用日志
- alter database recover managed standby database cancel ;
- startup mount;
- alter database open read only;
- select open_mode,status,protection_level,protection_mode from v$database ;
- --前臺(tái)應(yīng)用日志
- alter database recover managed standby database ;
- -- 8 parallel 后臺(tái)應(yīng)用日志
- alter database recover managed standby database parallel 8 disconnect from session;
檢查應(yīng)用日志的情況
檢查各個(gè)線程thread#的最大應(yīng)用日志的序列,與主庫(kù)進(jìn)行對(duì)比。
- select thread#,max(sequence#) from v$archived_log where applied='YES' group by thread#;