Oracle Startup Failure Case Study: ORA-600 [4193] Error
Operating system: Oracle Linux 5
Database: Oracle 11gR2 (11.2.0.3.0)
I. Symptoms:
1. A recovery test had just been run in which the current online redo log group was damaged.
2. Starting the database afterwards raised ORA-600 [4193].
3. The instance was then forcibly shut down.

Check the alert log:
[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
Block recovery completed at rba 5.111.16, scn 0.1430641
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc (incident=36027):
ORA-00600: internal error code, arguments: [4193], [], []
Incident details in: /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_36027/enmoedu_pmon_10635_i36027.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Tue Dec 13 12:54:04 2016
Dumping diagnostic data in directory=[cdmp_20161213125404], requested by (instance=1, osid=10635 (PMON)), summary=[incident=36027].
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_pmon_10635.trc:
ORA-00600: internal error code, []
PMON (ospid: 10635): terminating the instance due to error 472
System state dump requested by (instance=1, summary=[abnormal instance termination].
System State dumped to trace file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_diag_10653.trc
Dumping diagnostic data in directory=[cdmp_20161213125405], summary=[abnormal instance termination].
Instance terminated by PMON, pid = 10635

Check the trace file:
[oracle@ocm1 ~]$ more /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/incident/incdir_38544/enmoedu_mmon_3181_i38544.trc
......
----- START DDE Action: 'dumpKernelDiagState' (Sync) -----
------- Kernel Diag Dump -------
dbkcBSExt: 0
dbkedDefDump info:
Internal err count: 4
Error Flags: 0x0
Exception: FALSE
Bootstrapping info:
Flags: 0x17
Options: 0x806
Diag Dest: /u01/app/oracle
DB Unique name: enmoedu
Instance Name: enmoedu
------- END Kernel Diag Dump -------
----- END DDE Action: 'dumpKernelDiagState' (SUCCESS, 0 csec) -----
----- START DDE Action: 'xdb_dump_buckets' (Sync) -----
----- END DDE Action: 'xdb_dump_buckets' (FAILURE, 0 csec) -----
----- START DDE Action: 'dumpKGERing' (Sync) -----
----- END DDE Action: 'dumpKGERing' (SUCCESS, 0 csec) -----
----- START DDE Action: 'dumpKGEState' (Sync) -----
kgepgtfr 0x7fffb09068d0
kgepgtba 0x7fffb09107a8
kgepgter 5
kgepgpar
kgepgbpa 0xbaf48c5
kgepgepa 0xbaf5064
kgepgtfd 21
kgepgdmc 0
kgepgflg 0x8
kgepg_stkgfr (nil)
kgepgkgsmp 0xbaf3fa0
kgepgspm 4
kgepg_ba_set_in_eh 0x7fffb09114b0
kgepg_kgecatch_set_in_eh_ba (nil)
kge_ba_set_in_eh_funcloc 0x9b975bc
kge_ba_set_in_eh_fileloc 0x9b97890
------------------- start error stack dump with barriers
<error barrier> at 0x7fffb09107a8
ORA-00603: ORACLE server session terminated by fatal error
ORA-24557: error 600 encountered while handling error 600; exiting server process
ORA-00600: internal error code, []
<error barrier> at 0x7fffb09114b0
ORA-00600: internal error code, []
ORA-00600: internal error code, []
<error barrier> at 0x7fffb0915ed8
------------------- end error stack dump with barriers
----- END DDE Action: 'dumpKGEState' (SUCCESS, 0 csec) -----
----- START DDE Action: 'kpuActionDefault' (Sync) -----
Begin OCI Current State Dump
End OCI Current State Dump
Begin OCI Call Context Dump
End OCI Call Context Dump
Begin Process state dump.
ttcdrvdmplocation: msg-0 ln-0 reporting 0
HST is NULL or no two task connection
End Process state dump.
----- END DDE Action: 'kpuActionDefault' (SUCCESS, 0 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
End of Incident Dump

According to My Oracle Support (MOS), this error is usually related to undo segments.

II. Solution:
1. Create a pfile from the spfile:
01:55:31 SYS@ enmoedu>create pfile from spfile;
File created.

2. Edit the pfile. Comment out undo_tablespace and switch to manual undo management, so that the instance uses only the SYSTEM rollback segment:
[oracle@ocm1 dbs]$ vi initenmoedu.ora
#*.undo_tablespace='UNDOTBS1'
undo_management = 'MANUAL'
rollback_segments = 'SYSTEM'

3. Start the instance with the pfile:
01:58:48 SYS@ enmoedu>startup mount pfile='$ORACLE_HOME/dbs/initenmoedu.ora';
ORACLE instance started.

Total System Global Area  521936896 bytes
Fixed Size                  2229944 bytes
Variable Size             360712520 bytes
Database Buffers          155189248 bytes
Redo Buffers                3805184 bytes
Database mounted.
Elapsed: 00:00:00.00
02:00:07 SYS@ enmoedu>show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      MANUAL
undo_retention                       integer     900
undo_tablespace                      string

4. Open the database:
02:00:16 SYS@ enmoedu>alter database open;
Database altered.

This time the database opens normally:
[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
alter database open
Beginning crash recovery of 1 threads
Started redo scan
Completed redo scan
read 43 KB redo, 36 data blocks need recovery
Started redo application at
Thread 1: logseq 7, block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 7 Reading mem 0
Mem# 0: /u01/app/oracle/oradata/enmoedu/redo01.log
Completed redo application of 0.03MB
Completed crash recovery at
Thread 1: logseq 7, block 90, scn 1491526
36 data blocks read, 36 data blocks written, 43 redo k-bytes read
Wed Dec 14 02:00:26 2016
Thread 1 advanced to log sequence 8 (thread open)
Thread 1 opened at log sequence 8
Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Dec 14 02:00:26 2016
SMON: enabling cache recovery
Undo initialization finished serial:0 start:479574 end:479584 diff:10 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:00:27 2016
QMNC started with pid=20, OS id=3522
Completed: alter database open
Wed Dec 14 02:00:28 2016
Starting background process CJQ0
Wed Dec 14 02:00:28 2016
CJQ0 started with pid=26, OS id=3554
Wed Dec 14 02:00:30 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Wed Dec 14 02:00:51 2016
Errors in file /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/enmoedu_j001_3562.trc:
ORA-01552: cannot use system rollback segment for non-system tablespace 'TEMP'

The ORA-01552 is expected at this stage: with undo_management = MANUAL and only the SYSTEM rollback segment available, work in non-SYSTEM tablespaces has no rollback segment to use.

5. Drop the original undo tablespace and create a new one. The new tablespace reuses the name UNDOTBS1, so it matches the undo_tablespace setting still stored in the spfile:
02:00:27 SYS@ enmoedu>drop tablespace undotbs1 including contents and datafiles;
Tablespace dropped.

02:03:32 SYS@ enmoedu>create undo tablespace undotbs1
02:03:39   2  datafile '/u01/app/oracle/oradata/enmoedu/undotbs01.dbf' size 100m
02:03:50   3  autoextend on;
Tablespace created.

6. Shut down the database and restart it with the spfile:
02:04:02 SYS@ enmoedu>shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
02:05:45 SYS@ enmoedu>startup
ORACLE instance started.

Total System Global Area  521936896 bytes
Fixed Size                  2229944 bytes
Variable Size             360712520 bytes
Database Buffers          155189248 bytes
Redo Buffers                3805184 bytes
Database mounted.
Database opened.

The alert log confirms that the database now starts normally. Problem solved:
[oracle@ocm1 ~]$ tail -f /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
ALTER DATABASE OPEN
Thread 1 opened at log sequence 8
Current log# 2 seq# 8 mem# 0: /u01/app/oracle/oradata/enmoedu/redo02.log
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
SMON: enabling cache recovery
[3870] Successfully onlined Undo Tablespace 2.
Undo initialization finished serial:0 start:808234 end:808274 diff:40 (0 seconds)
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is ZHS16GBK
No Resource Manager plan active
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Wed Dec 14 02:05:55 2016
QMNC started with pid=20, OS id=3874
Completed: ALTER DATABASE OPEN
Wed Dec 14 02:05:56 2016
db_recovery_file_dest_size of 2048 MB is 0.00% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Starting background process CJQ0
Wed Dec 14 02:05:56 2016
CJQ0 started with pid=22, OS id=3902

Appendix (reproduced from MOS documents)
ORA-600 [4193] is also related to undo, and several MOS notes discuss it.

I. MOS Notes
1.1 ORA-600 [4193] When Trying To Open The Database [ID 763566.1]
Symptoms
After copying a database from one server to another, ORA-600 [4193] is raised when trying to open the database on the destination server.
Cause
The online redo logs were copied while the source database was open. Online redo logs should never be copied while the database is open.
Solution
In this instance the datafiles were copied correctly after the tablespaces were put into backup mode. Online redo logs, however, should only be copied after the source database has been shut down. Because the source database had to stay open, the datafiles were copied again (with the tablespaces in backup mode), a number of archived logs were transferred to the new server, and after the last archived log was applied the database was opened with RESETLOGS, which created new online redo logs on the destination server.

1.2 ORA-600 [4193] When Opening Or Shutting Down A Database [ID 452662.1]
1.2.1 Symptoms
Errors in alert.log:
Tue Jul 17 13:38:13 2007
yms_smon_8337.trc:
SO: 0xdfaec728, type: 24, owner: 0xdf266580, flag: INIT/-/-/0x00
(buffer) PR: 0xdf1f1338 FLG: 0x1000
UNDO BLK:

1.2.2 Cause
When we try to apply redo to an undo block (forward changes are made by the application of redo to a block), we check that the seq# in the undo record matches the seq# in the redo record. This happens, for example, during the roll-forward the database performs at startup.
These seq# should be the same, because a redo record must be applied to the correct version of the block: a redo record can only be applied to a block that contains the same seq# as the redo record.
If the seq# do not match, ORA-600 [4193] [a] [b] is raised.
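The seq# check described above can be illustrated with a small shell sketch. This is not how Oracle implements the check. The function names `seq_hex_to_dec` and `check_seq_match` are hypothetical, and both arguments are assumed to be the undo-record seq# ([a]) and the redo-record seq# ([b]) given as hexadecimal strings, which is how they appear in the ORA-600 [4193] arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mimic the ORA-600 [4193] seq# comparison.
# Assumption: both seq# values are hex strings, with or without a "0x" prefix.
# Oracle performs the real check internally while applying redo to undo blocks;
# these functions only compare two numbers you pass in.

# Convert a hex seq# such as "de0" or "0xde0" to decimal.
seq_hex_to_dec() {
  local hex="${1#0x}"
  hex="${hex#0X}"
  printf '%d\n' "$((16#$hex))"
}

# Compare the undo-record seq# (arg [a]) with the redo-record seq# (arg [b]).
# Prints "match" and returns 0 when they agree. Otherwise prints both values in
# decimal and returns 1, the situation in which ORA-600 [4193] is raised.
check_seq_match() {
  local undo_seq redo_seq
  undo_seq=$(seq_hex_to_dec "$1")
  redo_seq=$(seq_hex_to_dec "$2")
  if [ "$undo_seq" -eq "$redo_seq" ]; then
    echo "match"
  else
    echo "mismatch: undo seq#=$undo_seq redo seq#=$redo_seq"
    return 1
  fi
}
```

For example, `seq_hex_to_dec 0xde0` prints 3552. `check_seq_match de0 de1` prints `mismatch: undo seq#=3552 redo seq#=3553` and returns 1. The second value in that example is made up for illustration.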
Arg [a]: undo record seq number, e.g. seq: 0xde0 = 3552
Arg [b]: redo record seq number
These values are hexadecimal and can be converted with the to_number() function:
SYS@anqing1(rac1)> Select to_number('de0','xxxx') from dual;

TO_NUMBER('DE0','XXXX')
-----------------------
                   3552

A seq# mismatch implies some kind of block corruption in either the redo or the undo block.
Related article: Oracle datafile block format explained: http://www.linuxidc.com/Linux/2012-08/66994.htm

1.2.3 Solution
1.2.3.1 If the database is open:
1) Find the rollback segment from the first part of the xid:
0x0002.045.00006c61
usn=2 is the segment_id.
select segment_name, status from dba_rollback_segs where segment_id=2;

RS_DATA1   ONLINE

2) Dump the transaction table of the rollback segment to check whether all transactions are committed:
alter system dump undo header RS_DATA1;
Oracle undo dump explained: http://www.linuxidc.com/Linux/2012-08/66995.htm

3) Check the trace file created under user_dump_dest. In the trace file, search for the keyword "TRN TBL":
TRN TBL::
state=9 means the transaction is committed.

4) Take the rollback segment offline and drop it:
alter rollback segment rs_data1 offline;
drop rollback segment RS_DATA1;

1.2.3.2 If the database does not open:
1.
a) If using rollback segments, remove the rollback_segments line from init.ora and open the database.
b) If using undo segments, set undo_management = manual in init.ora/spfile and try to open the database.
2. If the database opens, all transactions are committed, and you can drop the rollback segment or the undo tablespace.

1.3 ORA-600 [4193] caused by bugs
MOS: ORA-600 [4193] "seq# mismatch while adding undo record" [ID 39282.1]
Bug 8240762 - Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] [ID 8240762.8]
Undo corruption may occur after an undo segment shrink: the same undo block may be used for two different transactions, causing several internal errors such as:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback
Undo segment shrink is done internally by Oracle.
Workaround
Drop the undo segment.
Affects:
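The bug note lists ORA-600 [4193], [4194] and [4137] together as symptoms of the same undo corruption, so a quick first triage step is to count which of those codes appear in the alert log. The following is a minimal sketch. It assumes alert log lines of the form `ORA-00600: internal error code, arguments: [4193], ...`, as in the logs earlier in this article. The function name `scan_undo_ora600` is hypothetical. A hit is only a hint to consult MOS, not confirmation of this bug.

```shell
#!/usr/bin/env bash
# Hypothetical triage helper: count ORA-600 errors whose first argument is one of
# the codes listed for Bug 8240762 (4193, 4194, 4137) in an alert log file.
# Assumption: errors are logged as
#   "ORA-00600: internal error code, arguments: [4193], [...], ..."
# The pattern also tolerates a missing space after the comma.
# Output: one "<code> <count>" line per code found, sorted by code.
scan_undo_ora600() {
  local alert_log="$1"
  grep -Eo 'ORA-00600: internal error code, ?arguments: \[(4193|4194|4137)\]' "$alert_log" \
    | sed -E 's/.*\[([0-9]+)\]$/\1/' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Example, using the alert log path from this article:
# scan_undo_ora600 /u01/app/oracle/diag/rdbms/enmoedu/enmoedu/trace/alert_enmoedu.log
```

Each output line gives a code and its number of occurrences, for example `4193 2`. Codes that do not appear produce no line.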