[Repost] How to Read a systemstate Dump
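Reposted from Lao Bai's book <Oracle RAC 日记>.

The trace file produced by dumping systemstate contains state information for every process in the system. Each process has its own section in the trace file, reflecting that process's state: process information, session information, enqueue information (mainly lock information), buffer information, and the state of the objects the process holds in the SGA. The trace file covers all processes in the system over the period of time from the moment the dump starts until the dump task completes.

When is systemstate the right tool? Oracle recommends using the systemstate event in the following situations:

    the database hangs
    the database is very slow
    a process is hanging
    the database reports certain errors
    resource contention

The syntax for dumping systemstate is:

    ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';

The same can be done with ORADEBUG:

    sqlplus -prelim / as sysdba
    oradebug setmypid
    oradebug unlimit
    oradebug dump systemstate 10

To trigger the systemstate event when the database hits a particular error, set the event parameter in the parameter file (spfile or pfile). For example, to dump systemstate whenever a deadlock occurs (error ORA-00060):

    event = "60 trace name systemstate level 10"

LEVEL parameter (the values listed here are the ones documented for HANGANALYZE, as the state names and the remark about the HANGANALYZE report below indicate):

    10     Dump all processes (IGN state)
    5      Level 4 + Dump all processes involved in wait chains (NLEAF state)
    4      Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF, LEAF_NW, IGN_DMP state)
    3      Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
    1-2    Only HANGANALYZE output, no process dump at all

If the level is too high, a large amount of trace output is generated and the system's I/O performance suffers, so levels above 3 are not recommended. The HANGANALYZE report is divided into many sections; each session section always begins with a header that describes in detail the session being extracted.

In general, a systemstate dump contains the following:

    dump header     the file header
    process dump    the dump of every process at the time of the dump, one dedicated section per process
    call dump       contained in the process dump
    session dump    each process has one or more (under MTS) session dumps
    enqueue dump
    buffer dump     a session dump may contain buffer dumps

When reading a systemstate dump, the usual first step is to analyze it with the ASS tool. ASS is an AWK script written by Oracle engineers that analyzes a systemstate dump file and points out the places in the dump where problems may exist. The ASS output gives us leads on possible blockers, and those leads are what we should examine most closely.

We can locate a particular SO by searching for its address, analyze the information in that SO, then use the address of its PARENT SO to find its parent and build up a diagram of how these SOs relate. For example, once we have found the SO of a SESSION, we can see which process the session belongs to, which SQL the session is executing, and so on. This kind of analysis finds every SO that may be problematic together with its related SOs, which provides the material for further analysis.

1. Standard state object header (SO)

The state object header contains some basic information, for example:

    SO: c00004ti4jierj, type: 2,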
    owner: 0000000000, flag: init/-/-/0x00

Here SO is the identifier of the state object (the address used to look it up), and type is the category of the state object.

TYPE: known state object categories:

    2     process
    3     call
    4     session
    5     enqueue (lock information)
    6     file information block (each FIB identifies one file)
    11    broadcast handle
    12    KSV slave class state
    13    ksvslvm
    16    osp req holder (holder for a session performing an OS operation)
    18    enqueue resource detail
    19    ges message
    20    namespace [ksxp] key
    24    buffer [db buffer]
    36    dml lock
    37    temp table lock
    39    list of blocks (the list of blocks used for block cleanout)
    40    transaction
    41    dummy
    44    sort segment handle
    50    row cache enqueue
    52    user lock
    53    library cache lock
    54    library cache pin
    55    library cache load lock
    59    cursor enqueue
    61    process queue
    62    queue reference
    75    queue monitor sob

owner is the parent node of this SO (0 means this is a top-level SO). flag indicates the state and takes one of three values:

    kssoinit    the state object has been initialized
    kssoflst    the state object is on the freelist
    kssofcln    the state object has been freed by pmon

The data structure of the state object header is:

    struct kssob {
        unsigned char kssobtyp;      /* category of the state object */
        unsigned char kssobflg;      /* flags */
        unsigned char kssobdelstage;
        struct kssob *kssobown;      /* pointer to the owner SO */
        kgglk kssoblnk;              /* link in the parent's member chain */
    }

2. processstate dump (ksupr)

The processstate dump records the state of a process. From it we can learn the basic attributes of the process and its current state.

When reading a processstate dump, we mainly look at the process flag (FLAG), which also tells us what kind of process it is. The "(latch info)" part shows which latches the process is waiting for, which helps in understanding why the process is in trouble. The OS information of the process is also useful for learning more about it.

In fact, x$ksupr contains the process information, and this memory view helps in understanding the contents of the processstate dump:

    ADDR        address
    INDX        index
    INST_ID     instance ID
    KSSPAFLG    state of the state object:
                    KSSOINIT 0x01    // state object initialized
                    KSSOFLST 0x02    // state object is on free list
                    KSSOFCLN 0x04    // state object freed by PMON (for debugging)
    KSSPAOWN    OWNER of this SO; 0 if this is itself a top-level SO
    KSUPRFLG    state of the process:
                    KSUPDEAD 0x01    process is dead and should be cleaned up
                    KSUPDSYS 0x02    detached, system process
                    KSUPDFAT 0x04    detached, fatal (system) process
                    KSUPDCLN 0x08    process is cleanup (pmon)
                    KSUPDSMN 0x10    process is smon
                    KSUPDPSU 0x20    pseudo process
                    KSUPDMSS 0x40    multi-stated server
                    KSUPDDPC 0x80    dispatcher process
    KSUPRSER    serial number of the process (SERIAL NO)
    KSUPRIOC
    KSLLALAQ    latch held
    KSLLAWAT    latch being waited for
    KSLLAWHY    context of the latch request (for debugging)
    KSLLAWER    location of the latch request (for debugging)
    KSLLASPN    latch this process is currently spinning on
    KSLLALOW    bitmap of the levels of the latches held (levels 0-9)
    KSLLAPSC    count of POST messages sent by the process
    KSLLAPRC    count of POST messages received by the process
    KSLLAPRV    LOC ID of the last POST received, see ① in the figure
    KSLLAPSN    LOC ID of the last POST sent, see ② in the figure
    KSLLID1R    first part of the RESOURCE ID
    KSLLID2R    second part of the RESOURCE ID
    KSLLRTYP    RESOURCE TYPE + RESOURCE FLAG
    KSLLRMTY    type of RESOURCE MANAGE:
                    KRMENQ     0x01    enqueues
                    KRMLATCH   0x02    latches
                    KRMLIBCALK 0x03    library cache locks
                    KRMBUFLK   0x04    buffer locks
    KSLLARPO    the last OS process that sent a message to this process
    KSLLASPO    the last OS process this process sent a message to
    KSUPRPID    OS process ID
    KSUPRWID    ID of the wait event
    KSUPRUNM    OS user name
    KSUPRMNM    user's machine name
    KSUPRPNM    user program name
    KSUPRTID    user terminal name
    KSSRCOBJ    object currently being operated on, in the STATE OBJECT RECOVERY data
    KSSRCFRE    address of the FREELIST, in the STATE OBJECT RECOVERY data
    KSSRCSRC    SOURCE PARENT, in the STATE OBJECT RECOVERY data
    KSSRCDST    DESTINATION PARENT, in the STATE OBJECT RECOVERY data
    KSASTQNX    forward pointer in the MESSAGE STATE
    KSASTQPR    backward pointer in the MESSAGE STATE
    KSASTRPL    REPLY VALUE in the MESSAGE STATE
    KSUPRPGP    name of the PROCESS GROUP
    KSUPRTFI    trace file name of the process
    KSUPRPUM    PGA memory in use
    KSUPRPNAM   KSUPRPNAM + KSUPRPRAM is the total PGA memory allocated
    KSUPRPRAM
    KSUPRPFM    PGA memory that can be freed
    KSUPRPMM    maximum PGA memory used

3. session state object

The session information contains a great deal of what we need; in general the session state block is the focus when analyzing what a session is doing.

Within the session state information the flag is very important: it tells us the current condition of the session, and the flag bitmap gives the details. The SO addresses of the SQL and PL/SQL the session is executing let us find the work the session is currently doing, which helps further analysis. In addition, the session's current wait event and wait history show what the session is waiting on now and has waited on over the recent past. All of this is essential when analyzing the cause of a session problem.

The flag bitmap is as follows:

    KSUSFUSR    0x00000001    user session (as opposed to recursive session)
    KSUSFREC    0x00000002    recursive session (always internal)
    KSUSFAUD    0x00000004    audit logon/logoff, used by cleanup
    KSUSFDCO    0x00000008    disable commit/rollback from plsql
    KSUSFSYS    0x00000010    user session created by system processes
    KSUSFSGA    0x00000020    whether UGA is allocated in sga
    KSUSFLOG    0x00000040    whether user session logs on to ORACLE
    KSUSFMSS    0x00000080    user session created by multi-stated server
    KSUSFDIT    0x00000100    disable (defer) interrupt
    KSUSFCLC    0x00000200    counted for current license count decrement
    KSUSFDET    0x00000400    session has been detached
    KSUSFFEX    0x00000800    "forced exit" during shutdown normal
    KSUSFCAC    0x00001000    (cloned) session is cached
    KSUSFILS    0x00002000    default tx isolation level is serializable
    KSUSFOIL    0x00004000    override serializable for READ COMMITTED
    KSUSFIDL    0x00008000    idle session scheduler
    KSUSFSKP    0x00010000    SKIP unusable indexes maintenance
    KSUSFCDF    0x00020000    defer all deferrable constraint by default
    KSUSFCND    0x00040000    deferrable constraints are immediate
    KSUSFIDT    0x00080000    session to be implicitly detached
    KSUSFTLA    0x00100000    transaction audit logged
    KSUSFJQR    0x00200000    resource checking in job q process enabled
    KSUSFMGS    0x00400000    session is migratable
    KSUSFGOD    0x00800000    migratable session need to get ownership id
    KSUSFSDS    0x01000000    suppress/enable TDSCN computations
    KSUSFMSP    0x02000000    parent of migratable session
    KSUSFMVC    0x04000000    MV container update progress
    KSUSFNAS    0x08000000    an NLS alter session call was done
    KSUSFTRU    0x10000000    a trusted callout was performed
    KSUSFHOA    0x20000000    an HO agent was called
    KSUSFSTZ    0x40000000    an alter session set time_zone was done
    KSUSFSRF    0x80000000    summary refresh

4. call state object

A call state object describes a single call. When looking at a call state object, always check the depth value to tell whether the call is a user call or a recursive call.

5. enqueue state object

From the enqueue state object we mainly read the lock type, the lock mode and the flag.

6. transaction dump

The Oracle memory structure behind the transaction dump is KTCXB; X$KTCXB gives more detail.

The flag values (hexadecimal) are described below (taken from an early version, so 10g may differ slightly):

    1          allocated but no transaction
    2          transaction active
    4          state object no longer valid
    8          transaction about to commit/abort
    10         space management transaction
    20         recursive transaction
    40         no undo logging
    80         no change/commit, must rollback
    100        use system undo segment (0)
    200        valid undo segment assigned
    400        undo seg assigned, lock acquired
    800        change may have been made
    1000       assigned undo seg
    2000       required lock in cleanup
    4000       is a pseudo space extent
    8000       save the tx table & tx ctl block
    10000      no read-only optimize for 2pc
    20000      multiple sess attached to this tx
    40000      commit scn future set
    80000      dependent scn future set
    100000     dist call failed, force rollback
    200000     remote uncoordinated ddl tx
    400000     coordinated global tx
    800000     pdml transaction
    1000000    next must be commit or rollback
    2000000    coordinator in pdml
    4000000    disable block level recovery
    8000000    library and/or row caches dirty
    10000000   serializable transaction
    20000000   waiting for unbound transaction
    40000000   loosely coupled transaction branch
    80000000   long-running transaction

The flag2 values (hexadecimal) are described below (taken from an early version, so 10g may differ slightly):

    1      tx needs refresh on commit
    2      delete performed in tx
    4      concurrency check enabled
    8      insert performed
    10     dir path insert performed
    20     fast rollback on net disconnect
    40     do not commit this tx
    80     this tx made remote change
    100    all read-only optim enabled

The structure of the transaction environment is:

    struct ktcev {
        kenv    ktcevenv;
        kuba    /* UBA high-water mark */
        kuba    ktcevucl;
        sb2     /* space remaining in the undo high-water-mark block */
        kcbds   /* descriptor of the undo block */
        kdba    /* DBA of the undo segment header */
        kturt * /* pointer to the KTURT structure of the undo seg */
    }

7. library object lock/handle

The library object lock is as follows. Its Flags are described below:

    KGLLKBRO    0x0100    this lock is broken
    KGLLKCBB    0x0200    this lock can be broken
    KGLLKPNC    0x0400    "kgllkpnc" is a valid pin for the call
    KGLLKPNS    0x0800    "kgllkpns" is a valid pin for the session
    KGLLKCGA    0x1000    this lock is in CGA memory
    KGLLKINH    0x2000    the instance lock is inherited
    KGLLKLRU    0x4000    lock protects an object on the session cache lru
    KGLLKKPC    0x8000    lock protects an object in the session keep cache
    KGLLKRES    0x0010    reserved lock preventing handle from being freed
    KGLLKCBK    0x0020    need to callback the client for delete/dump

The handle is the main body of a library object; its information is shown in the figure:

    [figure: library object handle dump]

The namespace values include:

    CRSR    cursor
    TABL    table/view/sequence/synonym
    BODY    body (e.g., package body)
    TRGR    trigger
    INDX    index
    CLST    cluster
    KGLT    internal KGL testing
    PIPE    pipe
    LOB     lob
    DIR     directory
    QUEU    queue
    OBJG    replication object group
    PROP    replication propagator
    JVSC    java source
    JVRE    java resource
    ROBJ    reserved for server-side RepAPI
    REIP    replication internal package
    CPOB    context policy object
    EVNT    pub_sub internal information
    SUMM    summary
    DIMN    dimension
    CTX     app context
    OUTL    stored outlines
    TULS    ruleset objects
    RMGR    resource manager
    XDBS    xdb schema
    PPLN    pending scheduler plan
    PCLS    pending scheduler class
    SUBS    subscription information
    LOCS    location information
    RMOB    remote objects info
    RSMD    RepAPI snapshot metadata
    JVSD    java shared data
    STFG    file group
    TRANS       transformation
    RELC        replication - log based child
    STRM        stream: capture process in log-based replication
    REVC        rule evaluation context
    STAP        stream: apply process in log-based replication
    RELS        source in log-based replication
    RELD        destination in log-based replication
    IFSD        IFS schema
    XDBC        XDB configuration management
    USAG        user agent mapping
    VOMDTABL    multi-versioned object for table
    JSQI        scheduler-event queue info object
    CDCS        change set
    VOMDINDX    multi-versioned object for index
    STBO        sql tuning base object
    HTSP        hintset object
    JSGA        scheduler global attributes
    JSET        scheduler start time namespace
    TABL_T      temporary table
    CLST_T      temporary cluster
    INDX_T      temporary index
    SCPD        scratch pad
    JSLV        scheduler job slave
    MODL        mining models

The status flags take the following values:

    EXS    existent
    NEX    non-existent
    LOC
    CRT    being created
    ALT    being altered
    DRP    being dropped
    PRG    being purged
    UPD    being updated
    RIV    marked for rolling invalidation
    NRC    don't recover when an exclusive pin fails
    UDP    dep being updated
    BOW    bad owner of database link
    MEM    has frame memory associated with heap 0
    REA    protected with read-only access at least once
    NOA    protected with no access at least once

By analyzing the library cache object/handle, we can find the related SQL and the state of its cursor.

(Editor: 李大同)
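Section 1 describes locating an SO by its address and then following owner pointers to relate SOs to each other. The Python sketch below illustrates that procedure. It assumes the header layout shown in section 1 (SO, type, owner, flag, with hexadecimal addresses); real headers differ between Oracle versions. The regular expression, the helper names parse_state_objects and owner_chain, and the sample addresses are hypothetical illustrations, not part of ASS or of any Oracle tool.

```python
import re

# Assumed header layout, modelled on the example in section 1.
# Real dumps vary by version; adjust the pattern to the trace at hand.
SO_HEADER = re.compile(
    r"SO:\s*(?:0x)?(?P<addr>[0-9a-fA-F]+)\s*,\s*type:\s*(?P<type>\d+)\s*[,.]\s*"
    r"owner:\s*(?:0x)?(?P<owner>[0-9a-fA-F]+)\s*,\s*flag:\s*(?P<flag>\S+)"
)

# A few of the type codes listed in section 1.
SO_TYPES = {
    2: "process", 3: "call", 4: "session", 5: "enqueue",
    24: "buffer", 40: "transaction",
    53: "library cache lock", 54: "library cache pin",
}


def parse_state_objects(text):
    """Return {address: (type, owner_address, flag)} for every SO header found."""
    objects = {}
    for m in SO_HEADER.finditer(text):
        addr = int(m.group("addr"), 16)
        owner = int(m.group("owner"), 16)
        objects[addr] = (int(m.group("type")), owner, m.group("flag"))
    return objects


def owner_chain(objects, addr):
    """Follow owner pointers from addr up to the top-level SO (owner == 0)."""
    chain = []
    seen = set()  # guards against a corrupted dump with a pointer cycle
    while addr and addr in objects and addr not in seen:
        seen.add(addr)
        so_type, owner, _flag = objects[addr]
        chain.append((addr, SO_TYPES.get(so_type, f"type {so_type}")))
        addr = owner
    return chain


# Hypothetical sample: a transaction owned by a session owned by a process.
SAMPLE = """\
SO: c0000000a1b2c3d0, type: 2, owner: 0000000000, flag: INIT/-/-/0x00
SO: c0000000a1b2d000, type: 4, owner: c0000000a1b2c3d0, flag: INIT/-/-/0x00
SO: c0000000a1b2e000, type: 40, owner: c0000000a1b2d000, flag: INIT/-/-/0x00
"""

if __name__ == "__main__":
    objects = parse_state_objects(SAMPLE)
    for address, name in owner_chain(objects, 0xC0000000A1B2E000):
        print(f"{address:x}  {name}")
```

Starting from the transaction SO, the chain walks to its session and then to the owning process, which is the SO relationship diagram described in the text, built by hand one address at a time.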
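The process flag (KSUPRFLG) in section 2 and the session flag in section 3 are both bitmaps, so a raw flag value has to be split into its individual bits before it can be read. The sketch below shows one way to do that in Python. The bit values and names are copied from the tables above (only a subset of the session flags is included for brevity); the function name decode_flags and the sample values 0x41 and 0x06 are hypothetical and chosen only for illustration.

```python
# Bit values copied from the KSUPRFLG list in section 2.
KSUPR_FLAGS = {
    0x01: "KSUPDEAD",
    0x02: "KSUPDSYS",
    0x04: "KSUPDFAT",
    0x08: "KSUPDCLN",
    0x10: "KSUPDSMN",
    0x20: "KSUPDPSU",
    0x40: "KSUPDMSS",
    0x80: "KSUPDDPC",
}

# Subset of the session flag bitmap in section 3.
KSUSF_FLAGS = {
    0x00000001: "KSUSFUSR",
    0x00000002: "KSUSFREC",
    0x00000010: "KSUSFSYS",
    0x00000020: "KSUSFSGA",
    0x00000040: "KSUSFLOG",
    0x00000400: "KSUSFDET",
}


def decode_flags(value, table):
    """Return (names of the set bits found in table, bits left unexplained)."""
    names = []
    remaining = value
    for bit in sorted(table):
        if value & bit:
            names.append(table[bit])
            remaining &= ~bit
    return names, remaining


if __name__ == "__main__":
    # Hypothetical values, not taken from a real dump.
    print(decode_flags(0x41, KSUSF_FLAGS))  # user session that has logged on
    print(decode_flags(0x06, KSUPR_FLAGS))  # detached, fatal system process
```

Returning the unexplained bits separately makes it obvious when a dump comes from a version whose flag table differs from the one listed here.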