Troubleshooting Grid Infrastructure Startup Issues (Doc ID 1623340.1)
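Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Purpose
This note provides methods for diagnosing startup issues in 11gR2 and 12c Grid Infrastructure. It applies both to new installations (during the execution of root.sh and rootupgrade.sh) and to existing environments that have failed. For root.sh issues, see note 1053970.1 for more information.
Scope
This note is intended for cluster/RAC database administrators and Oracle Support engineers.
Details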
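Startup sequence: In short, the operating system starts the ohasd process; ohasd starts agents that start the daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, asm ...); and crsd starts agents that start the user resources (database, SCAN, listener, and so on).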
Cluster status
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
$GRID_HOME/bin/crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                  Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                  OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
Versions 11.2.0.2 and later have two additional processes:
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1
ora.crf
      1        ONLINE  ONLINE       rac1
On non-Exadata systems running 11.2.0.3 and later, ora.diskmon is in the offline state, as follows:
In 12c and later, the ora.storage resource also appears:
ora.storage
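As a rough aid for reading the status listing above, the following sketch filters `crsctl stat res -t -init` output and prints only the resources whose STATE column is not ONLINE. It assumes the two-line layout shown above (a resource name line followed by an instance line); the function name `offline_init_resources` is hypothetical and not part of any Oracle tool.

```shell
# Print "<resource> <state>" for every init resource that is not ONLINE.
# Reads the text of "crsctl stat res -t -init" on stdin.
# Assumption: each resource name (ora.*) is on its own line, followed by an
# instance line of the form "<n> <TARGET> <STATE> [<SERVER> [<DETAILS>]]".
offline_init_resources() {
    awk '
        /^ora\./ { res = $1; next }          # remember the resource name
        res != "" && $1 ~ /^[0-9]+$/ {       # instance line for that resource
            if ($3 != "ONLINE") print res, $3
        }
    '
}
```

A possible invocation is `$GRID_HOME/bin/crsctl stat res -t -init | offline_init_resources`. Note that, as stated above, ora.diskmon being offline is expected on non-Exadata systems from 11.2.0.3 onward, so a hit on that resource is not necessarily a problem.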
Issue 1: OHASD does not start
CRS-4000: Command Start failed, or completed with errors.
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
root      2279     1  0 18:14 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
Note: Oracle Linux 6 (OL6) and Red Hat Enterprise Linux 6 (RHEL6) no longer support inittab, so init.ohasd is configured in /etc/init and started from there. Even so, the process "/etc/init.d/init.ohasd run" should still be visible once it has been started.
nohup ./init.ohasd run &
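The process check above can be sketched as a small helper that reads `ps -ef` output and reports how far the OHASD startup got. This is only an illustration under the assumption that the process names appear in the listing exactly as in the examples in this note (`init.ohasd run` and `ohasd.bin`); the function name `ohasd_stage` is hypothetical.

```shell
# Report the OHASD startup stage from "ps -ef" text read on stdin.
ohasd_stage() {
    ps_text=$(cat)    # capture the whole listing once so it can be searched twice
    if printf '%s\n' "$ps_text" | grep -q 'ohasd\.bin'; then
        echo "ohasd.bin running"
    elif printf '%s\n' "$ps_text" | grep -q 'init\.ohasd run'; then
        echo "init.ohasd running, ohasd.bin not running"
    else
        echo "init.ohasd not running"
    fi
}
```

For example, `ps -ef | ohasd_stage` printing "init.ohasd not running" would point at the operating system side (inittab or /etc/init), while "init.ohasd running, ohasd.bin not running" would point at the later checks in this section.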
Feb 29 16:20:36 racnode1 logger: Could not access /var/opt/oracle/scls_scr/racnode1/root/ohasdstr
      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."

      enable*)
        /bin/touch /tmp/ohasd.start."`date`"
        $LOGERR "Oracle HA daemon is enabled for autostart."

      enable*)
        $LOGERR "Oracle HA daemon is enabled for autostart."

      enable*)
        /bin/sleep 120
        $LOGERR "Oracle HA daemon is enabled for autostart."
.. Jan 20 20:46:57 rac1 logger: exec /ocw/grid/perl/bin/perl -I/ocw/grid/perl/lib /ocw/grid/bin/crswrapexece.pl /ocw/grid/crs/install/s_crsconfig_rac1_env.txt /ocw/grid/bin/ohasd.bin "reboot"
-rw------- 1 root  oinstall 272756736 Feb  2 18:20 rac1.olr
2010-01-24 22:59:10.470: [ default][1373676464] Initializing OLR
2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:6m':failed in stat OCR file/disk /ocw/grid/cdata/rac1.olr, errno=2, os err string=No such file or directory
2010-01-24 22:59:10.472: [  OCROSD][1373676464]utopen:7:failed to open any OCR file/disk, os err string=No such file or directory
2010-01-24 22:59:10.473: [  OCRRAW][1373676464]proprinit: Could not open raw device
2010-01-24 22:59:10.473: [  OCRAPI][1373676464]a_init:16!: Backend init unsuccessful : [26]
2010-01-24 22:59:10.473: [  CRSOCR][1373676464] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
2010-01-24 22:59:10.473: [ default][1373676464] OLR initalization failured, rc=26
2010-01-24 22:59:10.474: [ default][1373676464]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry
2010-01-24 22:59:10.474: [ default][1373676464][PANIC] OHASD exiting; Could not init OLR
2010-01-24 23:01:46.275: [  OCROSD][1228334000]utread:3: Problem reading buffer 1907f000 buflen 4096 retval 0 phy_offset 102400 retry 5
2010-01-24 23:01:46.275: [  OCRRAW][1228334000]propriogid:1_1: Failed to read the whole bootblock. Assumes invalid format.
2010-01-24 23:01:46.275: [  OCRRAW][1228334000]proprioini: all disks are not OCR/OLR formatted
2010-01-24 23:01:46.275: [  OCRRAW][1228334000]proprinit: Could not open raw device
2010-01-24 23:01:46.275: [  OCRAPI][1228334000]a_init:16!: Backend init unsuccessful : [26]
2010-01-24 23:01:46.276: [  CRSOCR][1228334000] OCR context init failure.  Error: PROCL-26: Error while accessing the physical storage
2010-01-24 23:01:46.276: [ default][1228334000] OLR initalization failured, rc=26
2010-01-24 23:01:46.276: [ default][1228334000]Created alert : (:OHAS00106:) :  Failed to initialize Oracle Local Registry
2010-01-24 23:01:46.277: [ default][1228334000][PANIC] OHASD exiting; Could not init OLR
2010-11-07 03:00:08.932: [ default][1] Created alert : (:OHAS00102:) : OHASD is not running as privileged user
2010-11-07 03:00:08.932: [ default][1][PANIC] OHASD exiting: must be run as privileged user
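The alert identifiers in the ohasd.log excerpts above lend themselves to a quick scan. The sketch below searches a log file for the two identifiers quoted in this note, (:OHAS00106:) and (:OHAS00102:), and prints a one-line hint for each one found. The function name `ohasd_log_hints` and the hint wording are assumptions made for illustration; only the identifiers and their meanings come from the excerpts.

```shell
# Print a hint for each known OHASD startup failure signature in a log file.
# $1: path to an ohasd.log file.
ohasd_log_hints() {
    log=$1
    if grep -q 'OHAS00106' "$log"; then
        echo "OHAS00106: OLR could not be initialized"
    fi
    if grep -q 'OHAS00102' "$log"; then
        echo "OHAS00102: OHASD is not running as a privileged user"
    fi
}
```

It could be pointed at the log location named later in this note, for example `ohasd_log_hints <grid-home>/log/<nodename>/ohasd/ohasd.log`, with the placeholders filled in for the local node.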
2010-08-04 13:13:11.102: [   CRSPE][35] Resources parsed
2010-08-04 13:13:11.103: [   CRSPE][35] Server [] has been registered with the PE data model
2010-08-04 13:13:11.103: [   CRSPE][35] STARTUPCMD_REQ = false:
2010-08-04 13:13:11.103: [   CRSPE][35] Server [] has changed state from [Invalid/unitialized] to [VISIBLE]
2010-08-04 13:13:11.103: [  CRSOCR][31] Multi Write Batch processing...
2010-08-04 13:13:11.103: [ default][35] Dump State Starting ...
..
2010-08-04 13:13:11.112: [   CRSPE][35] SERVERS:
:VISIBLE:address{{Absolute|Node:0|Process:-1|Type:1}}; recovered state:VISIBLE. Assigned to no pool
------------- SERVER POOLS:
Free [min:0][max:-1][importance:0] NO SERVERS ASSIGNED
2010-08-04 13:13:11.113: [   CRSPE][35] Dumping ICE contents...:ICE operation count: 0
2010-08-04 13:13:11.113: [ default][35] Dump State Done.
2010-06-29 10:31:01.571: [  OCRSRV][1217390912] th_listen: CLSCLISTEN failed clsc_ret= 3, addr= [(ADDRESS=(PROTOCOL=ipc)(KEY=procr_local_conn_0_PROL))]
2010-06-29 10:31:01.571: [  OCRSRV][3267002960]th_init: Local listener did not reach valid state
15058/1:         0.1995 close(2147483646)                               Err#9 EBADF
15058/1:         0.1996 close(2147483645)                               Err#9 EBADF
..
12. ohasd.bin starts normally, but "crsctl check crs" shows only the following line:
CRS-4638: Oracle High Availability Services is online
and the command "crsctl stat res -p -init" returns nothing.
This problem is caused by a corrupted OLR; see note 1193643.1 for how to restore it.
13. If ohasd still fails to start, check the ohasd logs <grid-home>/log/<nodename>/ohasd/ohasd.log and ohasdOUT.log for more information.
Issue 2: OHASD agents do not start
[ohasd(25303)] CRS-5828:Could not start agent '/ocw/grid/bin/orarootagent_grid'. Details at (:CRSAGF00130:) {0:0:2} in /ocw/grid/log/racnode1/ohasd/ohasd.log.
2011-05-03 12:03:17.491: [    AGFW][1117866336] {0:0:184} Created alert : (:CRSAGF00130:) :   Failed to start the agent /ocw/grid/bin/orarootagent_grid
2011-05-03 12:03:17.491: [    AGFW][1117866336] {0:0:184} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.diskmon 1 1] ID 4098:403
2011-05-03 12:03:17.491: [    AGFW][1117866336] {0:0:184} Can not stop the agent: /ocw/grid/bin/orarootagent_grid because pid is not initialized
..
2011-05-03 12:03:17.492: [   CRSPE][1128372576] {0:0:184} Fatal Error from AGFW Proxy: Unable to start the agent process
2011-05-03 12:03:17.492: [   CRSPE][1128372576] {0:0:184} CRS-2674: Start of 'ora.diskmon' on 'racnode1' failed
..
2011-06-27 22:34:57.805: [    AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00123:) :   Failed to start the agent process: /ocw/grid/bin/cssdagent Category: -1 Operation: fail Loc: canexec2 OS error: 0 Other :  no exe permission, file [/ocw/grid/bin/cssdagent]
2011-06-27 22:34:57.805: [    AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00126:) :  Agent start failed
..
2011-06-27 22:34:57.806: [    AGFW][1131669824] {0:0:2} Created alert : (:CRSAGF00123:) :  Failed to start the agent process: /ocw/grid/bin/cssdmonitor Category: -1 Operation: fail Loc: canexec2 OS error: 0 Other : no exe permission, file [/ocw/grid/bin/cssdmonitor]
Issue 3: OCSSD.BIN does not start
2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileVerifyForCall: [at clsgpnp.c:1867] Result: (87) CLSGPNP_SIG_VALPEER. Profile verified.  prf=0x165160d0
2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileGetSequenceRef: [at clsgpnp.c:841] Result: (0) CLSGPNP_OK. seq of p=0x165160d0 is '6'=6
2010-02-02 18:00:16.263: [    GPnP][408926240]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2186] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote "ipc://GPNPD_rac1" disco ""
2010-02-03 22:26:17.057: [    GPnP][3852126240]clsgpnpm_connect: [at clsgpnpm.c:1101] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "ipc://GPNPD_rac1"
2010-02-03 22:26:17.057: [    GPnP][3852126240]clsgpnp_getProfileEx: [at clsgpnp.c:546] Result: (13) CLSGPNP_NO_DAEMON. Can't get GPnP service profile from local GPnP daemon
2010-02-03 22:26:17.057: [ default][3852126240]Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2010-02-03 22:26:17.057: [    CSSD][3852126240] clsgpnp_getProfile failed, rc(13)
..
2010-02-03 22:37:22.227: [    CSSD][1145538880] clssnmvDiskVerify: Successful discovery of 0 disks
2010-02-03 22:37:22.227: [    CSSD][1145538880]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-02-03 22:37:22.227: [    CSSD][1145538880]clssnmvFindInitialConfigs: No voting files found
2010-02-03 22:37:22.228: [    CSSD][1145538880]###################################
2010-02-03 22:37:22.228: [    CSSD][1145538880]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2010-02-03 23:26:25.804: [GIPCGMOD][1206540320]gipcmodGipcPassInitializeNetwork: EXCEPTION[ ret gipcretFail (1) ]  failed to determine host from clsinet, using default
..
2010-02-03 23:26:25.810: [    CSSD][1206540320]clsssclsnrsetup: gipcEndpoint failed, rc 39
2010-02-03 23:26:25.811: [    CSSD][1206540320]clssnmOpenGIPCEndp: failed to listen on gipc addr gipc://rac1:nm_eotcs- ret 39
2010-02-03 23:26:25.811: [    CSSD][1206540320]clssscmain:  failed to open gipc endp
2010-09-20 11:52:54.016: [    CSSD][1078421824]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
..  >>>>  after a long delay
2010-09-20 12:02:39.578: [    CSSD][1103055168]clssnmvDHBValidateNCopy: node 1, has a disk HB, but no network HB, 1037, LATS 328883434, lastSeqNo 1036, timestamp 1284980558/329930254
2010-09-20 12:02:39.895: [    CSSD][1107286336]clssgmExecuteClientRequest: MAINT recvd from proc 2 (0xe1ad870)
2010-09-20 12:02:39.895: [    CSSD][1107286336]clssgmShutDown: Received abortive shutdown request from client.
2010-09-20 12:02:39.895: [    CSSD][1107286336]###################################
2010-09-20 12:02:39.895: [    CSSD][1107286336]clssscExit: CSSD aborting from thread GMClientListener
2010-09-20 12:02:39.895: [    CSSD][1107286336]###################################
racnode1    1
racnode1    0
2010-08-30 18:28:14.207: [    CSSD][36]clssnm_skgxnmon: skgxn init failed
2010-08-30 18:28:14.208: [    CSSD][36]###################################
2010-08-30 18:28:14.208: [    CSSD][36]clssscExit:  CSSD signal 11 in thread skgxnmon
5. The "crsctl" command was run from the wrong GRID_HOME
2012-11-14 10:21:44.014: [    CSSD][1086675264](:CSSNM00056:)clssnmvStartDiscovery: Terminating because of the release version(11.2.0.2.0) of this node being lesser than the active version(11.2.0.3.0) that the cluster is at
2012-11-14 10:21:44.014: [    CSSD][1086675264]###################################
2012-11-14 10:21:44.014: [    CSSD][1086675264]clssscExit: CSSD aborting from thread clssnmvDDiscThread#
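The (:CSSNM00056:) message above carries both version numbers, so the mismatch can be confirmed mechanically. The following sketch pulls the two versions out of such a log line and compares them field by field. The function names `cssd_versions` and `version_lt` are hypothetical, and the pattern assumes the message wording is exactly as in the excerpt above.

```shell
# Print "<release version> <active version>" from CSSNM00056 lines on stdin.
cssd_versions() {
    sed -n 's/.*release version(\([0-9.]*\)).*active version(\([0-9.]*\)).*/\1 \2/p'
}

# Succeed (exit 0) if dotted version $1 is lower than dotted version $2.
# Fields are compared numerically from left to right; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1    # versions are equal
    }'
}
```

With the excerpt above, `cssd_versions` would yield `11.2.0.2.0 11.2.0.3.0`, and `version_lt 11.2.0.2.0 11.2.0.3.0` succeeds, which is consistent with the command having been run from an older home than the one the cluster is active at.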
Issue 4: CRSD.BIN does not start
2010-02-03 22:37:51.638: [ CSSCLNT][1548456880]clsssInitNative: connect failed, rc 29
2010-02-03 22:37:51.639: [  CRSRTI][1548456880] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
[  OCRASM][2603807664]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2010-02-03 22:22:55.189: [  OCRASM][2603807664]proprasmo: kgfoCheckMount returned [7]
2010-02-03 22:22:55.189: [  OCRASM][2603807664]proprasmo: The ASM instance is down
2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprioo: Failed to open [+GI]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprioo: No OCR/OLR devices are usable
2010-02-03 22:22:55.190: [  OCRASM][2603807664]proprasmcl: asmhandle is NULL
2010-02-03 22:22:55.190: [  OCRRAW][2603807664]proprinit: Could not open raw device
2010-02-03 22:22:55.190: [  OCRASM][2603807664]proprasmcl: asmhandle is NULL
2010-02-03 22:22:55.190: [  OCRAPI][2603807664]a_init:16!: Backend init unsuccessful : [26]
2010-02-03 22:22:55.190: [  CRSOCR][2603807664] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2010-02-03 22:22:55.190: [    CRSD][2603807664][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-02-03 23:14:33.583: [  OCRRAW][2346668976]proprinit: Could not open raw device
2010-02-03 23:14:33.583: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:14:34.587: [  OCROSD][2346668976]utopen:6m':failed in stat OCR file/disk /share/storage/ocr, os err string=No such file or directory
2010-02-03 23:14:34.587: [  OCROSD][2346668976]utopen:7:failed to open any OCR file/disk, os err string=No such file or directory
2010-02-03 23:14:34.587: [  OCRRAW][2346668976]proprinit: Could not open raw device
2010-02-03 23:14:34.587: [ default][2346668976]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:14:35.589: [    CRSD][2346668976][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26
2010-02-03 23:19:39.429: [  OCRRAW][3360863152]propriogid:1_2: INVALID FORMAT
2010-02-03 23:19:39.429: [  OCRRAW][3360863152]proprioini: all disks are not OCR/OLR formatted
2010-02-03 23:19:39.429: [  OCRRAW][3360863152]proprinit: Could not open raw device
2010-02-03 23:19:39.429: [ default][3360863152]a_init:7!: Backend init unsuccessful : [26]
2010-02-03 23:19:40.432: [    CRSD][3360863152][PANIC] CRSD exiting: OCR device cannot be initialized, error: 1:26
[  OCRASM][611467760]SLOS : SLOS: cat=7, dep=1031, loc=kgfokge
ORA-01031: insufficient privileges
2010-03-10 11:45:12.528: [  OCRASM][611467760]proprasmo: kgfoCheckMount returned [7]
2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmo: The ASM instance is down
2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprioo: Failed to open [+SYSTEMDG]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprioo: No OCR/OLR devices are usable
2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmcl: asmhandle is NULL
2010-03-10 11:45:12.529: [  OCRRAW][611467760]proprinit: Could not open raw device
2010-03-10 11:45:12.529: [  OCRASM][611467760]proprasmcl: asmhandle is NULL
2010-03-10 11:45:12.529: [  OCRAPI][611467760]a_init:16!: Backend init unsuccessful : [26]
2010-03-10 11:45:12.530: [  CRSOCR][611467760] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, loc=kgfokge
ORA-01031: insufficient privileges
] [7]
[  OCRASM][3301265904]SLOS : SLOS: cat=7, dep=12547, loc=kgfokge
2012-03-04 21:34:23.139: [  OCRASM][3301265904]ASM Error Stack :  ORA-12547: TNS:lost contact
2012-03-04 21:34:23.633: [  OCRASM][3301265904]proprasmo: kgfoCheckMount returned [7]
2012-03-04 21:34:23.633: [  OCRASM][3301265904]proprasmo: The ASM instance is down
2012-03-04 21:34:23.634: [  OCRRAW][3301265904]proprioo: Failed to open [+OCR]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2012-03-04 21:34:23.634: [  OCRRAW][3301265904]proprioo: No OCR/OLR devices are usable
2012-03-04 21:34:23.635: [  OCRASM][3301265904]proprasmcl: asmhandle is NULL
2012-03-04 21:34:23.636: [    GIPC][3301265904] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5326]
2012-03-04 21:34:23.639: [ default][3301265904]clsvactversion:4: Retrieving Active Version from local storage.
2012-03-04 21:34:23.643: [  OCRRAW][3301265904]proprrepauto: The local OCR configuration matches with the configuration published by OCR Cache Writer. No repair required.
2012-03-04 21:34:23.645: [  OCRRAW][3301265904]proprinit: Could not open raw device
2012-03-04 21:34:23.646: [  OCRASM][3301265904]proprasmcl: asmhandle is NULL
2012-03-04 21:34:23.650: [  OCRAPI][3301265904]a_init:16!: Backend init unsuccessful : [26]
2012-03-04 21:34:23.651: [  CRSOCR][3301265904] OCR context init failure.  Error:  PROC-26: Error while accessing the physical storage
ORA-12547: TNS:lost contact
2012-03-04 21:34:23.652: [ CRSMAIN][3301265904] Created alert : (:CRSD00111:) :  Could not init OCR, error: PROC-26: Error while accessing the physical storage
ORA-12547: TNS:lost contact
2012-03-04 21:34:23.652: [    CRSD][3301265904][PANIC] CRSD exiting: Could not init OCR, code: 26
[  OCRASM][18]SLOS : SLOS: cat=8, opn=kgfoOpenFile01, dep=15056, loc=kgfokge
ORA-17503: ksfdopn:DGOpenFile05 Failed to open file +OCRMIR.255.4294967295
ORA-17503: ksfdopn:2 Failed to open file +OCRMIR.255.4294967295
ORA-15001: diskgroup "OCRMIR
..
2010-05-11 11:16:38.647: [  OCRASM][18]proprasmo: kgfoCheckMount returned [6]
2010-05-11 11:16:38.648: [  OCRASM][18]proprasmo: The ASM disk group OCRMIR is not found or not mounted
2010-05-11 11:16:38.648: [  OCRASM][18]proprasmdvch: Failed to open OCR location [+OCRMIR] error [26]
2010-05-11 11:16:38.648: [  OCRRAW][18]propriodvch: Error  [8] returned device check for [+OCRMIR]
2010-05-11 11:16:38.648: [  OCRRAW][18]dev_replace: non-master could not verify the new disk (8)
[  OCRSRV][18]proath_invalidate_action: Failed to replace [+OCRMIR] [8]
[  OCRAPI][18]procr_ctx_set_invalid_no_abort: ctx set to invalid
..
2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91: Comparing device hash ids between local and master failed
2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91 Local dev (1862408427, 1028247821, 0)
2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:91 Master dev (1862408427, 1859478705, 0)
2010-05-11 11:16:46.587: [  OCRMAS][19]th_master:9: Shutdown CacheLocal. my hash ids don't match
[  OCRAPI][19]procr_ctx_set_invalid_no_abort: ctx set to invalid
[  OCRAPI][19]procr_ctx_set_invalid: aborting...
2010-05-11 11:16:46.587: [    CRSD][19] Dump State Starting ...
..
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Creating PID [30269] file for home /ocw/grid host racnode1 bin crs to /ocw/grid/crs/init/
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Error3 -2 writing PID [30269] to the file []
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Failed to record pid for CRSD
2010-02-14 17:41:57.927: [  clsdmt][1092499776]Terminating process
2010-02-14 17:41:57.927: [ default][1092499776] CRSD exiting on stop request from clsdms_thdmai
2011-04-06 15:53:38.778: [ora.crsd][1160390976] [check] PID which will be monitored will be 1535      >> 1535 is output of "cat /ocw/grid/crs/init/racnode1.pid"
2011-04-06 15:53:38.965: [ COMMCRS][1191860544]clsc_connect: (0x2aaab400b0b0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CRSD))
[  clsdmc][1160390976]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=racnode1DBG_CRSD)) with status 9
2011-04-06 15:53:38.966: [ora.crsd][1160390976] [check] Error = error 9 encountered when connecting to CRSD
2011-04-06 15:53:39.023: [ora.crsd][1160390976] [check] Calling PID check for daemon
2011-04-06 15:53:39.023: [ora.crsd][1160390976] [check] Trying to check PID = 1535
2011-04-06 15:53:39.203: [ora.crsd][1160390976] [check] PID check returned ONLINE CLSDM returned OFFLINE
2011-04-06 15:53:39.203: [ora.crsd][1160390976] [check] DaemonAgent::check returned 5
2011-04-06 15:53:39.203: [    AGFW][1160390976] check for resource: ora.crsd 1 1 completed with status: FAILED
2011-04-06 15:53:39.203: [    AGFW][1170880832] ora.crsd 1 1 state changed from: UNKNOWN to: FAILED
..
2011-04-06 15:54:10.511: [    AGFW][1167522112] ora.crsd 1 1 state changed from: UNKNOWN to: CLEANING
..
2011-04-06 15:54:10.513: [ora.crsd][1146542400] [clean] Trying to stop PID = 1535
..
2011-04-06 15:54:11.514: [ora.crsd][1146542400] [clean] Trying to check PID = 1535
-rwxr-xr-x 1 ogrid oinstall 5 Feb 17 11:00 /ocw/grid/crs/init/racnode1.pid
cat /ocw/grid/crs/init/*pid
1535
ps -ef| grep 1535
root      1535     1  0 Mar30 ?        00:00:00 iscsid      >> Note: process 1535 is not crsd.bin
# $GRID_HOME/bin/crsctl stop res ora.crsd -init
# $GRID_HOME/bin/crsctl start res ora.crsd -init
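The stale PID file check shown above (the PID in racnode1.pid belonging to iscsid rather than crsd.bin) can be sketched as follows. The helpers read `ps -ef` output on stdin and look up the command for a given PID. The function names `ps_cmd_for_pid` and `pid_is_crsd` are hypothetical, and the column layout (UID PID PPID C STIME TTY TIME CMD, with STIME as a single field) is an assumption based on the Linux listing in this note.

```shell
# Print the command (column 8 onward) of process $1 from "ps -ef" text on stdin.
ps_cmd_for_pid() {
    awk -v pid="$1" '$2 == pid {
        for (i = 8; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
    }'
}

# Succeed only if process $1 in the "ps -ef" text on stdin is a crsd.bin process.
pid_is_crsd() {
    ps_cmd_for_pid "$1" | grep -q 'crsd\.bin'
}
```

For instance, `ps -ef | pid_is_crsd "$(cat /ocw/grid/crs/init/racnode1.pid)"` would fail in the situation shown above, because PID 1535 is iscsid, which indicates that the PID file is stale.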
2010-02-03 23:34:28.428: [  OCRAPI][2235814832]clsu_get_private_ip_addresses: no ip addresses found.
..
2010-02-03 23:34:28.434: [  OCRAPI][2235814832]a_init:13!: Clusterware init unsuccessful : [44]
2010-02-03 23:34:28.434: [  CRSOCR][2235814832] OCR context init failure.  Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2010-02-03 23:34:28.434: [    CRSD][2235814832][PANIC] CRSD exiting: Could not init OCR, code: 44
2009-12-10 06:28:31.974: [  OCRMAS][20]th_master:11: Could not connect to the new master
2009-12-10 06:29:01.450: [ CRSMAIN][2] Policy Engine is not initialized yet!
2009-12-10 06:29:31.489: [ CRSMAIN][2] Policy Engine is not initialized yet!
Issue 5: GPNPD.BIN does not start
1. Network name resolution is not working properly
2010-05-13 12:48:11.540: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1015] ENTRY
2010-05-13 12:48:11.541: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1066] GIPC gipcretFail (1) gipcConnect(tcp-tcp://node2:9393)
2010-05-13 12:48:11.541: [    GPnP][1171126592]clsgpnpm_connect: [at clsgpnpm.c:1067] Result: (48) CLSGPNP_COMM_ERR. Failed to connect to call url "tcp://node2:9393"
Issue 6: Various other daemons do not start
Common causes:
2010-02-02 12:55:20.485: [  clsdmt][1110944064]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_GIPCD))
2012-07-22 00:15:16.575: [    CTSS][1]clsctss_r_av3:  Invalid active version [] retrieved from OLR. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1](:ctss_init16:):  Error [19] retrieving active version. Returns [19].
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main: CTSS init failed [19]
2012-07-22 00:15:16.585: [    CTSS][1]ctss_main:  CTSS daemon aborting [19].
2012-07-22 00:15:16.585: [    CTSS][1]CTSS daemon aborting
Issue 7: CRSD agents do not start
Issue 8: HAIP does not start
HAIP can fail to start for many reasons, for example:
[ohasd(891)]CRS-2807:Resource 'ora.cluster_interconnect.haip' failed to start automatically.
See note 1210883.1 for more information about HAIP.
Network and name resolution verification
Log file location, ownership, and permissions
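In a Grid Infrastructure environment: assume a Grid Infrastructure environment with a node named rac1, a CRS owner named grid, and two separate RDBMS owners, rdbmsap and rdbmsar. The following shows the normal settings under $GRID_HOME/log:
drwxrwxr-x 5 grid oinstall 4096 Dec  6 09:20 log
Note that most subdirectories inherit the ownership and permissions of their parent directory. The listing above is only a reference for judging whether ownership or permissions in the CRS home have been changed recursively. If a working node running the same version is available, use that node as the reference.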
In an Oracle Restart environment: the following shows the ownership and permission settings under $GRID_HOME/log in an Oracle Restart environment:
drwxrwxr-x 5 grid oinstall 4096 Oct 31  2009 log
Network socket file location, ownership, and permissions
2011-06-18 14:07:28.545: [  clsdmt][515]Fail to listen to (ADDRESS=(PROTOCOL=ipc)(KEY=lena042DBG_EVMD))
2011-06-18 14:07:28.545: [  clsdmt][515]Terminating process
2011-06-18 14:07:28.559: [ default][515] EVMD exiting on stop request from clsdms_thdmai
CRS-2674: Start of 'ora.evmd' on 'racnode1' failed
..
In a Grid Infrastructure cluster environment: the following is an example from a cluster environment:
./.oracle:
drwxrwxrwt 2 root  oinstall 4096 Feb  2 21:25 .
srwxrwx--- 1 grid oinstall    0 Feb  2 18:00 master_diskmon
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 mdnsd
-rw-r--r-- 1 grid oinstall    5 Feb  2 18:00 mdnsd.pid
prw-r--r-- 1 root  root        0 Feb  2 13:33 npohasd
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 ora_gipc_GPNPD_rac1
-rw-r--r-- 1 grid oinstall    0 Feb  2 13:34 ora_gipc_GPNPD_rac1_lock
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11724.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11724.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11735.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:39 s#11735.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:45 s#12339.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 13:45 s#12339.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6275.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6275.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6276.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6276.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6278.1
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 s#6278.2
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sAevm
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sCevm
srwxrwxrwx 1 root  root        0 Feb  2 18:01 sCRSD_IPC_SOCKET_11
srwxrwxrwx 1 root  root        0 Feb  2 18:01 sCRSD_UI_SOCKET
srwxrwxrwx 1 root  root        0 Feb  2 21:25 srac1DBG_CRSD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_CSSD
srwxrwxrwx 1 root  root        0 Feb  2 18:00 srac1DBG_CTSSD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_EVMD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_GIPCD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_GPNPD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 srac1DBG_MDNSD
srwxrwxrwx 1 root  root        0 Feb  2 18:00 srac1DBG_OHASD
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER_SCAN2
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:01 sLISTENER_SCAN3
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_eotcs
-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1_eotcs_lock
-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOCSSD_LL_rac1__lock
srwxrwxrwx 1 root  root        0 Feb  2 18:00 sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 root  root        0 Feb  2 18:00 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sOracle_CSS_LclLstnr_eotcs_1
-rw-r--r-- 1 grid oinstall    0 Feb  2 18:00 sOracle_CSS_LclLstnr_eotcs_1_lock
srwxrwxrwx 1 root  root        0 Feb  2 18:01 sora_crsqs
srwxrwxrwx 1 root  root        0 Feb  2 18:00 sprocr_local_conn_0_PROC
srwxrwxrwx 1 root  root        0 Feb  2 18:00 sprocr_local_conn_0_PROL
srwxrwxrwx 1 grid oinstall    0 Feb  2 18:00 sSYSTEM.evm.acceptor.auth
In an Oracle Restart environment:
The following is example output from an Oracle Restart environment:
./.oracle:
srwxrwx--- 1 grid oinstall 0 Aug  1 17:23 master_diskmon
prw-r--r-- 1 grid oinstall 0 Oct 31  2009 npohasd
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.1
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 s#14478.2
srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.1
srwxrwxrwx 1 grid oinstall 0 Jul 14 08:02 s#2266.2
srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.1
srwxrwxrwx 1 grid oinstall 0 Jul  7 10:59 s#2269.2
srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.1
srwxrwxrwx 1 grid oinstall 0 Jul 31 22:10 s#2313.2
srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.1
srwxrwxrwx 1 grid oinstall 0 Jun 29 21:58 s#2851.2
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sCRSD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_CSSD
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 srac1DBG_OHASD
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sEXTPROC1521
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1_localhost_lock
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sOCSSD_LL_rac1__lock
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sOHASD_UI_SOCKET
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1
-rw-r--r-- 1 grid oinstall 0 Aug  1 17:23 sgrid_CSS_LclLstnr_localhost_1_lock
srwxrwxrwx 1 grid oinstall 0 Aug  1 17:23 sprocr_local_conn_0_PROL
Diagnostic file collection
References
NOTE:1054902.1 - How to Validate Network and Name Resolution Setup for the Clusterware and RAC
(Editor: 李大同)