加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

2017-03-27Oracle故障gc buffer busy acquire导致数据库不可用

发布时间:2020-12-12 15:07:57 所属栏目:百科 来源:网络整理
导读:实施反馈系统有20分钟不可用,然后又自动恢复了。先查看alert日志,看到打开文件数不够,系统已经运行几年了,怎么可能呢。 Non critical error ORA-48180 caught while writing to trace file "/u01/app/ora/diag/rdbms/nwzcdb/nwzcdb2/trace/nwzcdb2_ora_1
Begin Snap:1315824-3月 -17 09:00:2715317.52End Snap:1315924-3月 -17 10:00:38181010.02Elapsed:
60.18 (mins)


DB Time:
32,066.54 (mins)

11g开始gc buffer busy分为gc buffer busy acquire和gc buffer busy release:
gc buffer busy acquire是当session#1尝试请求访问远程实例(remote instance) buffer,但是在session#1之前已经有相同实例上另外一个session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy acquire。
gc buffer busy release是在session#1之前已经有远程实例的session#2请求访问了相同的buffer,并且没有完成,那么session#1等待gc buffer busy release。


gc buffer busy acquire288,206568.5K197230.5Clusterenq: TX - index contention11,889325K2733517.5Concurrencybuffer busy waits54,302217.9K401211.7Concurrencygc buffer busy release228,772200.8K87810.8Clustergc current grant busy251,30173.6K2934.0Clustergc current block congested109,35671.2K6513.8Clustergc cr block congested25,92269.2K26693.7Clustergc cr grant congested30,96742.1K13602.3Cluster
我认为可能是两个原因造成的:

1. 低效SQL,逻辑读过大,且访问频繁,造成争用严重。

2. 数据库IO资源紧张,导致一些频繁访问的SQL语句响应慢,造成gc buffer busy acquire,gc buffer busy release等待事件。

定位是否是原因1的问题,就找Segments by Global Cache Buffer Busy。然后根据对象的名称去找对应的SQL,然后查看SQL的执行计划定位问题。

Segments by Global Cache Buffer Busy

  • % of Capture shows % of GC Buffer Busy for each top segment compared
  • with GC Buffer Busy for all segments captured by the Snapshot
LCSC LCSC_DATAINDEX_LOG_UO_OPERATE_TIMEINDEX266,04836.39LCSCLCSC_DATASS_SECURITY_RESPONSIBILITY
TABLE112,42515.38DIS_TRANSFERP_0501TABLE PARTITION62,4608.54LCSC_DATAINDEX_LOG_FUN_OPER_DATE
INDEX27,8163.80P_050727,7753.80


定位是否是问题2造成,先查看数据库IO的整体情况,如果是RAC,多个节点都要看,因为RAC是共享存储,消耗IO总量是多个节点之和。如果如下图所示,相比数据库正常的时刻是非常大的。

如何判断是否是问题2影响了问题1,就查看问题1找到的SQL是否有消耗IO,如果有,则有影响。

IOStat by Function summary

  • 'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,P are in multiples of 1000
  • ordered by (Data Read + Write) desc

Direct Reads475.2G209.64134.9211.8G3.97.503M0
Buffer Cache Reads34G327.399.649M0M0.000M616.6K9.91Direct Writes5.8G1.721.646M22.6G33.856.418M
Others5G7.071.407M5.1G11.221.441M26.1K2.24DBWR0M0.010M8.6G217.962.44M2034.20LGWR153M2.73.042M4.5G214.761.292M679.9K1.03TOTAL:520.1G548.55147.66542.6G481.7612.094M1322.6K5.20

问题2的定位是通过segments by physical reads来找到相应的SQL。

Segments by Physical Reads

  • Total Physical Reads: 67,312,485
  • Captured Segments account for 81.6% of Total
Physical ReadsLCSCLCSC_DATAFUNCTION_LOCATIONP_DEFAULT_SUB_P_0502TABLE SUBPARTITION12,612,70618.74LCSCLCSC_DATAFUNCTION_LOCATIONP_DEFAULT_SUB_P_0501TABLE SUBPARTITION7,703,11111.44P_DEFAULT_SUB_P_05057,258,62610.78LCSCLCSC_DATAFUNCTION_LOCATIONP_DEFAULT_SUB_P_0503TABLE SUBPARTITION6,005,6578.92P_DEFAULT_SUB_P_05074,008,9895.96

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

实施反馈系统有20分钟不可用,然后又自动恢复了。先查看alert日志,看到打开文件数不够,系统已经运行几年了,怎么可能呢。

Non critical error ORA-48180 caught while writing to trace file "/u01/app/ora/diag/rdbms/nwzcdb/nwzcdb2/trace/nwzcdb2_ora_195339.trc"
Error message: Linux-x86_64 Error: 23: Too many open files in system

检查数据库服务器的配置,ulimit -a ,发现oracle hard nofile 65536,应该是足够大的。

查看问题时段的数据库报告,发现数据库过载了。


Snap Id Snap Time Sessions Cursors/Session Instances
Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
Owner Tablespace Name Object Name Subobject Name Obj. Type GC Buffer Busy % of Capture
Function Name Reads: Data Reqs per sec Data per sec Writes: Data Waits: Count Avg Tm(ms)
%Total
    推荐文章
      热点阅读