1、故障现象
(1)一个业务系统输入用户名与密码后无法进入首页,表现为一直在运行等待,运行缓慢
(2)整个系统无法正常使用,接近停运状态
2、 故障解决方法
调整数据库参数alter system setevent='10949 trace name context forever,level 1'来关闭“direct path read”(直接路径读)特性,使SQL语句可以从缓存中查询数据,达到降低I/O读取量,使全表扫描的数据从缓存中读取,加快SQL语句运行速度的目的。
3、 故障原因总结
(1)由于部分SQL语句设计或编写效率低下,以及表缺少适应的索引,导致SQL语句需要全表扫描,在表较小时,ORACLE数据库将数据读取到缓存后,后续虽然是全表扫描,但均是从缓存中读取,所以问题未体现出来
(2)在表的大小不断增大后,根据ORACLE 11g数据库的算法,在表达到db_cache_size(GB)的2%(默认值)以后,认为采用直接路径读(跳过缓存,直接从磁盘文件中全扫描读取)
(3)DX_T_XXVIATE表大小为1GB,在大量反复以direct pathread磁盘重复读取的情况下,消耗大量的I/O资源,将服务器I/O几乎耗尽
(4)在主机I/O耗尽的情况下,系统的读、写,均几乎处于瘫痪状态
(5)在关闭ORACLE 11G数据库的direct path read新特性功能后,读取方式恢复到从缓存中读取,磁盘读降到“0”,系统恢复正常
4、 改进建议
(1)优化访问DX_T_XXVIATE 相关的SQL语句与设计合适的索引,避免大表全表扫描。
5、 故障原因分析
5.1 7月11日故障时段数据库服务器I/O等待严重
5.2 7月11日故障时段磁盘响应非常缓慢
5.3 对比故障当日(7月11日)与上周的I/O磁盘读取量,比上周大十倍
故障前、中、后磁盘读取量对比图:
|
上面高的蓝色线,是故障当日(2016年7月1日,周一)的磁盘Disk Read KB/s指标线
5.4 高度消耗I/O的SQL语句。
SQL_ID为b8m6wy846qgbk的SQL语句,physical reads鹤立鸡群,可见此SQL语句的影响最为严重。
5.5 全表扫描单次超过6秒的表与其SQL语句统计。
统计汇总时间:08:00—10:00
SQL_ID
target
出现次数
b8m6wy846qgbk
DX_T_XXVIATE
22
6ut0tkd7hvagc
DX_T_XXVIATE
6
1wq3kmd62x4x8
DX_T_XXVIATE
2
b5ma9qfs62bdm
DX_T_XXVIATE
2
f5hujtmsh3pd1
DX_T_XXVIATE
2
65wk2453uk930
DX_T_XXVIATE
2
6yxk40mn2rnzj
DX_T_XXVIATE
2
8h1bzhfm6purf
DX_T_XXVIATE
2
a1r7xr03naax2
DX_T_XXVIATE
2
6yn7ccy6bbw4h
DX_T_XXVIATE
2
6ar757tgxd1sf
DX_T_XXVIATE
2
gu8cjd8c35kys
DX_T_XXVIATE
2
41j3xyugbn6ck
DX_T_XXVIATE
2
98jagc6q2zr4x
DX_T_XXVIATE
2
1a4rwwt36w33d
DX_T_XXVIATE
2
b4bq40ybpt7a8
DX_T_XXVIATE
1
30za43khyxz6t
DX_T_XXVIATE
1
66j38hs9a0yzh
DX_T_XXVIATE
1
25djwmppxq9qt
DX_T_XXVIATE
1
84pdvmjyb9103
DX_T_XXVIATE
1
b69frxfps0k4t
DX_T_XXVIATE
1
统计时间:08:00—10:00 单次扫描超过6秒的SQL语句及时长详细清单:
SQL_ID
target
elapsed_seconds
start_time
f5hujtmsh3pd1
DX_T_XXVIATE
24
2016/7/11 10:02
f5hujtmsh3pd1
DX_T_XXVIATE
26
2016/7/11 10:02
66j38hs9a0yzh
DX_T_XXVIATE
28
2016/7/11 9:51
b8m6wy846qgbk
DX_T_XXVIATE
19
2016/7/11 9:48
b4bq40ybpt7a8
DX_T_XXVIATE
30
2016/7/11 9:41
6ut0tkd7hvagc
DX_T_XXVIATE
23
2016/7/11 9:38
b8m6wy846qgbk
DX_T_XXVIATE
27
2016/7/11 9:37
b8m6wy846qgbk
DX_T_XXVIATE
25
2016/7/11 9:35
b8m6wy846qgbk
DX_T_XXVIATE
28
2016/7/11 9:35
b8m6wy846qgbk
DX_T_XXVIATE
37
2016/7/11 9:34
65wk2453uk930
DX_T_XXVIATE
38
2016/7/11 9:34
8h1bzhfm6purf
DX_T_XXVIATE
31
2016/7/11 9:33
65wk2453uk930
DX_T_XXVIATE
34
2016/7/11 9:33
8h1bzhfm6purf
DX_T_XXVIATE
30
2016/7/11 9:33
b8m6wy846qgbk
DX_T_XXVIATE
32
2016/7/11 9:32
b8m6wy846qgbk
DX_T_XXVIATE
33
2016/7/11 9:32
b8m6wy846qgbk
DX_T_XXVIATE
30
2016/7/11 9:32
b8m6wy846qgbk
DX_T_XXVIATE
31
2016/7/11 9:32
b8m6wy846qgbk
DX_T_XXVIATE
31
2016/7/11 9:31
b8m6wy846qgbk
DX_T_XXVIATE
45
2016/7/11 9:30
84pdvmjyb9103
DX_T_XXVIATE
39
2016/7/11 9:29
b8m6wy846qgbk
DX_T_XXVIATE
39
2016/7/11 9:25
6ut0tkd7hvagc
DX_T_XXVIATE
33
2016/7/11 9:24
b8m6wy846qgbk
DX_T_XXVIATE
36
2016/7/11 9:21
6yxk40mn2rnzj
DX_T_XXVIATE
40
2016/7/11 9:21
6yxk40mn2rnzj
DX_T_XXVIATE
31
2016/7/11 9:20
b5ma9qfs62bdm
DX_T_XXVIATE
36
2016/7/11 9:19
b5ma9qfs62bdm
DX_T_XXVIATE
39
2016/7/11 9:19
b8m6wy846qgbk
DX_T_XXVIATE
98
2016/7/11 9:10
6ut0tkd7hvagc
DX_T_XXVIATE
41
2016/7/11 9:09
6ut0tkd7hvagc
DX_T_XXVIATE
41
2016/7/11 9:08
gu8cjd8c35kys
DX_T_XXVIATE
48
2016/7/11 9:06
gu8cjd8c35kys
DX_T_XXVIATE
47
2016/7/11 9:05
30za43khyxz6t
DX_T_XXVIATE
96
2016/7/11 9:00
b69frxfps0k4t
DX_T_XXVIATE
77
2016/7/11 8:59
6ut0tkd7hvagc
DX_T_XXVIATE
26
2016/7/11 8:48
98jagc6q2zr4x
DX_T_XXVIATE
23
2016/7/11 8:46
98jagc6q2zr4x
DX_T_XXVIATE
22
2016/7/11 8:46
6ar757tgxd1sf
DX_T_XXVIATE
28
2016/7/11 8:44
6ar757tgxd1sf
DX_T_XXVIATE
23
2016/7/11 8:44
6ut0tkd7hvagc
DX_T_XXVIATE
22
2016/7/11 8:43
a1r7xr03naax2
DX_T_XXVIATE
19
2016/7/11 8:42
a1r7xr03naax2
DX_T_XXVIATE
23
2016/7/11 8:42
b8m6wy846qgbk
DX_T_XXVIATE
25
2016/7/11 8:40
1wq3kmd62x4x8
DX_T_XXVIATE
23
2016/7/11 8:39
1wq3kmd62x4x8
DX_T_XXVIATE
21
2016/7/11 8:39
b8m6wy846qgbk
DX_T_XXVIATE
45
2016/7/11 8:30
b8m6wy846qgbk
DX_T_XXVIATE
41
2016/7/11 8:30
b8m6wy846qgbk
DX_T_XXVIATE
27
2016/7/11 8:27
41j3xyugbn6ck
DX_T_XXVIATE
29
2016/7/11 8:27
41j3xyugbn6ck
DX_T_XXVIATE
48
2016/7/11 8:26
1a4rwwt36w33d
DX_T_XXVIATE
53
2016/7/11 8:21
1a4rwwt36w33d
DX_T_XXVIATE
54
2016/7/11 8:20
b8m6wy846qgbk
DX_T_XXVIATE
42
2016/7/11 8:19
b8m6wy846qgbk
DX_T_XXVIATE
52
2016/7/11 8:18
b8m6wy846qgbk
DX_T_XXVIATE
54
2016/7/11 8:16
6yn7ccy6bbw4h
DX_T_XXVIATE
118
2016/7/11 8:14
6yn7ccy6bbw4h
DX_T_XXVIATE
81
2016/7/11 8:12
b8m6wy846qgbk
DX_T_XXVIATE
84
2016/7/11 8:10
25djwmppxq9qt
DX_T_XXVIATE
49
2016/7/11 8:09
上面数据显示,08:00—10:00统计时间内,所有全表扫描超过6秒的表,全部是DX_T_XXVIATE这一张表,涉及到的SQL语句有60多条,执行次数最多的数SQL_ID为b8m6wy846qgbk的语句。
5.6 全表扫描最严重SQL语句故障前、后、故障解决后磁盘读取数量对比
5.6.1 7月11日以前系统运行正常的情况下SQL_ID为b8m6wy846qgbk的语句执行统计信息
--执行统计信息(buffer get很大,但是disk reads为0,判定数据基本从buffer中读取):
--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒):
5.6.2 7月11日故障当日SQL_ID为b8m6wy846qgbk的语句执行统计信息
--执行统计信息(buffer get和disk reads都一样的巨大,基本判定每次数据全是从磁盘读取到BUFFER):
--执行计划(对DT_T_OBVIATE全表扫描,预计时间为5分钟30秒,从执行计划的PHV和plan均看出执行计划在系统故障时与正常时,是保持一致的):
5.6.3 故障解决后(取7月12日数据)SQL_ID为b8m6wy846qgbk的语句执行统计信息
--执行统计信息(故障解决后,PVH值不变,Disk Reads又恢复到了故障前的“0”,说明每次执行数据又是从BUFFER中读取的):
5.7 等待事件变化识别数据读取方式变化比较
SQL_ID
日期
direct path read
发生次数
b8m6wy846qgbk
2016/7/12
0
b8m6wy846qgbk
2016/7/11
4141
b8m6wy846qgbk
2016/7/10
663
b8m6wy846qgbk
2016/7/9
0
b8m6wy846qgbk
2016/7/8
0
b8m6wy846qgbk
2016/7/7
0
看来,系统实际上在2016年7月10日(周日),SQL语句的数据读取方式就发生了少量的direct path read,系统实际上已经处于间歇式缓慢状态,到了2016年7月11日(周一),问题特别严重,约99%左右的执行是direct path read,导致I/O耗尽,系统瘫痪。
本文作者:黎俊杰(网名:踩点),从事”系统架构、操作系统、存储设备、数据库、中间件、应用程序“六个层面系统性的性能优化工作
欢迎加入 系统性能优化专业群,共同探讨性能优化技术。群号:258187244
(编辑:李大同)
【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!