
linux – LSI RAID controller errors during database import – how to troubleshoot?

Published: 2020-12-13 17:25:08 | Category: Linux | Source: compiled from the web
We are running a database dump import on an Oracle system (RHEL 5.9, kernel 2.6.18-348.6.1.el5). The import does not complete and eventually fails with:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 248 logical extent 0 of file 280 in group 1 on disk 1 allocation unit 986
Errors in file /u01/app/oracle/diag/rdbms/dbprod/DBPROD/trace/DBPROD_lgwr_24520.trc:
ORA-00345: redo log write error block 509314 count 2023
ORA-00312: online log 1 thread 1: '+DATA/dbprod/redo01.log'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

Corresponding errors appear in the kernel ring buffer and in /var/log/messages.

The drive array holding the import is a 10-disk SAS array in RAID 1+0, built from 300GB 10k disks. The RAID controller is an LSI MegaRAID SAS 9260-8i. MegaCLI reports no disk or adapter errors.

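- Is this a hardware problem?
- Is there any way to troubleshoot it? The RAID controller status is fine, and the disks and logical drives report healthy.
- Is this a Linux OS or tuning problem? I am going to try a different I/O scheduler. CFQ is the default.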

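Edit:

The other schedulers have been tried, with the same result. There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I'm wondering whether this is a deficiency in the module or whether it is triggering a bad condition in the LSI driver.

During the import we reached 14,000 write IOPS.

On the most recent attempt, the system froze completely with the following on the console.

Last output before the freeze: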

Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: megasas: build_ld_io error, sge_count = 51
Jun 12 18:54:42 db1-test kernel: megasas: Err returned from build_and_issue_cmd
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: timing out command, waited 360s
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: Unhandled error code
Jun 12 18:54:42 db1-test kernel: sd 0:2:1:0: SCSI error: return code = 0x…000
Jun 12 18:54:42 db1-test kernel: Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT,SUGGEST_OK

Solution

In the end Sergey was right: this is a driver problem. But let's check a few things first:

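First, switch to the deadline I/O scheduler instead of CFQ. As the name suggests, deadline attaches an expiry time to each request and tries to service requests before they expire, which helps keep I/O latency bounded under heavy load.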
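A minimal sketch of how the scheduler switch might be done at runtime through sysfs. The helper names `active_scheduler` and `set_deadline` are made up for illustration, and `sda` in the usage line is an assumed device name; substitute the block device that backs the RAID volume.

```shell
#!/bin/sh
# Print the active scheduler from a sysfs scheduler line.
# The kernel marks the active one with brackets,
# e.g. "noop anticipatory deadline [cfq]" prints "cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Show the current scheduler for a device, switch it to deadline,
# then show the result. Must be run as root.
# The device name passed in (e.g. sda) is an assumption, not from the post.
set_deadline() {
    dev=$1
    path="/sys/block/$dev/queue/scheduler"
    echo "before: $(active_scheduler < "$path")"
    echo deadline > "$path"
    echo "after:  $(active_scheduler < "$path")"
}

# Usage (as root):
#   set_deadline sda
```

A change made this way lasts only until the next reboot. On kernels of this generation the scheduler can usually be made the default by adding `elevator=deadline` to the kernel boot line.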

Pull the event log from the MegaRAID card:

megacli -adpeventlog -getevents -f /tmp/megaraid-$(date +%F_%T) -aALL

Check the SMART data on the disks (you will need to build a recent version of smartmontools for this to work):

# megacli -pdlist -a0 |grep 'Device Id'
Device Id: 10
Device Id: 9

# smartctl -a /dev/sda -d megaraid,9
…
# smartctl -a /dev/sda -d megaraid,10
…
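With ten disks behind the controller, the per-disk check can be looped rather than typed out. The following is a sketch only: the helper names `device_ids` and `smart_all` are hypothetical, and it assumes that `megacli -pdlist -a0` prints one `Device Id: N` line per physical disk (as in the output above) and that `/dev/sda` is the logical drive exposed by the controller.

```shell
#!/bin/sh
# Extract the numeric IDs from lines such as "Device Id: 10".
device_ids() {
    awk -F': *' '/Device Id/ { print $2 }'
}

# Run smartctl against every physical disk behind adapter 0.
# /dev/sda and adapter 0 are taken from the commands above.
smart_all() {
    megacli -pdlist -a0 | device_ids | while read -r id; do
        echo "=== megaraid,$id ==="
        smartctl -a /dev/sda -d "megaraid,$id"
    done
}

# Usage (as root):
#   smart_all > /tmp/smart-report.txt
```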

If everything looks fine, go ahead and try the latest driver from LSI.

There is a third-party (Vormetric) filesystem encryption module running in this setup. Removing it allows the import to complete. So now I’m wondering if this is a deficiency in the module or if it is triggering a bad condition in the LSI driver.

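The Vormetric module may well be doing something incompatible, yes. I would start by asking them how their module manages to hang the system under high load.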

