NBU version: 7.5
Media server: Windows Server 2008 R2
Backup content: SQL Server data
Tape library: IBM 3584
The Activity Monitor shows the following:
Info nbjm(pid=7004) started backup (backupid=xxxx_1379096131) job for client xxxx,policy centralDWH,schedule full on storage unit xxxx-hcart2-robot-tld-0
9/14/2013 2:15:33 AM - started process bpbrm (14008)
9/14/2013 2:15:34 AM - connecting
9/14/2013 2:15:34 AM - connected; connect time: 00:00:00
9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)
9/14/2013 2:20:15 AM - Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.
9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - Results of executing <C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch>:
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) <0> operations succeeded. <1> operations failed.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - The following object(s) were not backed up successfully.
9/14/2013 2:20:49 AM - Info dbclient(pid=18520) INF - CentralDWH
SQL Server log for the same time period:
Date | Source | Severity | Message
09/14/2013 02:20:15 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:20:15 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.
09/14/2013 02:04:57 | Backup | Unknown | BACKUP failed to complete the command BACKUP DATABASE CentralDWH. Check the backup application log for detailed messages.
09/14/2013 02:04:57 | Backup | Unknown | Error: 3041, Severity: 16, State: 1.
Problem analysis:
First, these lines in the log:
Error bpbrm(pid=14008) from client xxxx: ERR - command failed: none of the requested files were backed up (2)
Error bpbrm(pid=14008) from client xxxx: ERR - bphdb exit status = 2: none of the requested files were backed up
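show that the bch script failed: bphdb exited with status 2, meaning none of the requested database files were backed up.
Next, this part: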
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) ERR - Error in GetConfiguration: 0x80770003.
9/14/2013 2:20:41 AM - Info dbclient(pid=18520) CONTINUATION: - The api was waiting and the timeout interval had elapsed.
9/14/2013 2:20:46 AM - Info dbclient(pid=18520) ERR - Error in VDS->Close: 0x80770004.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) CONTINUATION: - An abort request is preventing anything except termination actions.
9/14/2013 2:20:47 AM - Info dbclient(pid=18520) INF - OPERATION #1 of batch C:\Program Files\Veritas\NetBackup\DbExt\MsSql\centralDWH.bch FAILED with STATUS 1 (0 is normal). Elapsed time = 310(310) seconds.
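shows that NBU's connection to VDI timed out. The VDI timeout defaults to 300 seconds; because no data from the database was ever delivered, the script timed out after 300 seconds and VDI reported the error. The Windows Server event log recorded the same event as an error at that time: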
SQLVDI: Loc=SignalAbort. Desc=Client initiates abort
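Since the script had not run successfully, I checked the bch script and found nothing wrong with it. I then re-ran the policy manually and NBU failed again, though this time it was not a script problem: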
INF - Created VDI object for SQL Server instance <xxxx>. Connection timeout is <300> seconds.
ERR - Error in GetConfiguration: 0x80770003.
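After the VDI object was created, NBU waited 300 seconds and then hit Error in GetConfiguration 0x80770003 again. So something seems to go wrong around the VDI object, which the NBU client presumably creates by calling SQLVDI.DLL.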
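The comparison between the configured timeout and the elapsed time can be checked mechanically. The following is only a sketch: it reads the two message formats quoted above ("Connection timeout is <N> seconds" and "Elapsed time = N(N) seconds") and reports whether the operation ran at least as long as the timeout. The function name is made up for illustration, the batch path in the sample is shortened, and the two sample lines come from the two separate runs quoted above.

```python
import re

def timeout_hit(log_lines):
    """Return (timeout, elapsed, hit) parsed from dbclient-style messages.

    Hypothetical helper: it only understands the two message formats
    quoted in these notes.
    """
    timeout = elapsed = None
    for line in log_lines:
        m = re.search(r"Connection timeout is <(\d+)> seconds", line)
        if m:
            timeout = int(m.group(1))
        m = re.search(r"Elapsed time = (\d+)\(\d+\) seconds", line)
        if m:
            elapsed = int(m.group(1))
    # The operation "hit" the timeout if it ran at least that long.
    hit = timeout is not None and elapsed is not None and elapsed >= timeout
    return timeout, elapsed, hit

sample = [
    "INF - Created VDI object for SQL Server instance <xxxx>. "
    "Connection timeout is <300> seconds.",
    "INF - OPERATION #1 of batch centralDWH.bch FAILED with STATUS 1 "
    "(0 is normal). Elapsed time = 310(310) seconds.",
]
print(timeout_hit(sample))  # (300, 310, True)
```

An elapsed time only slightly above the timeout, as here, is consistent with the job waiting the full timeout and then aborting.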
Next, the dbclient log. This log is only written once a dbclient folder has been created under NetBackup\logs:
<2> logconnections: BPRD CONNECT FROM media-ip.62961 TO master-ip.1556 fd = 1268
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_34284_37776_1>,SQL userid <sa> handle <0x0080d1b0>.
<4> CDBbackrec::InitDeviceSet(): INF - Created VDI object for SQL Server instance <instance>. Connection timeout is <300> seconds. ------ the VDI object is created here
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.62962 TO master-ip.1556 fd = 1396
<2> logconnections: BPRD CONNECT FROM media-ip.62963 TO master-ip.1556 fd = 952
<4> CGlobalInformation::VCSVirtualNameList: INF - Veritas Cluster Server is not installed. --- no Veritas cluster is installed on this host
<1> CGlobalInformation::VCSVirtualNameList: CONTINUATION: - The system cannot find the path specified. ------ path not found
<4> getServerName: Read server name from nb_master_config: xxxxx
<4> CDBIniParms::CDBIniParms: INF - NT User is Administrator
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>,SQL userid <sa> handle <0x0065acf0>. ---- sa logs in to SQL Server with handle 0x0065acf0
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>,SQL userid <sa> handle <0x0065c260>. ---- sa logs in to SQL Server with handle 0x0065c260
<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxxinstance> has been made using Trusted security with DSN <NBMSSQL_temp_23736_9600_1> using standard userid <sa>.
<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c260> --- the sa session with handle 0x0065c260 logs out
<4> DBConnect: INF - Logging into SQL Server with DSN <NBMSSQL_temp_23736_9600_1>,SQL userid <sa> handle <0x0065c690>. --- another sa login
<4> DBDisconnect: INF - Logging out of SQL Server with handle <0x0065c690> --- which logs out right away
<4> SQLEnumerator: INF - Enumerated SQL hosts: SERVER:Server={BJDSQLCLUSTERinstance};UID:Login ID=?;PWD:Password=?;Trusted_Connection:Use Integrated Security=?;*APP:AppName=?;*WSID:WorkStation ID=?
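01:17:34.156 [23736.9600] <4> SQLEnumerator: INF - Could not enumerate Local SQL host/instance using SQLBrowseConnectW --- the local SQL host and instance could not be enumerated with SQLBrowseConnect, which is used to discover and enumerate the values needed to connect to a database (host name, instance name, and so on)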
<4> CGlobalInformation::SQLEnumerator: INF - Hosts and instances retrieved from host list string
<4> CGlobalInformation::SQLEnumerator: INF - host: mediaserver
<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxx
<4> CGlobalInformation::SQLEnumerator: INF - host: BJDSQLCLUSTER
<4> CGlobalInformation::SQLEnumerator: INF - instance: xxxxx
<4> CGlobalInformation::CreateDSN: INF - A successful connection to SQL Server <xxxxinstance> has been made using Trusted security with DSN <NBMSSQL_23736_9600_2> using standard userid <sa>. ---- the host name and instance were found in the host list and the connection succeeded. So the NBU client did connect to the database instance; next, why did the backup not succeed?
-------------------------------------------------------divider--------------------------------------------
<4> StartupProcess: INF - Starting: <C:\Program Files\Veritas\NetBackup\bin\admincmd\bppllist.exe -byclient mediaserver>
(Another batch of login messages follows here, again with successful connections to the database; omitted.)
<4> getServerName: Read server name from nb_master_config: masterserver
<2> vnet_pbxConnect: pbxConnectEx Succeeded
<2> logconnections: BPRD CONNECT FROM media-ip.62996 TO master-ip.1556 fd = 960 -- bprd connection from the media server to the master
<16> writeToServer: ERR - send() to server on socket failed: -- sending on the socket failed
<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
<16> CDBbackrec::InitDeviceSet_Part2(): ERR - Error in GetConfiguration: 0x80770003. -- the same error as in the Activity Monitor
01:22:09.551 <1> CDBbackrec::InitDeviceSet_Part2(): CONTINUATION: - The api was waiting and the timeout interval had elapsed.
<2> logconnections: BPRD CONNECT FROM media-ip.63001 TO master-ip.1556 fd = 1400
01:22:09.703 <4> KillAllThreads: INF - Killing group #0
01:22:09.704 [34284.33648] <4> KillAllThreads: INF - Killing group #0
01:22:09.704 <4> KillAllThreads: INF - Issuing SignalAbort to MS SQL Server VDI -- the message seen in the Windows event log
01:22:09.704 [34284.33416] <4> KillAllThreads: INF - Killing group #0
01:22:09.704 [34284.32560] <4> KillAllThreads: INF - Killing group #0
01:22:12.709 <2> vnet_pbxConnect: pbxConnectEx Succeeded
01:22:12.710 <2> logconnections: BPRD CONNECT FROM media-ip.63002 TO master-ip.1556 fd = 1276
01:22:14.546 <16> writeToServer: ERR - send() to server on socket failed:
<16> dbc_RemoteWriteFile: ERR - could not write progress status message to the NAME socket
<16> CDBbackrec::FreeDeviceSet(): ERR - Error in VDS->Close: 0x80770004.
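So the cause appears to be that the progress status message could not be written to the NAME socket: communication between the media server and the master server failed, and that in turn led to the VDI timeout.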
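When a dbclient log is long, it helps to pull out only the error-level lines before reading the rest. The sketch below assumes that the number in angle brackets is a severity level and that 16 marks errors, which is what the `<16> ... ERR` lines above suggest; the function name and the regular expression are illustrative, not part of NetBackup.

```python
import re

# "<level> source: message", optionally preceded by a timestamp.
# The source may itself contain "::", so match lazily up to the first
# colon that is followed by whitespace.
LINE = re.compile(r"<(\d+)>\s+(\S+?):\s+(.*)")

def error_lines(lines, min_level=16):
    """Return (source, message) pairs for lines at or above min_level."""
    found = []
    for line in lines:
        m = LINE.search(line)
        if m and int(m.group(1)) >= min_level:
            found.append((m.group(2), m.group(3).strip()))
    return found

sample = [
    "<2> vnet_pbxConnect: pbxConnectEx Succeeded",
    "<16> writeToServer: ERR - send() to server on socket failed:",
    "<16> CDBbackrec::InitDeviceSet_Part2(): ERR - Error in GetConfiguration: 0x80770003.",
    "01:22:09.703 <4> KillAllThreads: INF - Killing group #0",
]
errors = error_lines(sample)
for source, message in errors:
    print(source, "->", message)
```

On the excerpt above this leaves the socket write failure and the GetConfiguration timeout, which are the two messages the analysis relies on.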
http://www.symantec.com/business/support/index?page=content&id=TECH182435
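This article says that in version 7.1 the message dbc_RemoteWriteFile - RemoteWriteFile status = 0 can be ignored when the status is 0, and that it will be fixed in the next release. I am on 7.5, though, so this does not seem to be my problem.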
http://www.symantec.com/docs/TECH146444
This article mentions that a certain SQL Server patch updated SQLVDI.DLL and caused backups to fail. That is not my problem either.
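http://www.symantec.com/connect/forums/having-problem-mssql-agent-backup
This thread suggests two approaches:
1. End the dbbackex.exe process.
2. Increase the Client Connect time, i.e. the Client Read Timeout. The timeout for the connection between NBU and VDI can also be set by adding VDITIMEOUTSECONDS XXXX to the bch script (for this parameter, see the NetBackup for Microsoft SQL Server Administrator's Guide).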
Note:
Before running another backup, ensure the following log folders exist on the media server:
bptm and bpbrm.
If the backup still fails after increasing the media server timeouts, please check a new set of logs:
dbclient on the SQL client, bptm and bpbrm on the media server.
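Creating the missing log folders is easy to script. This is a minimal sketch: the helper name is hypothetical, and the caller has to supply the logs root (the NetBackup logs directory on the host in question). The demo runs against a temporary directory so that it touches nothing real.

```python
import os
import tempfile

def ensure_log_dirs(logs_root, names):
    """Create any missing log folders under logs_root; return those created."""
    created = []
    for name in names:
        path = os.path.join(logs_root, name)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(name)
    return created

# Demo against a throwaway directory instead of a real NetBackup install.
demo_root = tempfile.mkdtemp()
wanted = ["bptm", "bpbrm", "dbclient"]
first = ensure_log_dirs(demo_root, wanted)   # all three are created
second = ensure_log_dirs(demo_root, wanted)  # nothing left to create
print(first, second)
```

Running it a second time is harmless, so it can be used as a pre-check before every test backup.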
Solution
After adding VDITIMEOUTSECONDS 1800 to the script, a manually started backup succeeded.
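The edit itself is a one-line change, but with many batch files it may be worth scripting. The sketch below works on the text of a batch file and assumes a layout in which each operation starts with an OPERATION line and ends with an ENDOPER line, with one keyword per line; that layout and the sample content are assumptions for illustration, so check them against your own bch files and the Administrator's Guide before relying on this.

```python
def set_vdi_timeout(bch_text, seconds):
    """Insert or update VDITIMEOUTSECONDS in every operation of a batch file.

    Assumed layout (not verified against a real file): each operation runs
    from an OPERATION line to an ENDOPER line, one keyword per line.
    """
    out = []
    seen = False  # does the current operation already set the keyword?
    for line in bch_text.splitlines():
        key = line.split(None, 1)[0].upper() if line.strip() else ""
        if key == "OPERATION":
            seen = False
        if key == "VDITIMEOUTSECONDS":
            line = f"VDITIMEOUTSECONDS {seconds}"
            seen = True
        elif key == "ENDOPER" and not seen:
            # Add the keyword just before the operation is closed.
            out.append(f"VDITIMEOUTSECONDS {seconds}")
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical, heavily shortened batch file content.
sample_bch = 'OPERATION BACKUP\nDATABASE "CentralDWH"\nENDOPER TRUE\n'
patched = set_vdi_timeout(sample_bch, 1800)
print(patched)
```

Applying the function to its own output changes nothing, so it is safe to run repeatedly over the same file content.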
Remarks:
Error codes 0x80770003 and 0x80770004, along with other VDI errors, are explained in detail at http://www.sqlbackuprestore.com/vdierrors.htm
0x80770003 (-2139684861)
The api was waiting and the timeout interval had elapsed.
Similar to the above example, this can happen when the backup application has waited a set amount of time waiting for SQL Server to respond to its backup request, but did not receive any response.
0x80770004 (-2139684860)
An abort request is preventing anything except termination actions.
An example of this error is when the backup software has encountered a critical error, and has issued an abort request to the VDI.
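The negative numbers in parentheses are the same codes read as signed 32-bit integers, which is how some tools print them. A small sketch of the conversion, with a lookup table holding only the two codes discussed here (the function and table names are made up for illustration):

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed 32-bit integer."""
    return value - (1 << 32) if value & 0x80000000 else value

# Only the two codes from these notes; the texts are the ones quoted above.
VDI_ERRORS = {
    0x80770003: "The api was waiting and the timeout interval had elapsed.",
    0x80770004: "An abort request is preventing anything except termination actions.",
}

def describe(code):
    """Format a code as hex, signed decimal, and its description if known."""
    text = VDI_ERRORS.get(code, "unknown VDI error")
    return f"0x{code:08X} ({to_signed32(code)}): {text}"

print(describe(0x80770003))
print(describe(0x80770004))
```

This makes it easier to match a log that prints -2139684861 against documentation that lists 0x80770003.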
A good document on troubleshooting NBU with SQL Server:
http://www.symantec.com/business/support/index?page=content&id=TECH38369
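Postscript
Backup flow: NBU policy -- NBU backup script -- media server VDI -- media server DB process
The media server runs the local script and, through VDI, communicates with a group of backup processes inside SQL Server. Each database being backed up has 3 such processes. When the backup finishes, the processes should be destroyed and the media server notified through VDI, after which the media server completes the backup.
If the SQL Server backup processes cannot finish within N seconds (N being the timeout set in the script), they cannot notify the media server through VDI and NBU treats the backup as failed. If those processes still exist when a second backup is attempted, that backup fails as well.
A very slow backup may be caused by poor performance of the SQL Server host, which makes these processes run slowly.
Thoughts
The performance of the SQL Server host should be improved.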
(Editor: 李大同)