今天在将pg_dump压缩过的dump文件,通过pg_restore 导入到测试库时,中途异常中断,造成测试库Server
donw 机,而且之后数据无法启动。
-- 数据库异常down机的 csvlog 2010-09-17 17:28:03.943 CST,"mydb",23936,"192.168.1.25:58855",4c9334a3.5d80,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,FATAL,57P03,"the database system is in recovery mode", 2010-09-17 17:28:03.944 CST,23937,"192.168.1.25:58856",4c9334a3.5d81, 2010-09-17 17:28:03.954 CST,23938,"192.168.1.25:58857",4c9334a3.5d82, 2010-09-17 17:28:03.955 CST,23939,"192.168.1.25:58858",4c9334a3.5d83, 2010-09-17 17:28:03.956 CST,23940,"192.168.1.25:58859",4c9334a3.5d84, 2010-09-17 17:28:04.032 CST,32240,4c932f7d.7df0,9,2010-09-17 17:06:05 CST,53100,"could not write to file ""pg_xlog/xlogtemp.32240"": No space left on device", 2010-09-17 17:28:04.098 CST,946,4c9321d4.3b2,5,2010-09-17 16:07:48 CST,LOG,00000,"startup process (PID 32240) exited with exit code 1",6,"aborting startup due to startup process failure", 从数据库down机前的 csvlog日志来看,down机前数据库SERVER 处于 recovery mode,说明pg_restore时 SERVER此时有异常, 之后尝试重新起动Server,命令如下 [postgres@PG1 bin]$ pg_ctl -D $PGDATA start server starting 虽然显示'server starting',但数据库并没有真正起来,因为这时数据库根本无法连接,接着查看 csvlog --数据库启动异常时 csvlog 2010-09-17 17:36:36.660 CST,24245,"192.168.169.42:42566",4c9336a4.5eb5,2010-09-17 17:36:36 CST,"the database system is starting up", 2010-09-17 17:36:36.662 CST,24247,"192.168.169.42:42567",4c9336a4.5eb7, 2010-09-17 17:36:36.663 CST,24246,"192.168.1.25:33223",4c9336a4.5eb6,24248,"192.168.169.42:42568",4c9336a4.5eb8, 2010-09-17 17:36:36.665 CST,24249,"192.168.169.42:42569",4c9336a4.5eb9, 2010-09-17 17:36:36.666 CST,24163,4c9336a3.5e63,2010-09-17 17:36:35 CST,"startup process (PID 24165) exited with exit code 1",2,
从上面日志来看,可以看出Server 数据库正在启动(the database system is starting up"),但是到后面就异常中上了; 而且也没有多余的信息,由于事情紧迫,随即向师傅请教,师傅说数据库需要恢复,可以恢复到具体的时间点。以下是解决过程。 --解决过程 [postgres@PG1 pg_log]$ cd $PGHOME [postgres@PG1 pgsql]$ ll total 16K drwxr-xr-x 2 postgres postgres 4.0K Sep 2 18:43 bin drwxr-xr-x 6 postgres postgres 4.0K Aug 26 16:59 include drwxr-xr-x 3 postgres postgres 4.0K Sep 3 18:55 lib drwxr-xr-x 8 postgres postgres 4.0K Sep 1 17:27 share [postgres@PG1 pgsql]$ cd bin [postgres@PG1 bin]$ ll total 8.1M -rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 clusterdb -rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 createdb -rwxr-xr-x 1 postgres postgres 58K Aug 26 16:59 createlang -rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 createuser -rwxr-xr-x 1 postgres postgres 50K Aug 26 16:59 dropdb -rwxr-xr-x 1 postgres postgres 58K Aug 26 16:59 droplang -rwxr-xr-x 1 postgres postgres 50K Aug 26 16:59 dropuser -rwxr-xr-x 1 postgres postgres 616K Aug 26 16:59 ecpg -rwxr-xr-x 1 postgres postgres 75K Aug 26 16:59 initdb -rwxr-xr-x 1 postgres postgres 27K Aug 26 16:59 pg_config -rwxr-xr-x 1 postgres postgres 26K Aug 26 16:59 pg_controldata -rwxr-xr-x 1 postgres postgres 37K Aug 26 16:59 pg_ctl -rwxr-xr-x 1 postgres postgres 273K Aug 26 16:59 pg_dump -rwxr-xr-x 1 postgres postgres 71K Aug 26 16:59 pg_dumpall -rwxr-xr-x 1 postgres postgres 35K Aug 26 16:59 pg_resetxlog -rwxr-xr-x 1 postgres postgres 125K Aug 26 16:59 pg_restore -rwxr-xr-x 1 postgres postgres 619K Sep 2 18:43 pgsql2shp -rwxr-xr-x 1 postgres postgres 4.8M Aug 26 16:59 postgres lrwxrwxrwx 1 postgres postgres 8 Aug 26 16:59 postmaster -> postgres -rwxr-xr-x 1 postgres postgres 337K Aug 26 16:59 psql -rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 reindexdb -rwxr-xr-x 1 postgres postgres 621K Sep 2 18:43 shp2pgsql -rwxr-xr-x 1 postgres postgres 32K Aug 26 16:59 vacuumdb
--pg_controldata 查看PG SERVER 详细信息 [postgres@PG1 bin]$ ./pg_controldata $PGDATA pg_control version number: 843 Catalog version number: 200904091 Database system identifier: 5509641090052341117 Database cluster state: shut down pg_control last modified: Fri 17 Sep 2010 05:28:03 PM CST Latest checkpoint location: 137/BFFFF68 Prior checkpoint location: 131/9D90C818 Latest checkpoint's REDO location: 137/BFFFF68 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 0/64282 Latest checkpoint's NextOID: 215491390 Latest checkpoint's NextMultiXactId: 1 Latest checkpoint's NextMultiOffset: 0 Time of latest checkpoint: Fri 17 Sep 2010 05:26:05 PM CST Minimum recovery ending location: 0/0 Maximum data alignment: 8 Database block size: 8192 Blocks per segment of large relation: 1048576 WAL block size: 65536 Bytes per WAL segment: 67108864 Maximum length of identifiers: 64 Maximum columns in an index: 32 Maximum size of a TOAST chunk: 1996 Date/time type storage: 64-bit integers Float4 argument passing: by value Float8 argument passing: by value
重要的信息: Latest checkpoint's NextXID: 0/64282,"Latest checkpoint's NextXID" 是指最近一次安全的checkpoints的下一个事务ID,我们可以将数据库恢复到这一时刻。
-- 通过pg_resetxlog 将数据库恢复到事务 64282 时刻
[postgres@PG1 bin]$ pg_resetxlog --help pg_resetxlog resets the PostgreSQL transaction log.
Usage: pg_resetxlog [OPTION]... DATADIR
Options: -e XIDEPOCH set next transaction ID epoch -f force update to be done -l TLI,FILE,SEG force minimum WAL starting location for new transaction log -m XID set next multitransaction ID -n no update,just show extracted control values (for testing) -o OID set next OID -O OFFSET set next multitransaction offset -x XID set next transaction ID --help show this help,then exit --version output version information,then exit
Report bugs to <pgsql-bugs@postgresql.org>.
[postgres@PG1 bin]$ pg_resetxlog -x 64282 $PGDATA Transaction log reset
--再次启动 PG SERVER,正常,此时数据库已恢复 [postgres@PG1 bin]$ pg_ctl -D $PGDATA start server starting [postgres@PG1 bin]$ ps -ef | grep post postgres 25297 1 14 18:08 pts/1 00:00:00 /opt/pgsql/bin/postgres -D /opt/pgdata/pg_root postgres 25298 25297 0 18:08 ? 00:00:00 postgres: logger process postgres 25829 25297 0 18:08 ? 00:00:00 postgres: writer process postgres 25830 25297 0 18:08 ? 00:00:00 postgres: wal writer process postgres 25831 25297 0 18:08 ? 00:00:00 postgres: stats collector process
到了这里,数据库PG SERVER 终于可以启来了,这里顺便说一下,之前的 pg_restore 中途异常是因为脚本中加了 -j 参数,同时跑多个 pg_restore 线程, 造成 pg_restore 子线程连接丢失,pg_restore 脚本中去掉 -j 参数时,数据库顺利导入。
--pg_controldata 官网文档介绍 Name pg_controldata — display control information of a PostgreSQL database cluster
Synopsis pg_controldata [datadir]
Description pg_controldata prints information initialized during initdb,such as the catalog version. It also shows information about write-ahead logging and checkpoint processing. This information is cluster-wide,and not specific to any one database. This utility can only be run by the user who initialized the cluster because it requires read access to the data directory. You can specify the data directory on the command line,or use the environment variable PGDATA.
Environment PGDATA Default data directory location
--pg_resetxlog 官网文档 pg_resetxlog — reset the write-ahead log and other control information of a PostgreSQL database cluster (编辑:李大同)
【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!
|