Postgresql 恢复一例

发布时间：2020-12-13 17:21:08 所属栏目：百科来源：网络整理

导读：今天在将pg_dump压缩过的dump文件，通过pg_restore 导入到测试库时，中途异常中断，造成测试库Server donw 机，而且之后数据无法启动。 -- 数据库异常down机的 csvlog 2010-09-17 17:28:03.943 CST,"mydb",23936,"192.168.1.25:58855",4c9334a3.5d80,1,"/opt/

今天在将pg_dump压缩过的dump文件，通过pg_restore 导入到测试库时，中途异常中断，造成测试库Server
donw 机，而且之后数据无法启动。

-- 数据库异常down机的 csvlog
2010-09-17 17:28:03.943 CST,"mydb",23936,"192.168.1.25:58855",4c9334a3.5d80,1,"/opt/pgsql/bin/postgres",2010-09-17 17:28:03 CST,FATAL,57P03,"the database system is in recovery mode",
2010-09-17 17:28:03.944 CST,23937,"192.168.1.25:58856",4c9334a3.5d81,
2010-09-17 17:28:03.954 CST,23938,"192.168.1.25:58857",4c9334a3.5d82,
2010-09-17 17:28:03.955 CST,23939,"192.168.1.25:58858",4c9334a3.5d83,
2010-09-17 17:28:03.956 CST,23940,"192.168.1.25:58859",4c9334a3.5d84,
2010-09-17 17:28:04.032 CST,32240,4c932f7d.7df0,9,2010-09-17 17:06:05 CST,53100,"could not write to file ""pg_xlog/xlogtemp.32240"": No space left on device",
2010-09-17 17:28:04.098 CST,946,4c9321d4.3b2,5,2010-09-17 16:07:48 CST,LOG,00000,"startup process (PID 32240) exited with exit code 1",6,"aborting startup due to startup process failure",
从数据库down机前的 csvlog日志来看，down机前数据库SERVER 处于 recovery mode,说明pg_restore时 SERVER此时有异常，
之后尝试重新起动Server,命令如下
[postgres@PG1 bin]$ pg_ctl -D $PGDATA start
server starting

虽然显示'server starting',但数据库并没有真正起来，因为这时数据库根本无法连接,接着查看 csvlog

--数据库启动异常时 csvlog
2010-09-17 17:36:36.660 CST,24245,"192.168.169.42:42566",4c9336a4.5eb5,2010-09-17 17:36:36 CST,"the database system is starting up",
2010-09-17 17:36:36.662 CST,24247,"192.168.169.42:42567",4c9336a4.5eb7,
2010-09-17 17:36:36.663 CST,24246,"192.168.1.25:33223",4c9336a4.5eb6,24248,"192.168.169.42:42568",4c9336a4.5eb8,
2010-09-17 17:36:36.665 CST,24249,"192.168.169.42:42569",4c9336a4.5eb9,
2010-09-17 17:36:36.666 CST,24163,4c9336a3.5e63,2010-09-17 17:36:35 CST,"startup process (PID 24165) exited with exit code 1",2,

从上面日志来看，可以看出Server 数据库正在启动（the database system is starting up"）,但是到后面就异常中上了;
而且也没有多余的信息，由于事情紧迫，随即向师傅请教，师傅说数据库需要恢复，可以恢复到具体的时间点。以下是解决过程。

--解决过程
[postgres@PG1 pg_log]$ cd $PGHOME
[postgres@PG1 pgsql]$ ll
total 16K
drwxr-xr-x 2 postgres postgres 4.0K Sep 2 18:43 bin
drwxr-xr-x 6 postgres postgres 4.0K Aug 26 16:59 include
drwxr-xr-x 3 postgres postgres 4.0K Sep 3 18:55 lib
drwxr-xr-x 8 postgres postgres 4.0K Sep 1 17:27 share
[postgres@PG1 pgsql]$ cd bin
[postgres@PG1 bin]$ ll
total 8.1M
-rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 clusterdb
-rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 createdb
-rwxr-xr-x 1 postgres postgres 58K Aug 26 16:59 createlang
-rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 createuser
-rwxr-xr-x 1 postgres postgres 50K Aug 26 16:59 dropdb
-rwxr-xr-x 1 postgres postgres 58K Aug 26 16:59 droplang
-rwxr-xr-x 1 postgres postgres 50K Aug 26 16:59 dropuser
-rwxr-xr-x 1 postgres postgres 616K Aug 26 16:59 ecpg
-rwxr-xr-x 1 postgres postgres 75K Aug 26 16:59 initdb
-rwxr-xr-x 1 postgres postgres 27K Aug 26 16:59 pg_config
-rwxr-xr-x 1 postgres postgres 26K Aug 26 16:59 pg_controldata
-rwxr-xr-x 1 postgres postgres 37K Aug 26 16:59 pg_ctl
-rwxr-xr-x 1 postgres postgres 273K Aug 26 16:59 pg_dump
-rwxr-xr-x 1 postgres postgres 71K Aug 26 16:59 pg_dumpall
-rwxr-xr-x 1 postgres postgres 35K Aug 26 16:59 pg_resetxlog
-rwxr-xr-x 1 postgres postgres 125K Aug 26 16:59 pg_restore
-rwxr-xr-x 1 postgres postgres 619K Sep 2 18:43 pgsql2shp
-rwxr-xr-x 1 postgres postgres 4.8M Aug 26 16:59 postgres
lrwxrwxrwx 1 postgres postgres 8 Aug 26 16:59 postmaster -> postgres
-rwxr-xr-x 1 postgres postgres 337K Aug 26 16:59 psql
-rwxr-xr-x 1 postgres postgres 54K Aug 26 16:59 reindexdb
-rwxr-xr-x 1 postgres postgres 621K Sep 2 18:43 shp2pgsql
-rwxr-xr-x 1 postgres postgres 32K Aug 26 16:59 vacuumdb

--pg_controldata 查看PG SERVER 详细信息
[postgres@PG1 bin]$ ./pg_controldata $PGDATA
pg_control version number: 843
Catalog version number: 200904091
Database system identifier: 5509641090052341117
Database cluster state: shut down
pg_control last modified: Fri 17 Sep 2010 05:28:03 PM CST
Latest checkpoint location: 137/BFFFF68
Prior checkpoint location: 131/9D90C818
Latest checkpoint's REDO location: 137/BFFFF68
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/64282
Latest checkpoint's NextOID: 215491390
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Time of latest checkpoint: Fri 17 Sep 2010 05:26:05 PM CST
Minimum recovery ending location: 0/0
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 1048576
WAL block size: 65536
Bytes per WAL segment: 67108864
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

重要的信息： Latest checkpoint's NextXID: 0/64282,"Latest checkpoint's NextXID"
是指最近一次安全的checkpoints的下一个事务ID，我们可以将数据库恢复到这一时刻。

-- 通过pg_resetxlog 将数据库恢复到事务 64282 时刻

[postgres@PG1 bin]$ pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.

Usage:
pg_resetxlog [OPTION]... DATADIR

Options:
-e XIDEPOCH set next transaction ID epoch
-f force update to be done
-l TLI,FILE,SEG force minimum WAL starting location for new transaction log
-m XID set next multitransaction ID
-n no update,just show extracted control values (for testing)
-o OID set next OID
-O OFFSET set next multitransaction offset
-x XID set next transaction ID
--help show this help,then exit
--version output version information,then exit

Report bugs to <pgsql-bugs@postgresql.org>.

[postgres@PG1 bin]$ pg_resetxlog -x 64282 $PGDATA
Transaction log reset

--再次启动 PG SERVER，正常,此时数据库已恢复
[postgres@PG1 bin]$ pg_ctl -D $PGDATA start
server starting
[postgres@PG1 bin]$ ps -ef | grep post
postgres 25297 1 14 18:08 pts/1 00:00:00 /opt/pgsql/bin/postgres -D /opt/pgdata/pg_root
postgres 25298 25297 0 18:08 ? 00:00:00 postgres: logger process
postgres 25829 25297 0 18:08 ? 00:00:00 postgres: writer process
postgres 25830 25297 0 18:08 ? 00:00:00 postgres: wal writer process
postgres 25831 25297 0 18:08 ? 00:00:00 postgres: stats collector process

到了这里，数据库PG SERVER 终于可以启来了，这里顺便说一下，之前的 pg_restore 中途异常是因为脚本中加了
-j 参数，同时跑多个 pg_restore 线程，造成 pg_restore 子线程连接丢失，pg_restore 脚本中去掉 -j
参数时，数据库顺利导入。

--pg_controldata 官网文档介绍
Name
pg_controldata — display control information of a PostgreSQL database cluster

Synopsis
pg_controldata [datadir]

Description
pg_controldata prints information initialized during initdb,such as the catalog version. It also shows information about write-ahead logging and checkpoint processing. This information is cluster-wide,and not specific to any one database.
This utility can only be run by the user who initialized the cluster because it requires read access to the data directory. You can specify the data directory on the command line,or use the environment variable PGDATA.

Environment
PGDATA
Default data directory location

--pg_resetxlog 官网文档 pg_resetxlog — reset the write-ahead log and other control information of a PostgreSQL database cluster

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!