PostgreSQL启动过程中的那些事七:初始化共享内存和信号二:shme
pg初始化完shmem,给其加上索引"ShmemIndex"后,接着就在shmem里初始化xlog。 1先上个图,看一下函数调用过程梗概,中间略过部分细节
初始化xlog方法调用流程图 2初始化xlog相关结构 话说main()->…->PostmasterMain()->…->reset_shared() ->CreateSharedMemoryAndSemaphores()>…->XLOGSHmemInit(),初始化控制文件data/global/pg_control相关数据结构及事务日志xlog相关数据结构,相关结构定义在下面。 typedef struct ControlFileData { /* * Unique system identifier --- to ensure wematch up xlog files with the * installation that produced them. */ uint64 system_identifier;
/* * Version identifier information. Keep these fields at the same offset, * especially pg_control_version; they won't bereal useful if they move * around. (Forhistorical reasons they must be 8 bytes into the file * rather than immediately at the front.) * * pg_control_version identifies the format ofpg_control itself. * catalog_version_no identifies the format ofthe system catalogs. * * There are additional version identifiers inindividual files; for * example,WAL logs contain per-page magic numbersthat can serve as * version cues for the WAL log. */ uint32 pg_control_version; /* PG_CONTROL_VERSION */ uint32 catalog_version_no; /* see catversion.h */
/* * System status data */ DBState state; /*see enum above */ pg_time_t time; /*time stamp of last pg_control update */ XLogRecPtr checkPoint; /*last check point record ptr */ XLogRecPtr prevCheckPoint; /* previous check point recordptr */
CheckPoint checkPointCopy; /* copy of last check pointrecord */
/* * These two values determine the minimum pointwe must recover up to * before starting up: * * minRecoveryPoint is updated to the latestreplayed LSN whenever we * flush a data change during archive recovery.That guards against * starting archive recovery,aborting it,andrestarting with an earlier * stop location. If we've already flushed datachanges from WAL record X * to disk,we mustn't start up until we reachX again. Zero when not * doing archive recovery. * * backupStartPoint is the redo pointer of thebackup start checkpoint,if * we are recovering from an online backup andhaven't reached the end of * backup yet. It is reset to zero when the endof backup is reached,and * we mustn't start up before that. A booleanwould suffice otherwise,but * we use the redo pointer as a cross-checkwhen we see an end-of-backup * record,to make sure the end-of-backuprecord corresponds the base * backup we're recovering from. */ XLogRecPtr minRecoveryPoint; XLogRecPtr backupStartPoint;
/* * Parameter settings that determine if the WALcan be used for archival * or hot standby. */ int wal_level; int MaxConnections; int max_prepared_xacts; int max_locks_per_xact;
/* * This data is used to check for hardware-architecturecompatibility of * the database and the backendexecutable. We need not check endianness * explicitly,since the pg_control versionwill surely look wrong to a * machine of different endianness,but we doneed to worry about MAXALIGN * and floating-point format. (Note: storage layout nominally also * depends on SHORTALIGN and INTALIGN,but inpractice these are the same * on all architectures of interest.) * * Testing just one double value is not a verybulletproof test for * floating-point compatibility,but it willcatch most cases. */ uint32 maxAlign; /* alignment requirement for tuples */ double floatFormat; /* constant 1234567.0 */ #define FLOATFORMAT_VALUE 1234567.0
/* * This data is used to make sure that configurationof this database is * compatible with the backend executable. */ uint32 blcksz; /* data block size for this DB */ uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */ uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */ uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
/*flag indicating internal format of timestamp,interval,time */ bool enableIntTimes; /* int64 storageenabled? */
/*flags indicating pass-by-value status of various types */ bool float4ByVal; /* float4 pass-by-value? */ bool float8ByVal; /* float8,int8,etc pass-by-value? */
/*CRC of all above ... MUST BE LAST! */ pg_crc32 crc; } ControlFileData;
/* * Bodyof CheckPoint XLOG records. This isdeclared here because we keep * acopy of the latest one in pg_control for possible disaster recovery. *Changing this struct requires a PG_CONTROL_VERSION bump. */ typedef struct CheckPoint { XLogRecPtr redo; /*next RecPtr available when we began to * create CheckPoint (i.e. REDO start point) */ TimeLineID ThisTimeLineID; /* current TLI */ uint32 nextXidEpoch; /* higher-order bits of nextXid */ TransactionIdnextXid; /* next free XID */ Oid nextOid; /* next free OID */ MultiXactIdnextMulti; /* next freeMultiXactId */ MultiXactOffsetnextMultiOffset; /* next free MultiXactoffset */ TransactionIdoldestXid; /* cluster-wide minimumdatfrozenxid */ Oid oldestXidDB; /* database with minimum datfrozenxid */ pg_time_t time; /*time stamp of checkpoint */
/* * Oldest XID still running. This is onlyneeded to initialize hot standby * mode from an online checkpoint,so we onlybother calculating this for * online checkpoints and only when wal_levelis hot_standby. Otherwise * it's set to InvalidTransactionId. */ TransactionIdoldestActiveXid; } CheckPoint;
/* * Total shared-memorystate for XLOG. */ typedef struct XLogCtlData { /* Protected byWALInsertLock: */ XLogCtlInsertInsert;
/* Protected byinfo_lck: */ XLogwrtRqstLogwrtRqst; XLogwrtResultLogwrtResult; uint32 ckptXidEpoch; /* nextXID & epoch of latest checkpoint */ TransactionIdckptXid; XLogRecPtr asyncXactLSN; /*LSN of newest async commit/abort */ uint32 lastRemovedLog; /* latest removed/recycledXLOG segment */ uint32 lastRemovedSeg;
/* Protected byWALWriteLock: */ XLogCtlWrite Write;
/* * These values do not change after startup,although the pointed-to pages * and xlblocks values certainly do. Permission to read/write the pages * and xlblocks values depends on WALInsertLockand WALWriteLock. */ char *pages; /* buffers forunwritten XLOG pages */ XLogRecPtr*xlblocks; /* 1st byte ptr-s +XLOG_BLCKSZ */ int XLogCacheBlck; /* highest allocated xlog buffer index */ TimeLineID ThisTimeLineID; TimeLineID RecoveryTargetTLI;
/* * archiveCleanupCommand is read fromrecovery.conf but needs to be in * shared memory so that the bgwriter processcan access it. */ char archiveCleanupCommand[MAXPGPATH];
/* * SharedRecoveryInProgress indicates if we'restill in crash or archive * recovery.Protected by info_lck. */ bool SharedRecoveryInProgress;
/* * SharedHotStandbyActive indicates if we'restill in crash or archive * recovery.Protected by info_lck. */ bool SharedHotStandbyActive;
/* * recoveryWakeupLatch is used to wake up thestartup process to continue * WAL replay,if it is waiting for WAL toarrive or failover trigger file * to appear. */ Latch recoveryWakeupLatch;
/* * During recovery,we keep a copy of thelatest checkpoint record here. * Used by the background writer when it wantsto create a restartpoint. * * Protected by info_lck. */ XLogRecPtr lastCheckPointRecPtr; CheckPoint lastCheckPoint;
/* end+1 of the lastrecord replayed (or being replayed) */ XLogRecPtr replayEndRecPtr; /* end+1 of the lastrecord replayed */ XLogRecPtr recoveryLastRecPtr; /* timestamp of lastCOMMIT/ABORT record replayed (or being replayed) */ TimestampTzrecoveryLastXTime; /* Are we requestedto pause recovery? */ bool recoveryPause;
slock_t info_lck; /*locks shared variables shown above */ } XLogCtlData;
/* * Shared state datafor XLogInsert. */ typedef struct XLogCtlInsert { XLogwrtResultLogwrtResult; /* a recent value of LogwrtResult */ XLogRecPtr PrevRecord; /*start of previously-inserted record */ int curridx; /* current block index in cache */ XLogPageHeadercurrpage; /* points to header of blockin cache */ char *currpos; /* currentinsertion point in cache */ XLogRecPtr RedoRecPtr; /*current redo point for insertions */ bool forcePageWrites; /* forcing full-page writes for PITR? */
/* * exclusiveBackup is true if a backup startedwith pg_start_backup() is * in progress,and nonExclusiveBackups is acounter indicating the number * of streaming base backups currently inprogress. forcePageWrites is set * to true when either of these is non-zero.lastBackupStart is the latest * checkpoint redo location used as a startingpoint for an online backup. */ bool exclusiveBackup; int nonExclusiveBackups; XLogRecPtr lastBackupStart; } XLogCtlInsert; 在XLOGSHmemInit()函数里,首先在shmem的哈希表索引"ShmemIndex"上给控制文件pg_control增加一个HashElement和ShmemIndexEnt(entry),在shmem里根据ControlFileData大小调用ShmemAlloc()分配内存空间,使ShmemIndexEnt的成员location指向该空间,size成员记录该空间大小。 XLOGSHmemInit()调用ShmemInitStruct(),在其中调用hash_search()在哈希表索引"ShmemIndex"中查找"XLOGCtl",如果没有,就在shmemIndex中给"XLOG Ctl"分一个HashElement和ShmemIndexEnt(entry),在其中的Entry中写上"XLOG Ctl"。返回ShmemInitStruct(),再调用ShmemAlloc()在共享内存上给"XLOG Ctl"相关结构(见下面“XLog相关结构图”)分配空间,设置entry(在这儿及ShmemIndexEnt类型变量)的成员location指向该空间,size成员记录该空间大小,最后返回XLOGShmemInit(),让XLogCtlData *类型静态全局变量XLogCtl指向在shmem里给"XLOG Ctl"相关结构分配的内存地址,设置其中XLogCtlData结构类型的成员值。初始化完成后数据结构如下图。
初始化完xlog的内存结构图 为了精简上图,把创建shmem的哈希表索引"ShmemIndex"时创建的HCTL结构删掉了,这个结构的作用是记录创建可扩展哈希表的相关信息。增加了左边灰色底的部分,描述共享内存/shmem里各变量物理布局概览,由下往上,由低地址到高地址。其中的"Control File"即ControlFileDate和"XLOG Ctl"即xlog的相关结构图下面分别给出,要不上面的图太大了。
控制文件结构图 上图中ControlFileData结构中的XLogRecPtr和CheckPoint不是指针,因此应该用右边的相应结构图代替,把这两个合进去有点费劲,将就着看吧。
XLog相关结构图 (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |