postgresql – Making sense of Postgres row sizes

Published: 2020-12-13 16:37:49 | Category: Encyclopedia | Source: compiled from the web

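I have a large (> 100M rows) Postgres table with the structure {integer, integer, integer, timestamp without time zone}. I expected the size of a row to be 3 * integer + 1 * timestamp = 3 * 4 + 1 * 8 = 20 bytes.

In reality, the row size is pg_relation_size(tbl) / count(*) = 52 bytes. Why?

(No deletes have been done against the table: pg_relation_size(tbl, 'fsm') = 0)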

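Calculating the row size is more complex than that.

Storage is typically partitioned into 8 kB data pages. Each page has a small fixed overhead, possibly a remainder that is not big enough to fit another tuple, and, more importantly, dead rows or a percentage initially reserved with the FILLFACTOR setting.

More importantly, there is overhead per row (tuple): 23 bytes for the HeapTupleHeader, plus alignment padding. The start of the tuple header as well as the start of the tuple data are aligned at a multiple of MAXALIGN, which is 8 bytes on a typical 64-bit machine. Some data types require alignment to the next multiple of 2, 4 or 8 bytes.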

Quoting the manual on the system table pg_type:

typalign is the alignment required when storing a value of this type.
It applies to storage on disk as well as most representations of the
value inside PostgreSQL. When multiple values are stored
consecutively, such as in the representation of a complete row on
disk, padding is inserted before a datum of this type so that it
begins on the specified boundary. The alignment reference is the
beginning of the first datum in the sequence.

Possible values are:

c = char alignment, i.e., no alignment needed.

s = short alignment (2 bytes on most machines).

i = int alignment (4 bytes on most machines).

d = double alignment (8 bytes on many machines, but by no means all).
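
To make the typalign codes concrete, here is a minimal Python sketch of how the padding before a datum could be computed. The byte values in the mapping are the "most machines" figures from the excerpt above, and the helper name padding_before is made up for illustration; it is not a PostgreSQL function.

```python
# Alignment in bytes for each typalign code, using the "most machines"
# values from the manual excerpt (an assumption, not guaranteed everywhere).
TYPALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}


def padding_before(offset, typalign):
    """Return the padding bytes needed so a datum starts on its boundary.

    `offset` is counted from the beginning of the first datum in the row.
    """
    alignment = TYPALIGN_BYTES[typalign]
    return (-offset) % alignment


# Three 4-byte integers end at offset 12; a timestamp ('d') must start at 16.
print(padding_before(12, "d"))  # 4
# A fourth integer ('i') at offset 12 would need no padding.
print(padding_before(12, "i"))  # 0
```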

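Read about the basics in the manual.

Your example

This results in 4 bytes of padding after your 3 integer columns, because the timestamp column requires double alignment and has to start at the next multiple of 8 bytes.

So, one row occupies: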

   23   -- HeapTupleHeader
 +  1   -- padding or NULL bitmap
 + 12   -- 3 * integer (no alignment padding here)
 +  4   -- padding after 3rd integer
 +  8   -- timestamp
 +  0   -- no padding since tuple ends at multiple of MAXALIGN

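Finally, each tuple has an item pointer (ItemIdData) stored at the start of the page, right after the page header proper (as pointed out by @A.H. in the comment), which occupies 4 bytes: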

+  4   -- item pointer in page header
------
 = 52 bytes

So we arrive at the observed 52 bytes.
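
The same arithmetic can be written as a small Python sketch. It assumes a typical 64-bit machine (MAXALIGN of 8), no NULL values, at most 8 columns (so the NULL bitmap would fit into the spare header byte anyway), and only the two fixed-width types from the question. The names row_size and align_up are illustrative and not part of PostgreSQL.

```python
MAXALIGN = 8             # typical 64-bit machine (assumption)
HEAP_TUPLE_HEADER = 23   # bytes, before padding / NULL bitmap
ITEM_POINTER = 4         # per-tuple item pointer stored at the start of the page

# (size, alignment) in bytes; only the two types used in the example table.
TYPES = {"integer": (4, 4), "timestamp": (8, 8)}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return -(-offset // alignment) * alignment


def row_size(columns):
    """Estimate the on-disk bytes for one row of fixed-width, non-NULL columns."""
    header = align_up(HEAP_TUPLE_HEADER, MAXALIGN)  # 23 -> 24
    data = 0
    for name in columns:
        size, alignment = TYPES[name]
        # Pad to the column's alignment, then add the column itself.
        data = align_up(data, alignment) + size
    # The tuple as a whole ends at a multiple of MAXALIGN.
    tuple_size = align_up(header + data, MAXALIGN)
    return tuple_size + ITEM_POINTER


print(row_size(["integer", "integer", "integer", "timestamp"]))  # 52
```

Because of the trailing MAXALIGN padding, a table with a single integer column and a table with a single timestamp column both come out at 36 bytes per row in this model.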

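The calculation pg_relation_size(tbl) / count(*) is a pessimistic estimate. pg_relation_size(tbl) includes bloat (dead rows) and space reserved by fillfactor, as well as the overhead per data page and per table. (And we have not even mentioned compression of long varlena data in TOAST tables, since it does not apply here.)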
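
As a rough illustration of the per-page overhead, the following Python sketch estimates how many 52-byte rows fit into one page. It assumes the default 8 kB block size, a 24-byte page header, no dead rows and completely filled pages. The fillfactor handling is a simplified model, not the exact rule PostgreSQL applies on insert.

```python
PAGE_SIZE = 8192        # default block size in bytes (assumption)
PAGE_HEADER = 24        # fixed page header size in bytes
BYTES_PER_TUPLE = 52    # 48-byte tuple + 4-byte item pointer, as computed in the answer


def rows_per_page(fillfactor=100):
    """Rough count of rows per page; fillfactor is modeled as a simple cap."""
    usable = PAGE_SIZE * fillfactor // 100 - PAGE_HEADER
    return usable // BYTES_PER_TUPLE


rows = rows_per_page()
print(rows)                        # 157
print(round(PAGE_SIZE / rows, 2))  # 52.18
```

Even a perfectly packed table therefore comes out slightly above 52 bytes per row when the relation size is divided by the row count.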

You can install the additional module pgstattuple and call SELECT * FROM pgstattuple('tbl_name'); for more information on table and tuple size.