加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

Postgresql之autovacuum分析及表的垃圾数据查看

发布时间:2020-12-13 16:42:40 所属栏目:百科 来源:网络整理
导读:转载:http://blog.163.com/digoal@126/blog/static/163877040201343031118890/ PostgreSQL数据库日常维护需要维护哪些东西,和数据库中的业务类型有莫大的关系. PostgreSQL的并发控制简单来说是通过多tuple版本,tuple infomask信息,事务提交状态以及事务snap
转载:http://blog.163.com/digoal@126/blog/static/163877040201343031118890/
PostgreSQL数据库日常维护需要维护哪些东西,和数据库中的业务类型有莫大的关系.
PostgreSQL的并发控制简单来说是通过多tuple版本,tuple infomask信息,事务提交状态以及事务snapshot来实现的.
当删除一条记录时,并不是马上回收被删除的空间,因为有可能其他事务还会用到它,当更新一条记录是,老的记录会保留,然后插入新的记录.
例如 :
 
  

digoal=# create table tbl(id int,info text);
CREATE TABLE
digoal=# insert into tbl values (1,'test');
INSERT 0 1
digoal=# delete from tbl;
DELETE 1
digoal=# select ctid,* from tbl;
ctid | id | info
-------+----+------
(0,3) | 1 | test
(1 row)

多次删除插入后,ctid以及变成3了,因为前面的两条并为删除.
update也是如此 :
# update tbl set info='new';
UPDATE 1
老的tuple在0号block的itemid=3的位置,新的tuple是后面插入的在0号block的4号槽.
那么这些垃圾数据是怎么回收的呢,PostgreSQL的vacuum进程就是干这个事情的.
1. vacuum 数据清理.
以上测试表在执行vacuum后的输出如下 :
移除了3个版本.
# vacuum verbose tbl;
INFO: vacuuming "public.tbl"
INFO: "tbl": removed 3 row versions in 1 pages
INFO: "tbl": found 3 removable,1 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_32771"
INFO: index "pg_toast_32771_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted,0 are currently reusable.
INFO: "pg_toast_32771": found 0 removable,0 nonremovable row versions in 0 out of 0 pages
VACUUM

重新插入数据,此时那些被垃圾占用的槽位就可以被利用了.
# insert into tbl values (1,1) | 1 | test
(2 rows)

一个表有多少条垃圾数据,多少条活跃数据在系统表 pg_stat_all_tables 中可以查询.
 
   

# select * from pg_stat_all_tables where relid='tbl'::regclass;
-[ RECORD 1 ]-----+------------------------------
relid | 32771
schemaname | public
relname | tbl
seq_scan | 6
seq_tup_read | 7
idx_scan |
idx_tup_fetch |
n_tup_ins | 4
n_tup_upd | 1
n_tup_del | 2
n_tup_hot_upd | 1
n_live_tup | 2
n_dead_tup | 0
last_vacuum | 2013-05-27 17:00:17.094391+08
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 1
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0

n_live_tup | 2表示有2条活跃数据,
n_dead_tup | 0表示有0条垃圾数据.
执行以下删除后,会发生变化 :
# delete from tbl;; 
   
DELETE 2
digoal=# select * from pg_stat_all_tables where relid='tbl'::regclass;
seq_scan | 7
seq_tup_read | 9
n_tup_del | 4
n_live_tup | 0
n_dead_tup | 2
autoanalyze_count | 0

n_live_tup | 0表示有0条活跃数据,
n_dead_tup | 2表示有2条垃圾数据.
vacuum 后活跃数据和垃圾数据都会变成0
# vacuum tbl;
VACUUM
digoal=# select * from pg_stat_all_tables where relid='tbl'::regclass;
-[ RECORD 1 ]-----+------------------------------
relid | 32771
schemaname | public
relname | tbl
seq_scan | 7
seq_tup_read | 9
idx_scan |
idx_tup_fetch |
n_tup_ins | 4
n_tup_upd | 1
n_tup_del | 4
n_tup_hot_upd | 1
n_live_tup | 0
n_dead_tup | 0
last_vacuum | 2013-05-27 17:05:17.664564+08
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 2
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0


2. 自动垃圾回收的配置.
对于一个DML频繁的数据库,如果靠手动来回收垃圾是不太靠谱的事情,PostgreSQL提供了自动的垃圾回收配置.
相关参数如下 :

#------------------------------------------------------------------------------
# AUTOVACUUM PARAMETERS
#------------------------------------------------------------------------------

#autovacuum = on # Enable autovacuum subprocess? 'on'
# requires track_counts to also be on.
#log_autovacuum_min_duration = -1 # -1 disables,0 logs all actions and
# their durations,> 0 logs only
# actions running at least this number
# of milliseconds.
#autovacuum_max_workers = 3 # max number of autovacuum subprocesses
# (change requires restart)
#autovacuum_naptime = 1min # time between autovacuum runs
#autovacuum_vacuum_threshold = 50 # min number of row updates before
# vacuum
#autovacuum_analyze_threshold = 50 # min number of row updates before
# analyze
#autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
#autovacuum_vacuum_cost_delay = 20ms # default vacuum cost delay for
# autovacuum,in milliseconds;
# -1 means use vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# vacuum_cost_limit
# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0 # 0-100 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits

简单介绍一下参数的含义 :
autovacuum,自动垃圾回收的开关
log_autovacuum_min_duration,在什么情况下记录autovacuum日志输出. 0表示记录所有的autovacuum,-1表示不记录,其他为时间阈值,大于或等于这个时长的autovacuum才记录.
autovacuum_max_workers,指最大允许多少个autovacuum子进程同时工作. 因为vacuum会带来IO上的开销,还会消耗内存. 这个就不要配太大了.
autovacuum_vacuum_threshold表示autovacuum的vacuum操作所需的最小变更数,如果这个表的update/delete的tuple总数小于这个数字则不会触发autovacuum的vacuum操作.
和autovacuum_analyze_threshold 表示autovacuum的analyze操作所需的最小变更数,如果这个表的insert/update/delete的tuple总数小于这个数字则不会触发autovacuum的analyze操作.
autovacuum_vacuum_scale_factor,表示autovacuum的vacuum操作所需的变更量阈值,当 这个表的update/delete的tuple总数 大于(pg_class.reltuples* autovacuum_vacuum_scale_factor+ autovacuum_vacuum_threshold)时,触发vacuum操作.
autovacuum_analyze_scale_factor,表示autovacuum的analyze操作所需的变更量阈值,当 这个表的INSERT/update/delete的tuple总数 大于(pg_class.reltuples* autovacuum_analyze_scale_factor+ autovacuum_analyze_threshold)时,触发analyze操作.
autovacuum_freeze_max_age,即使autovacuum未开启,为了防止wrapped xid导致数据不可见,也会自动触发的vacuum操作. 表示一个表中存在的最早的事务信息到现在为止经历的事务数. 超出则强制vacuum. 防止xid wrapped.
autovacuum_vacuum_cost_delay,因为vacuum会带来一定的IO开销,所以PostgreSQL允许管理员指定当vacuum达到一定的阈值后进入随眠状态,然后再唤醒继续vacuum. 具体的计算需要配置项Cost-Based Vacuum Delay决定.
接下来主要举例说明几个 threshold参数的作用 :
查看当前的阈值 :
# show autovacuum_analyze_scale_factor;
autovacuum_analyze_scale_factor
---------------------------------
0.1
(1 row)
digoal=# show autovacuum_vacuum_scale_factor;
autovacuum_vacuum_scale_factor
--------------------------------
0.2
digoal=# show autovacuum_analyze_threshold;
autovacuum_analyze_threshold
------------------------------
50
digoal=# show autovacuum_vacuum_threshold;
autovacuum_vacuum_threshold
-----------------------------
修改naptime,以及log_autovacuum_min_duration 便于从日志中或者统计表中观察结果 :

pg93@db-17216333-> cd $PGDATA
vi postgresql.conf
autovacuum_naptime = 1s
log_autovacuum_min_duration 0
pg93@db pg_ctl reload
server signaled

创建测试表 :
CREATE TABLE

计算插入多少条数据后会触发analyze :

digoal = # select reltuples from pg_class where relname='tbl';
reltuples
-----------
0
(1 row)
autovacuum_analyze_scale_factor*0+autovacuum_analyze_threshold=50;

因此插入51条数据后会发生analyze.
记录pg_stat_all_tables的tbl信息,注意 last_autovacuum,和 last_autoanalyze 的值.
# select * from pg_stat_all_tables where relname ='tbl';
-[ RECORD 1 ]-----+-------
relid | 32798
seq_scan | 0
seq_tup_read | 0
n_tup_ins | 0
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
last_vacuum |
vacuum_count | 0
插入50条测试数据 :
# insert into tbl select generate_series(1,50),'test'; 
   
INSERT 0 50

stat信息,未触发analyze.
# select * from pg_stat_all_tables where relname ='tbl'; 
   
-[ RECORD 1 ]-----+-------
relid | 32798
seq_scan | 0
seq_tup_read | 0
n_tup_ins | 50
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 50
last_vacuum |
vacuum_count | 0
autoanalyze_count | 0

再插入1条记录.
# insert into tbl select 51,0);">INSERT 0 1 
   

触发analyze :
n_tup_ins     | 51 
   
n_live_tup | 51
last_autoanalyze | 2013-05-27 21:23:49.144829+08
autoanalyze_count | 1


# select reltuples from pg_class where relname='tbl';
-[ RECORD 1 ]-
reltuples | 51
autovacuum_analyze_scale_factor*51+autovacuum_analyze_threshold=55.1;

因此插入56条数据后会触发analyze.
 
   

INSERT 0 55
digoal=# select * from pg_stat_all_tables where relname ='tbl';
-[ RECORD 1 ]-----+------------------------------
relid | 32798
schemaname | public
relname | tbl
seq_scan | 0
seq_tup_read | 0
idx_scan |
idx_tup_fetch |
n_tup_ins | 106
n_tup_upd | 0
n_tup_del | 0
n_tup_hot_upd | 0
n_live_tup | 106
n_dead_tup | 0
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze | 2013-05-27 21:23:49.144829+08
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 1
digoal=# select reltuples from pg_class where relname='tbl';
-[ RECORD 1 ]-
reltuples | 51

再插入1条即可触发analyze.

# insert into tbl select 1,0);">INSERT 0 1
digoal=# select * from pg_stat_all_tables where relname ='tbl';
-[ RECORD 1 ]-----+-----------------------------
n_tup_ins | 107
n_live_tup | 107
last_autoanalyze | 2013-05-27 21:26:25.57402+08
autoanalyze_count | 2
digoal=# select reltuples from pg_class where relname='tbl';
-[ RECORD 1 ]--
reltuples | 107

计算UPDATE/DELETE多少条数据后会触发vacuum :
autovacuum_vacuum_scale_factor*107+autovacuum_vacuum_threshold=71.4;

因此更新或删除共计72条数据后会触发vacuum,如果中间发生了analyze,导致pg_class.reltuples发生变化,这个值也会变化.
发生62次insert,update,delete后会触发analyze.
# update tbl set info='new' where id<18;
UPDATE 35
digoal=# update tbl set info='new' where id<10;
UPDATE 19
digoal=# delete from tbl where id<9;
DELETE 17

总共发生了35+19+17=71超出62次dml,发生analyze,
analyze后,pg_class.reltuples变成90,
reltuples | 90

所以触发vacuum的值变成了多少呢?
 
  

9068;

因此只需要69次update/delete即可触发vacuum,而上一次vacuum到现在已经发生了71次update/delete,因此会触发vacuum.
seq_scan     | 8 
   
seq_tup_read | 856
n_tup_upd | 54
n_tup_del | 17
n_tup_hot_upd | 54
n_live_tup | 90
last_autovacuum | 2013-05-27 21:31:23.560703+08
last_autoanalyze | 2013-05-27 21:31:22.474655+08
autovacuum_count | 1
autoanalyze_count | 3

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读