sql – 分区表查询仍然扫描所有分区
我有一张超过十亿条记录的桌子.为了提高性能,我将其分区为30个分区.最常见的查询在where子句中有(id = …),所以我决定在id列上对表进行分区.
基本上,分区是以这种方式创建的: CREATE TABLE foo_0 (CHECK (id % 30 = 0)) INHERITS (foo); CREATE TABLE foo_1 (CHECK (id % 30 = 1)) INHERITS (foo); CREATE TABLE foo_2 (CHECK (id % 30 = 2)) INHERITS (foo); CREATE TABLE foo_3 (CHECK (id % 30 = 3)) INHERITS (foo); . . . 我为整个数据库运行了ANALYZE,特别是,我通过运行以下方法收集了该表的id列的额外统计信息: ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000; 但是,当我运行在id列上过滤的查询时,计划程序会显示它仍在扫描所有分区. constraint_exclusion设置为partition,因此不是问题. EXPLAIN ANALYZE SELECT * FROM foo WHERE (id = 2); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.544..215.540 rows=171477 loops=1) -> Append (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.539..106.446 rows=171477 loops=1) -> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1) Filter: (id = 2) -> Bitmap Heap Scan on foo_0 foo (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.020..0.020 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_0_idx_1 (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.018..0.018 rows=0 loops=1) Index Cond: (id = 2) -> Bitmap Heap Scan on foo_1 foo (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.012..0.012 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_1_idx_1 (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.010..0.010 rows=0 loops=1) Index Cond: (id = 2) -> Bitmap Heap Scan on foo_2 foo (cost=3280.30..272541.10 rows=121903 width=56) (actual time=30.504..77.033 rows=171477 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=29.825..29.825 rows=171477 loops=1) Index Cond: (id = 2) . . . 我能做些什么让刨床有更好的计划?我是否需要运行ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000;对于所有分区呢? 编辑 在使用Erwin建议的查询更改后,计划程序仅扫描正确的分区,但执行时间实际上比完整扫描(至少是索引)更差. EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2); QUERY PLAN QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.611..224.934 rows=171477 loops=1) -> Append (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.606..116.565 rows=171477 loops=1) -> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1) Filter: (id = 2) -> Bitmap Heap Scan on foo_0 foo (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.046..0.046 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_0_idx_1 (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.044..0.044 rows=0 loops=1) Index Cond: (id = 2) -> Bitmap Heap Scan on foo_1 foo (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.021..0.021 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_1_idx_1 (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.020..0.020 rows=0 loops=1) Index Cond: (id = 2) -> Bitmap Heap Scan on foo_2 foo (cost=3280.30..272541.10 rows=121903 width=56) (actual time=32.536..86.730 rows=171477 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=31.842..31.842 rows=171477 loops=1) Index Cond: (id = 2) -> Bitmap Heap Scan on foo_3 foo (cost=3475.87..285574.05 rows=129032 width=52) (actual time=0.035..0.035 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_3_idx_1 (cost=0.00..3443.61 rows=129032 width=0) (actual time=0.031..0.031 rows=0 loops=1) . . . -> Bitmap Heap Scan on foo_29 foo (cost=3401.84..276569.90 rows=126245 width=56) (actual time=0.019..0.019 rows=0 loops=1) Recheck Cond: (id = 2) -> Bitmap Index Scan on foo_29_idx_1 (cost=0.00..3370.28 rows=126245 width=0) (actual time=0.018..0.018 rows=0 loops=1) Index Cond: (id = 2) Total runtime: 238.790 ms 与: EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Result (cost=0.00..273120.30 rows=611 width=56) (actual time=31.519..257.051 rows=171477 loops=1) -> Append (cost=0.00..273120.30 rows=611 width=56) (actual time=31.516..153.356 rows=171477 loops=1) -> Seq Scan on foo (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1) Filter: ((id = 2) AND ((id % 30) = 2)) -> Bitmap Heap Scan on foo_2 foo (cost=3249.97..273120.30 rows=610 width=56) (actual time=31.512..124.177 rows=171477 loops=1) Recheck Cond: (id = 2) Filter: ((id % 30) = 2) -> Bitmap Index Scan on foo_2_idx_1 (cost=0.00..3249.82 rows=121903 width=0) (actual time=30.816..30.816 rows=171477 loops=1) Index Cond: (id = 2) Total runtime: 270.384 ms 解决方法对于非平凡的表达式,您必须在查询中重复或多或少的逐字条件,以使Postgres查询规划器理解它可以依赖于CHECK约束.即使看起来多余!Per documentation:
大胆强调我的.规划者不理解复杂的表达方式.
代替
尝试: SELECT * FROM foo WHERE id % 30 = 2 AND id = 2; 和:
您可以尝试使用constraint_exclusion = on来查看计划程序是否捕获而没有多余的逐字条件.但是你必须权衡这个设置的成本和收益. 替代方案是分区的简单条件,如outlined by @harmic所示. 否,增加STATISTICS的数量在这种情况下无济于事.查询中只有CHECK约束和WHERE条件. (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |