postgresql explain query中的materialize

今天在GREENPLUM的执行计划中看到了materialize，不知道这个步骤是做什么的，查了下文档，没有相关解释。没事，GREENPLUM不行还有POSTGRESQL呢，GOOGLE了一把，找到了答案。

What does materialize do? I'm joining two tables,not views or anything like that.

A materialize node means the output of whatever is below it in the tree (which can be a scan,or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case,the planner is determining that the result of a scan on one of your tables will fit in memory,and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.

具体的SQL如下

aligputf8=# select count(1) from ttt1;
count
-------
10000
(1 row)

aligputf8=#
aligputf8=# select count(1) from ttt2;
count
-------
10000
(1 row)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1409.07 rows=1667 width=8)
-> Nested Loop (cost=0.00..1409.07 rows=1667 width=8)
-> Broadcast Motion 1:6 (slice1; segments: 1) (cost=0.00..137.07 rows=6 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=1 width=4)
Filter: id = 3
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(6 rows)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=122.00..3599268.93 rows=4998334 width=8)
-> Nested Loop (cost=122.00..3599268.93 rows=4998334 width=8)
-> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..346.93 rows=2999 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=500 width=4)
Filter: id < 3000
-> Materialize (cost=122.00..222.00 rows=1667 width=4)
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(7 rows)

我的理解就是PG为了加快nested loop循环的速度，把b表的数据缓存在了内存中。我们可以看到前面一个SQL，只对b表数据进行1次扫描，因此并不需要进行缓存；后者需要进行2999次扫描。

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!