加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

postgresql explain query中的materialize

发布时间:2020-12-13 17:51:21 所属栏目:百科 来源:网络整理
导读:今天在GREENPLUM的执行计划中看到了materialize,不知道这个步骤是做什么的,查了下文档,没有相关解释。没事,GREENPLUM不行还有POSTGRESQL呢,GOOGLE了一把,找到了答案。 Q: What does materialize do? I'm joining two tables,not views or anything lik

今天在GREENPLUM的执行计划中看到了materialize,不知道这个步骤是做什么的,查了下文档,没有相关解释。没事,GREENPLUM不行还有POSTGRESQL呢,GOOGLE了一把,找到了答案。

Q:

What does materialize do? I'm joining two tables,not views or anything like that.

A:

A materialize node means the output of whatever is below it in the tree (which can be a scan,or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case,the planner is determining that the result of a scan on one of your tables will fit in memory,and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.


具体的SQL如下

aligputf8=# select count(1) from ttt1;
count
-------
10000
(1 row)

aligputf8=#
aligputf8=# select count(1) from ttt2;
count
-------
10000
(1 row)


aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=0.00..1409.07 rows=1667 width=8)
-> Nested Loop (cost=0.00..1409.07 rows=1667 width=8)
-> Broadcast Motion 1:6 (slice1; segments: 1) (cost=0.00..137.07 rows=6 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=1 width=4)
Filter: id = 3
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(6 rows)

aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id=3;
aligputf8=#
aligputf8=# explain select * from ttt1 as a,ttt2 as b where a.id<3000;
QUERY PLAN
------------------------------------------------------------------------------------------------
Gather Motion 6:1 (slice2; segments: 6) (cost=122.00..3599268.93 rows=4998334 width=8)
-> Nested Loop (cost=122.00..3599268.93 rows=4998334 width=8)
-> Broadcast Motion 6:6 (slice1; segments: 6) (cost=0.00..346.93 rows=2999 width=4)
-> Seq Scan on ttt1 a (cost=0.00..137.00 rows=500 width=4)
Filter: id < 3000
-> Materialize (cost=122.00..222.00 rows=1667 width=4)
-> Seq Scan on ttt2 b (cost=0.00..112.00 rows=1667 width=4)
(7 rows)

我的理解就是PG为了加快nested loop循环的速度,把b表的数据缓存在了内存中。我们可以看到前面一个SQL,只对b表数据进行1次扫描,因此并不需要进行缓存;后者需要进行2999次扫描。

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读