postgresql – 为什么Postgres在jsonb列上查找这么慢？

发布时间：2020-12-13 16:08:19 所属栏目：百科来源：网络整理

导读：我有一个表格目标,其中包含text []类型的marital_status列和另一个类型为jsonb的列数据.这两列的内容是相同的,只是采用不同的格式(仅用于演示目的).示例数据： id | marital_status | data ----+--------------------------+-------------------------------

我有一个表格目标,其中包含text []类型的marital_status列和另一个类型为jsonb的列数据.这两列的内容是相同的,只是采用不同的格式(仅用于演示目的).示例数据：

id |      marital_status      |                        data                       
----+--------------------------+---------------------------------------------------
  1 | null                     | {}
  2 | {widowed}                | {"marital_status": ["widowed"]}
  3 | {never_married,divorced} | {"marital_status": ["never_married","divorced"]}
...

表中随机组合中有超过690K条记录.

在text []列上查找

EXPLAIN ANALYZE SELECT marital_status
FROM targeting
WHERE marital_status @> '{widowed}'::text[]

没有索引

通常需要< 900毫秒没有创建任何索引：

Seq Scan on targeting  (cost=0.00..172981.38 rows=159061 width=28) (actual time=0.017..840.084 rows=158877 loops=1)
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
Planning time: 0.150 ms
Execution time: 845.731 ms

有索引

使用索引通常需要< 200毫秒(75％的改进)：

CREATE INDEX targeting_marital_status_idx ON targeting ("marital_status");

结果：

Index Only Scan using targeting_marital_status_idx on targeting  (cost=0.42..23931.35 rows=159061 width=28) (actual time=3.528..143.848 rows=158877 loops=1)"
  Filter: (marital_status @> '{widowed}'::text[])
  Rows Removed by Filter: 452033
  Heap Fetches: 0
Planning time: 0.217 ms
Execution time: 148.506 ms

在jsonb列上查找

EXPLAIN ANALYZE SELECT data
FROM targeting
WHERE (data -> 'marital_status') @> '["widowed"]'::jsonb

没有索引

通常需要< 5,700毫秒没有创建任何指数(慢6倍以上！)：

Seq Scan on targeting  (cost=0.00..174508.65 rows=611 width=403) (actual time=0.095..5399.112 rows=158877 loops=1)
  Filter: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Filter: 452033
Planning time: 0.172 ms
Execution time: 5408.326 ms

有索引

对于索引,通常需要< 3,700毫秒(改善35％)：

CREATE INDEX targeting_data_marital_status_idx ON targeting USING GIN ((data->'marital_status'));

结果：

Bitmap Heap Scan on targeting  (cost=144.73..2482.75 rows=611 width=403) (actual time=85.966..3694.834 rows=158877 loops=1)
  Recheck Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
  Rows Removed by Index Recheck: 201080
  Heap Blocks: exact=33723 lossy=53028
  ->  Bitmap Index Scan on targeting_data_marital_status_idx  (cost=0.00..144.58 rows=611 width=0) (actual time=78.851..78.851 rows=158877 loops=1)"
        Index Cond: ((data -> 'marital_status'::text) @> '["widowed"]'::jsonb)
Planning time: 0.257 ms
Execution time: 3703.492 ms

问题

>为什么text []列的性能更高,即使不使用索引？
>为什么在jsonb列中添加索引只会使性能提高35％？
>在jsonb列上执行查找是否有更多的性能？

解决方法

似乎是一个简单的问题.基本上你问的是怎么来的,

CREATE TABLE foo ( id int,key1 text );

比快

CREATE TABLE bar ( id int,jsonb foo );

@Craig在评论中回答了这个问题

GIN indexing is generally less efficient than a b-tree,so that much is expected.

此模式中的空值也应该读取

SELECT jsonb_build_object('marital_status',ARRAY[null]);
     jsonb_build_object     
----------------------------
 {"marital_status": [null]}
(1 row)

并不是 {}. PostgreSQL采用了许多快捷方式来快速更新jsonb对象,并使索引节省空间.

如果这些都没有意义,请查看此伪表.

CREATE TABLE foo ( id int,x text,y text,z text )
CREATE INDEX ON foo(x);
CREATE INDEX ON foo(y);
CREATE INDEX ON foo(z);

这里我们有三个btree索引.我们来看一个类似的表..

CREATE TABLE bar ( id int,junk jsonb );
CREATE INDEX ON bar USING gin (junk);
INSERT INTO bar (id,junk) VALUES (1,$${"x": 10,"y": 42}$$);

对于像foo一样执行的条形图,我们需要两个btree,它们都将比我们拥有的单个GIN索引大得多.如果你这样做了

INSERT INTO bar (id,"y": 42,"z":3}$$);

我们必须在z上有另一个btree索引,这也将是巨大的.你可以看到我要去哪里. jsonb很棒,但索引和模式建模的复杂性并不与数据库并行.您不能只将数据库缩减为jsonb列,发出CREATE INDEX并期望获得相同的性能.

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!