PostgreSQL does not use a partial index

Published: 2020-12-13 16:30:46. Category: Encyclopedia. Source: compiled from the web.

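I have a table in PostgreSQL 9.2 with a text column; let's call it text_col. The values in this column are fairly unique (any value has at most 5-6 duplicates). The table has about 5 million rows, and roughly half of them contain NULL for text_col. When I execute the following query, I expect 1-5 rows. In most cases (> 80%) I expect only 1 row.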

Query

explain analyze SELECT col1,col2.. colN
FROM table 
WHERE text_col = 'my_value';

There is a btree index on text_col. The query planner never uses this index, and I am not sure why. This is the output of the query.

Planner output

Seq Scan on two (cost=0.000..459573.080 rows=93 width=339) (actual time=1392.864..3196.283 rows=2 loops=1)
Filter: (victor = 'foxtrot'::text)
Rows Removed by Filter: 4077384

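I added another, partial index to try to exclude the NULL values, but that did not help (with or without text_pattern_ops. I do not need text_pattern_ops, since my queries contain no LIKE conditions, but such an index also matches equality).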

CREATE INDEX name_idx
  ON table
  USING btree
  (text_col COLLATE pg_catalog."default" text_pattern_ops)
  WHERE text_col IS NOT NULL;

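Disabling sequential scans with set enable_seqscan = off; still makes the planner choose a seq scan over an index scan. To summarize:

> The number of rows returned by this query is small.
> Given that the non-NULL rows are fairly unique, an index scan on text_col should be faster.
> Running VACUUM and ANALYZE on the table did not help the optimizer pick the index.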

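My questions

> Why does the database choose a sequential scan over an index scan?
> When a table has a text column that is checked with equality conditions, are there any best practices I can follow?
> How do I reduce the time this query takes?

[Edit: more information]

> The index scan is picked in my local database, which contains about 10% of the production data.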

A partial index is a good idea for excluding the half of the table's rows that you obviously do not need. Simpler:

CREATE INDEX name_idx ON table (text_col)
WHERE text_col IS NOT NULL;

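Make sure to run ANALYZE on the table after creating the index. (Autovacuum does this automatically after some time if you don't do it manually, but if you test right after creating the index, your test would fail.)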
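A minimal sketch of that step, assuming the table is called tbl (a hypothetical name; table in the snippets above is only a placeholder and is a reserved word). The second statement reads the standard pg_stat_user_tables view to confirm when statistics were last gathered, manually or by autovacuum:

```sql
-- Refresh planner statistics right after creating the index.
-- "tbl" is a placeholder name, not taken from the question.
ANALYZE tbl;

-- Check when the table was last analyzed (manually or by autovacuum).
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tbl';
```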

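Then, to convince the query planner that a particular partial index can be used, repeat the WHERE condition in the query, even if it seems completely redundant: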

SELECT col1,col2,.. colN
FROM   table 
WHERE  text_col = 'my_value'
AND text_col IS NOT NULL;  -- repeat condition

Voilà.

Per documentation:
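To try the recipe end to end, here is a small self-contained sketch. The table demo_tbl, its columns, and the generated values are made up for illustration; only the shape (about half the rows NULL, the rest nearly unique) mirrors the question. Whether the planner actually picks the index depends on version, statistics, and cost settings, so inspect the EXPLAIN output rather than assuming a particular plan.

```sql
-- Hypothetical demo table; all names here are illustrative.
CREATE TABLE demo_tbl (
    id       serial PRIMARY KEY,
    text_col text
);

-- Every odd row gets NULL (CASE without ELSE), every even row a unique value.
INSERT INTO demo_tbl (text_col)
SELECT CASE WHEN g % 2 = 0 THEN 'val_' || g::text END
FROM   generate_series(1, 100000) AS g;

-- Partial index covering only the non-NULL half of the table.
CREATE INDEX demo_tbl_text_col_idx ON demo_tbl (text_col)
WHERE  text_col IS NOT NULL;

-- Gather statistics before testing.
ANALYZE demo_tbl;

-- Query with the index predicate repeated, as suggested above.
EXPLAIN
SELECT id, text_col
FROM   demo_tbl
WHERE  text_col = 'val_42'
AND    text_col IS NOT NULL;
```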

However, keep in mind that the predicate must match the conditions
used in the queries that are supposed to benefit from the index. To be
precise, a partial index can be used in a query only if the system can
recognize that the WHERE condition of the query mathematically implies
the predicate of the index. PostgreSQL does not have a sophisticated
theorem prover that can recognize mathematically equivalent
expressions that are written in different forms. (Not only is such a
general theorem prover extremely difficult to create, it would
probably be too slow to be of any real use.) The system can recognize
simple inequality implications, for example “x < 1” implies “x < 2”;
otherwise the predicate condition must exactly match part of the
query’s WHERE condition or the index will not be recognized as usable.
Matching takes place at query planning time, not at run time. As a
result, parameterized query clauses do not work with a partial index.
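The inequality example from the quoted passage can be written out as follows. The table demo_nums and the index name are hypothetical. The first query's condition implies the index predicate, so the partial index is a candidate; the second one's does not, so the index cannot be used for it. Being a candidate does not guarantee the planner chooses the index, which still depends on cost estimates.

```sql
-- Hypothetical table and index to illustrate predicate implication.
CREATE TABLE demo_nums (x integer);

CREATE INDEX demo_nums_x_lt2_idx ON demo_nums (x)
WHERE  x < 2;

-- "x < 1" implies "x < 2": the partial index is usable here.
EXPLAIN SELECT x FROM demo_nums WHERE x < 1;

-- "x < 3" does not imply "x < 2": rows with x = 2 are not in the index,
-- so the partial index cannot serve this query.
EXPLAIN SELECT x FROM demo_nums WHERE x < 3;
```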

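As for parameterized queries: again, add the (redundant) predicate of the partial index as an additional, constant WHERE condition, and it works just fine.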
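As a sketch of what that could look like with a server-side prepared statement: the statement name find_by_text and the table name tbl are assumptions for illustration, and the table is assumed to have the text_col column and partial index described above. The search value is a parameter, while the index predicate is spelled out as a constant condition.

```sql
-- Hypothetical prepared statement; "find_by_text" and "tbl" are placeholders.
PREPARE find_by_text(text) AS
SELECT *
FROM   tbl
WHERE  text_col = $1           -- parameterized search value
AND    text_col IS NOT NULL;   -- constant copy of the index predicate

EXECUTE find_by_text('my_value');

-- Clean up the prepared statement when done.
DEALLOCATE find_by_text;
```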

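A major update in Postgres 9.6 greatly improves the chances for index-only scans (which can make queries cheaper, and the query planner will choose such plans more readily). Related: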

> PostgreSQL not using index during count(*)

(Editor: 李大同)
