Extremely slow PostgreSQL query with ORDER and LIMIT clauses
Published: 2020-12-13 16:43:37 | Category: Encyclopedia | Source: compiled from the web
I have a table, let's call it "foos", with almost 6 million records in it. I am running the following query:
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC
    LIMIT 5 OFFSET 0;

This query takes a very long time to run (Rails times out while running it). There is an index on every ID involved. The curious part is that if I remove either the ORDER BY clause or the LIMIT clause, it runs almost instantly.

I assume that the presence of both ORDER BY and LIMIT is making PostgreSQL make some bad choices in query planning. Does anyone have any ideas on how to fix this?

In case it helps, here is the EXPLAIN for all 3 cases:

    //////// Both ORDER and LIMIT
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC
    LIMIT 5 OFFSET 0;

                                QUERY PLAN
    ------------------------------------------------------------------
     Limit  (cost=0.00..16663.44 rows=5 width=663)
       ->  Nested Loop  (cost=0.00..25355084.05 rows=7608 width=663)
             Join Filter: (foos.bar_id = bars.id)
             ->  Index Scan Backward using foos_pkey on foos  (cost=0.00..11804133.33 rows=4963477 width=663)
                   Filter: (((NOT privacy_protected) OR (user_id = 67962)) AND ((status)::text = 'DONE'::text))
             ->  Materialize  (cost=0.00..658.96 rows=182 width=4)
                   ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                         Index Cond: (baz_id = 13266)
    (8 rows)

    //////// Just LIMIT
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    LIMIT 5 OFFSET 0;

                                QUERY PLAN
    ------------------------------------------------------------------
     Limit  (cost=0.00..22.21 rows=5 width=663)
       ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
             ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                   Index Cond: (baz_id = 13266)
             ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
                   Index Cond: (foos.bar_id = bars.id)
                   Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
    (7 rows)

    //////// Just ORDER
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE (("bars".baz_id = 13266))
    ORDER BY "foos"."id" DESC;

                                QUERY PLAN
    ------------------------------------------------------------------
     Sort  (cost=36515.17..36534.19 rows=7608 width=663)
       Sort Key: foos.id
       ->  Nested Loop  (cost=0.00..33788.21 rows=7608 width=663)
             ->  Index Scan using index_bars_on_baz_id on bars  (cost=0.00..658.05 rows=182 width=4)
                   Index Cond: (baz_id = 13266)
             ->  Index Scan using index_foos_on_bar_id on foos  (cost=0.00..181.51 rows=42 width=663)
                   Index Cond: (foos.bar_id = bars.id)
                   Filter: (((NOT foos.privacy_protected) OR (foos.user_id = 67962)) AND ((foos.status)::text = 'DONE'::text))
    (8 rows)
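When you have both the LIMIT and the ORDER BY, the optimizer has decided that it is faster to walk through the unfiltered records of foos by primary key in descending order until it finds five rows that match the rest of the criteria. In the other two cases, it simply runs the query as a nested loop driven by the bars index and returns all the matching records (sorting them afterwards in the ORDER BY case).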
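To see why the backward index scan looks attractive to the planner, it may help to redo its arithmetic. The cost shown on a Limit node is the child plan's cost scaled by the fraction of rows requested (5 of an estimated 7608), which implicitly assumes the matching rows are spread evenly through the scan. The sketch below only uses numbers copied from the EXPLAIN output in the question; the column aliases are made up for illustration.

```sql
-- Limit cost = child startup cost + (child total cost - startup cost) * (rows wanted / rows estimated).
-- Both nested loops have a startup cost of 0.00, so the formula reduces to a simple ratio.
SELECT
    round(25355084.05 * 5 / 7608, 2) AS limit_cost_with_order_by,   -- backward scan of foos_pkey
    round(33788.21    * 5 / 7608, 2) AS limit_cost_without_order_by; -- nested loop driven by bars
```

The two results reproduce the 16663.44 and 22.21 figures on the Limit nodes above. With ORDER BY present, the alternative is to run the bars-driven nested loop to completion and sort it, which the "Just ORDER" plan prices at about 36515 before the first row can be returned. Since 16663.44 is lower, the planner picks the backward scan, even though in practice the five matching rows may sit far from the high end of the index.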
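Offhand, I'd say the problem is that PG does not account for the joint distribution of the various IDs, and that is why the plan is so suboptimal.

As for possible solutions: I'll assume that you have run ANALYZE recently. If not, do so. That may explain why your estimates are high even for the versions that return quickly. If the problem persists, you could run the ORDER BY as a subquery and put the LIMIT on the outer query.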
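The two suggestions above could be tried roughly as follows. This is a minimal sketch, not a guaranteed fix: the alias ordered_foos is a hypothetical name, and whether the planner actually chooses a different plan for the rewritten query depends on the PostgreSQL version and the statistics, so the EXPLAIN output should be compared with the plans in the question.

```sql
-- Refresh the planner statistics for both tables first.
ANALYZE foos;
ANALYZE bars;

-- Sort inside a subquery and apply the LIMIT in the outer query.
-- "ordered_foos" is an illustrative alias, not a name from the original post.
EXPLAIN
SELECT ordered_foos.*
FROM (
    SELECT "foos".*
    FROM "foos"
    INNER JOIN "bars" ON "foos".bar_id = "bars".id
    WHERE "bars".baz_id = 13266
    ORDER BY "foos"."id" DESC
) AS ordered_foos
LIMIT 5 OFFSET 0;
```

If the plan still shows "Index Scan Backward using foos_pkey", the rewrite has not changed the planner's choice. Note also that SQL does not strictly guarantee that an outer query preserves a subquery's ordering when the outer query has no ORDER BY of its own, although PostgreSQL generally does so for a simple case like this one.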