How to order distinct tuples in a PostgreSQL query
I'm trying to write a query in Postgres that returns only distinct tuples. In my sample query, I don't want duplicate entries where a cluster_id / feed_id combination occurs more than once. If I do a simple:

    select distinct on (cluster_info.cluster_id, feed_id)
           cluster_info.cluster_id, num_docs, feed_id, url_time
    from url_info
    join cluster_info on (cluster_info.cluster_id = url_info.cluster_id)
    where feed_id in (select pot_seeder from potentials)
    and num_docs > 5
    and url_time > '2012-04-16';

I get just that, but I also want to order the result by num_docs. So, when I do the following:

    select distinct on (cluster_info.cluster_id, feed_id)
           cluster_info.cluster_id, num_docs, feed_id, url_time
    from url_info
    join cluster_info on (cluster_info.cluster_id = url_info.cluster_id)
    where feed_id in (select pot_seeder from potentials)
    and num_docs > 5
    and url_time > '2012-04-16'
    order by num_docs desc;

I get the following error:

    ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
    LINE 1: select distinct on (cluster_info.cluster_id,feed_id) cluste...

I think I understand why I'm getting the error (I can't group by tuples unless I explicitly describe the group in some way), but how do I do that? Or, if my reading of the error is wrong, is there another way to achieve my original goal?

Solution
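The leftmost ORDER BY items must match the items of the DISTINCT ON clause. I quote the manual about DISTINCT:

    The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.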
Try:

    SELECT *
    FROM  (
       SELECT DISTINCT ON (c.cluster_id, feed_id)
              c.cluster_id, num_docs, feed_id, url_time
       FROM   url_info u
       JOIN   cluster_info c ON (c.cluster_id = u.cluster_id)
       WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
       AND    num_docs > 5
       AND    url_time > '2012-04-16'
       ORDER  BY c.cluster_id, feed_id, num_docs, url_time
              -- first columns match DISTINCT
              -- the rest to pick certain values for dupes
              -- or did you want to pick random values for dupes?
       ) x
    ORDER  BY num_docs DESC;

Or use GROUP BY:

    SELECT c.cluster_id, num_docs, feed_id, url_time
    FROM   url_info u
    JOIN   cluster_info c ON (c.cluster_id = u.cluster_id)
    WHERE  feed_id IN (SELECT pot_seeder FROM potentials)
    AND    num_docs > 5
    AND    url_time > '2012-04-16'
    GROUP  BY c.cluster_id, feed_id
    ORDER  BY num_docs DESC;

The GROUP BY variant works as written only in PostgreSQL 9.1 or later, and only if c.cluster_id, feed_id are the primary key columns of all tables (both, in this case) that contribute columns to the SELECT list. Otherwise, you need to GROUP BY the remaining columns, or aggregate them, or provide more information.
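For the "otherwise" case, here is a minimal sketch of the aggregate route. It rests on assumptions the question does not confirm: that num_docs lives in cluster_info, that feed_id and url_time live in url_info, and that keeping the latest url_time per pair is what you want. The column types and sample rows are invented for illustration, and the filter on potentials is left out to keep the script self-contained.

```sql
-- Hypothetical minimal schema; column placement and types are assumptions.
CREATE TEMP TABLE cluster_info (cluster_id int PRIMARY KEY, num_docs int);
CREATE TEMP TABLE url_info (cluster_id int, feed_id int, url_time timestamp);

-- Made-up sample rows.
INSERT INTO cluster_info VALUES (1, 10), (2, 30);
INSERT INTO url_info VALUES
  (1, 100, '2012-04-17 08:00'),
  (1, 100, '2012-04-18 09:00'),  -- duplicate (cluster_id, feed_id) pair
  (2, 100, '2012-04-17 10:00'),
  (2, 200, '2012-04-19 11:00');

-- One row per (cluster_id, feed_id): url_time is aggregated with max(),
-- and num_docs is listed in GROUP BY so no primary key is required.
SELECT c.cluster_id, c.num_docs, u.feed_id, max(u.url_time) AS url_time
FROM   url_info u
JOIN   cluster_info c ON c.cluster_id = u.cluster_id
WHERE  c.num_docs > 5
AND    u.url_time > '2012-04-16'
GROUP  BY c.cluster_id, c.num_docs, u.feed_id
ORDER  BY c.num_docs DESC;
```

With these sample rows the query returns three rows, one per distinct pair, and the two rows for cluster 2 come first. Because every non-aggregated column appears in GROUP BY, this form does not depend on the primary-key rule from PostgreSQL 9.1.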