使用T-SQL聚合仅相邻的记录
我(在示例中简化了)包含以下数据的表
Row Start Finish ID Amount --- --------- ---------- -- ------ 1 2008-10-01 2008-10-02 01 10 2 2008-10-02 2008-10-03 02 20 3 2008-10-03 2008-10-04 01 38 4 2008-10-04 2008-10-05 01 23 5 2008-10-05 2008-10-06 03 14 6 2008-10-06 2008-10-07 02 3 7 2008-10-07 2008-10-08 02 8 8 2008-10-08 2008-11-08 03 19 日期表示一段时间,ID是系统在该时间段内所处的状态,金额是与该状态相关的值. 我想要做的是聚合具有相同ID号的相邻行的Amounts,但保持相同的整体序列,以便可以组合连续的运行.因此,我希望得到如下数据: Row Start Finish ID Amount --- --------- ---------- -- ------ 1 2008-10-01 2008-10-02 01 10 2 2008-10-02 2008-10-03 02 20 3 2008-10-03 2008-10-05 01 61 4 2008-10-05 2008-10-06 03 14 5 2008-10-06 2008-10-08 02 11 6 2008-10-08 2008-11-08 03 19 我正在使用可以放入SP的T-SQL解决方案,但是我无法通过简单查询看到如何做到这一点.我怀疑它可能需要某种迭代,但我不想走那条路. 我想要进行这种聚合的原因是该过程的下一步是按照序列中出现的唯一ID进行SUM()和Count(),这样我的最终数据将如下所示: ID Counts Total -- ------ ----- 01 2 71 02 2 31 03 2 33 但是,如果我做一个简单的 SELECT COUNT(ID),SUM(Amount) FROM data GROUP BY ID 在原始的桌子上,我得到了类似的东西 ID Counts Total -- ------ ----- 01 3 71 02 3 31 03 2 33 这不是我想要的. 解决方法如果您在 R T Snodgrass之前阅读“在SQL中开发面向时间的数据库应用程序”一书(其pdf可从他的网站上的出版物中获得),并且在p165-166上得到图6.25,您将发现非可以在当前示例中使用的简单SQL,用于将具有相同ID值和连续时间间隔的各行分组.下面的查询开发接近正确,但最后发现了一个问题,它的源代码在第一个SELECT语句中.我还没有找到为什么给出错误的答案. [如果有人可以在他们的DBMS上测试SQL并告诉我第一个查询是否在那里正常工作,那将是一个很大的帮助!] 它看起来像: -- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented -- Database Applications in SQL" CREATE TABLE Data ( Start DATE,Finish DATE,ID CHAR(2),Amount INT ); INSERT INTO Data VALUES('2008-10-01','2008-10-02','01',10); INSERT INTO Data VALUES('2008-10-02','2008-10-03','02',20); INSERT INTO Data VALUES('2008-10-03','2008-10-04',38); INSERT INTO Data VALUES('2008-10-04','2008-10-05',23); INSERT INTO Data VALUES('2008-10-05','2008-10-06','03',14); INSERT INTO Data VALUES('2008-10-06','2008-10-07',3); INSERT INTO Data VALUES('2008-10-07','2008-10-08',8); INSERT INTO Data VALUES('2008-10-08','2008-11-08',19); SELECT DISTINCT F.ID,F.Start,L.Finish FROM Data AS F,Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish))); 该查询的输出是: 01 2008-10-01 2008-10-02 01 2008-10-03 2008-10-05 02 2008-10-02 2008-10-03 02 2008-10-06 2008-10-08 03 2008-10-05 2008-10-06 03 2008-10-05 2008-11-08 03 2008-10-08 2008-11-08 编辑:倒数第二行有一个问题 – 它应该不存在.而且我还不清楚(它)来自哪里. 现在我们需要将该复杂表达式视为另一个SELECT语句的FROM子句中的查询表达式,该语句将对与上面显示的最大范围重叠的条目求和给定ID的金额值. SELECT M.ID,M.Start,M.Finish,SUM(D.Amount) FROM Data AS D,(SELECT DISTINCT F.ID,L.Finish FROM Data AS F,Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID,M.Finish ORDER BY M.ID,M.Start; 这给出了: ID Start Finish Amount 01 2008-10-01 2008-10-02 10 01 2008-10-03 2008-10-05 61 02 2008-10-02 2008-10-03 20 02 2008-10-06 2008-10-08 11 03 2008-10-05 2008-10-06 14 03 2008-10-05 2008-11-08 33 -- Here be trouble! 03 2008-10-08 2008-11-08 19 编辑:这几乎是正确的数据集,用于执行原始问题所请求的COUNT和SUM聚合,因此最终答案是: SELECT I.ID,COUNT(*) AS Number,SUM(I.Amount) AS Amount FROM (SELECT M.ID,SUM(D.Amount) AS Amount FROM Data AS D,L.Finish FROM Data AS F,Data AS L WHERE F.Start < L.Finish AND F.ID = L.ID -- There are no gaps between F.Finish and L.Start AND NOT EXISTS (SELECT * FROM Data AS M WHERE M.ID = F.ID AND F.Finish < M.Start AND M.Start < L.Start AND NOT EXISTS (SELECT * FROM Data AS T1 WHERE T1.ID = F.ID AND T1.Start < M.Start AND M.Start <= T1.Finish)) -- Cannot be extended further AND NOT EXISTS (SELECT * FROM Data AS T2 WHERE T2.ID = F.ID AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR (T2.Start <= L.Finish AND L.Finish < T2.Finish))) ) AS M WHERE D.ID = M.ID AND M.Start <= D.Start AND M.Finish >= D.Finish GROUP BY M.ID,M.Finish ) AS I GROUP BY I.ID ORDER BY I.ID; id number amount 01 2 71 02 2 31 03 3 66 评论: 对于记录:在Solaris 10上的IBM Informix Dynamic Server 11.50上进行了测试.但是,应该可以在任何其他符合标准的标准SQL DBMS上正常工作. (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |