Is there a way to calculate correlation in T-SQL using the OVER clause instead of CTEs?
Published: 2020-12-12 08:55:03 | Category: MsSql tutorials | Source: compiled from the web
Suppose you have a table with the columns Date, GroupID, X, and Y.
    CREATE TABLE #sample
    (
        [Date]  DATETIME,
        GroupID INT,
        X       FLOAT,
        Y       FLOAT
    )

    DECLARE @date DATETIME = getdate()

    INSERT INTO #sample VALUES(@date,1,3)
    INSERT INTO #sample VALUES(DATEADD(d,@date),1)
    INSERT INTO #sample VALUES(DATEADD(d,2,4,2)
    INSERT INTO #sample VALUES(DATEADD(d,3,6,4)
    INSERT INTO #sample VALUES(DATEADD(d,5,7,5)
    INSERT INTO #sample VALUES(DATEADD(d,6)

You want to calculate the correlation of X and Y for each group. At the moment I use CTEs, which gets a bit messy:

    ;WITH DataAvgStd AS
    (
        -- Per-group means, sample standard deviations, and row counts
        SELECT GroupID,
               AVG(X)   AS XAvg,
               AVG(Y)   AS YAvg,
               STDEV(X) AS XStdev,
               STDEV(Y) AS YSTDev,
               COUNT(*) AS SampleSize
        FROM #sample
        GROUP BY GroupID
    ),
    ExpectedVal AS
    (
        -- Per-group sum of the products of deviations from the means
        SELECT s.GroupID,
               SUM(( X - XAvg ) * ( Y - YAvg )) AS ExpectedValue
        FROM #sample s
        JOIN DataAvgStd das ON s.GroupID = das.GroupID
        GROUP BY s.GroupID
    )
    SELECT das.GroupID,
           ev.ExpectedValue / ( das.SampleSize - 1 ) / ( das.XStdev * das.YSTDev ) AS Correlation
    FROM DataAvgStd das
    JOIN ExpectedVal ev ON das.GroupID = ev.GroupID

    DROP TABLE #sample

It seems there should be a way to do this in one pass with OVER and PARTITION BY, without any subqueries. Ideally, T-SQL would have a function so that you could write:

    SELECT GroupID,
           CORR(X, Y) OVER(PARTITION BY GroupID)
    FROM #sample
    GROUP BY GroupID

Solution

With this correlation formula you cannot avoid all nested queries, even with OVER(). The problem is that you cannot use GROUP BY and OVER together in the same query in the way this would require, and you cannot nest aggregate functions, e.g. SUM(X - AVG(X)). So in the best case, depending on your data, you still need to keep at least one WITH. Your code would then look like this:

    ;WITH DataAvgStd AS
    (
        -- Window aggregates attach the per-group statistics to every row
        SELECT GroupID,
               STDEV(X) OVER(PARTITION BY GroupID) AS XStdev,
               STDEV(Y) OVER(PARTITION BY GroupID) AS YSTDev,
               COUNT(*) OVER(PARTITION BY GroupID) AS SampleSize,
               ( X - AVG(X) OVER(PARTITION BY GroupID) )
                 * ( Y - AVG(Y) OVER(PARTITION BY GroupID) ) AS ExpectedValue
        FROM #sample s
    )
    -- DISTINCT collapses the identical per-row results to one row per group
    SELECT DISTINCT
           GroupID,
           SUM(ExpectedValue) OVER(PARTITION BY GroupID)
             / ( SampleSize - 1 ) / ( XStdev * YSTDev ) AS Correlation
    FROM DataAvgStd

Another approach is to use an equivalent formula for the correlation, as described on Wikipedia. This can be written as:

    SELECT GroupID,
           Correlation = (COUNT(*) * SUM(X * Y) - SUM(X) * SUM(Y))
                         / (SQRT(COUNT(*) * SUM(X * X) - SUM(X) * SUM(X))
                            * SQRT(COUNT(*) * SUM(Y * Y) - SUM(Y) * SUM(Y)))
    FROM #sample s
    GROUP BY GroupID;
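As a sketch of why the single GROUP BY query agrees with the CTE versions, the formula it implements can be derived from the one the CTEs compute. The notation here is introduced for illustration and does not appear in the original post: n stands for COUNT(*) within a group, x_i and y_i for the X and Y values of that group's rows, and s_x, s_y for STDEV(X) and STDEV(Y), taken to be the sample standard deviation with an n - 1 denominator.

```latex
\begin{aligned}
r &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y},
\qquad s_x = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}},\quad
s_y = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{n-1}} \\[4pt]
\sum_i (x_i - \bar{x})(y_i - \bar{y}) &= \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n},
\qquad
(n-1)\, s_x^2 = \frac{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}{n} \\[4pt]
r &= \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}
          {\sqrt{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2}\;
           \sqrt{n \sum_i y_i^2 - \left(\sum_i y_i\right)^2}}
\end{aligned}
```

The first line is what the CTE queries compute, the second expands the centered sums into plain sums, and the third is the expression in the GROUP BY query. Every term in the last line is an ordinary aggregate over the group, which is why no nested aggregate or window function is needed. In both forms the denominator is zero when X or Y is constant within a group, so such groups would need separate handling.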