postgresql – 使用postgres进行线性回归
发布时间:2020-12-13 18:07:21 所属栏目:百科 来源:网络整理
导读:我使用Postgres,我有大量的行,每个站点有值和日期. (日期可以分开几天.) id | value | idstation | udate--------+-------+-----------+-----1 | 5 | 12 | 1984-02-11 00:00:002 | 7 | 12 | 1984-02-17 00:00:003 | 8 | 12 | 1984-02-21 00:00:004 | 9 | 12
我使用Postgres,我有大量的行,每个站点有值和日期.
(日期可以分开几天.) id | value | idstation | udate --------+-------+-----------+----- 1 | 5 | 12 | 1984-02-11 00:00:00 2 | 7 | 12 | 1984-02-17 00:00:00 3 | 8 | 12 | 1984-02-21 00:00:00 4 | 9 | 12 | 1984-02-23 00:00:00 5 | 4 | 12 | 1984-02-24 00:00:00 6 | 8 | 12 | 1984-02-28 00:00:00 7 | 9 | 14 | 1984-02-21 00:00:00 8 | 15 | 15 | 1984-02-21 00:00:00 9 | 14 | 18 | 1984-02-21 00:00:00 10 | 200 | 19 | 1984-02-21 00:00:00 原谅可能是一个愚蠢的问题,但我不是一个数据库大师. 是否可以直接输入一个SQL查询,该查询将计算每个日期的每个日期的线性回归,知道回归必须只计算实际的ID日期,前一个id日期和下一个id日期? 例如,id 2的线性回归必须以值7(实际),5(上一个),8(下一个)计算日期1984-02-17,1984-02-11和1984-02-21 编辑:我必须使用regr_intercept(value,udate)但我真的不知道如何执行此操作,如果我必须只使用每行的实际,上一个和下一个值/日期. Edit2:将3行添加到idstation(12); ID和日期编号已更改 希望你能帮助我,谢谢!
这是Joop的统计数据和Denis的窗口函数的组合:
WITH num AS ( SELECT id,idstation,(udate - '1984-01-01'::date) as idate -- count in dayse since jan 1984,value AS value FROM thedata ) -- id + the ids of the {prev,next} records -- within the same idstation group,drag AS ( SELECT id AS center,LAG(id) OVER www AS prev,LEAD(id) OVER www AS next FROM thedata WINDOW www AS (partition by idstation ORDER BY id) ) -- junction CTE between ID and its three feeders,tri AS ( SELECT center AS this,center AS that FROM drag UNION ALL SELECT center AS this,prev AS that FROM drag UNION ALL SELECT center AS this,next AS that FROM drag ) SELECT t.this,n.idstation,regr_intercept(value,idate) AS intercept,regr_slope(value,idate) AS slope,regr_r2(value,idate) AS rsq,regr_avgx(value,idate) AS avgx,regr_avgy(value,idate) AS avgy FROM num n JOIN tri t ON t.that = n.id GROUP BY t.this,n.idstation ; 结果: INSERT 0 7 this | idstation | intercept | slope | rsq | avgx | avgy ------+-----------+-------------------+-------------------+-------------------+------------------+------------------ 1 | 12 | -46 | 1 | 1 | 52 | 6 2 | 12 | -24.2105263157895 | 0.578947368421053 | 0.909774436090226 | 53.3333333333333 | 6.66666666666667 3 | 12 | -10.6666666666667 | 0.333333333333333 | 1 | 54.5 | 7.5 4 | 14 | | | | 51 | 9 5 | 15 | | | | 51 | 15 6 | 18 | | | | 51 | 14 7 | 19 | | | | 51 | 200 (7 rows) 使用rank()或row_number()函数可以更优雅地完成三组的聚类,这也允许使用更大的滑动窗口. (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |