在Postgresql上配对顺序事件

发布时间：2020-12-13 16:09:17 所属栏目：百科来源：网络整理

导读：我们正在记录用户在桌面上的iPad应用程序上执行的主要操作流程.每个流都有一个开始(标记为已启动)和一个标记为已取消或已完成的结束,并且不应存在任何重叠事件. 为用户启动,取消或完成的一组流程如下所示： user_id timestamp event_text event_numinfo@cafe

我们正在记录用户在桌面上的iPad应用程序上执行的主要操作流程.每个流都有一个开始(标记为已启动)和一个标记为已取消或已完成的结束,并且不应存在任何重叠事件.

为用户启动,取消或完成的一组流程如下所示：

user_id             timestamp                   event_text      event_num
info@cafe-test.de   2016-10-30 00:08:00.966+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:08:15.58+00   Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:08:15.581+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:34:44.134+00  Flow Finished   1
info@cafe-test.de   2016-10-30 00:42:26.102+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:42:49.276+00  Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:59:47.337+00  Flow Cancelled  2
info@cafe-test.de   2016-10-30 00:59:47.337+00  Flow Started    0
info@cafe-test.de   2016-10-30 00:59:47.928+00  Flow Cancelled  2

我们想要计算取消流量和完成流量的平均持续时间.为此,我们需要将事件Started与Cancelled或Finished配对.以下代码执行此操作,但无法解决我们遇到的以下数据质量问题：

>当客户想要在结束正在进行的流程(Flow1)之前启动新流程(让我们称之为Flow2)时,我们会在拍摄新流程的已启动事件时拍摄已取消的事件.所以Flow1已取消= Flow2已启动.但是,当我们使用窗口函数进行排序时,实际属于不同流的有序事件之间的超前/滞后得到匹配.
通过使用此代码：

WITH track_scf AS (SELECT user_id,timestamp,event_text,CASE WHEN event_text LIKE '%Started%' THEN 0 when event_text like '%Cancelled%' then 2 ELSE 1 END AS event_num FROM tracks ORDER BY 2,4 desc ) SELECT user_id,CASE WHEN event_num=0 then timestamp end as start,CASE WHEN LEAD(event_num,1) OVER (PARTITION BY user_id ORDER BY timestamp,event_num) <> 0 THEN LEAD(timestamp,event_num) END as end,event_num) <> 0 THEN LEAD(event_num,event_num) END as action FROM track_scf

我们得到这个结果：

user_id             start                       end                         action
info@cafe-test.de   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  NULL                        NULL
info@cafe-test.de   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de   NULL                        2016-10-30 00:59:47.928+00  2

但我们应该得到这个：

user_id             start                       end                         action
info@cafe-test.de   2016-10-30 00:08:00.966+00  2016-10-30 00:08:15.58+00   2
info@cafe-test.de   2016-10-30 00:08:15.581+00  2016-10-30 00:34:44.134+00  1
info@cafe-test.de   2016-10-30 00:42:26.102+00  2016-10-30 00:42:49.276+00  2
info@cafe-test.de   2016-10-30 00:42:49.277+00  2016-10-30 00:59:47.337+00  2
info@cafe-test.de   2016-10-30 00:59:47.337+00  2016-10-30 00:59:47.928+00  2

如何更改代码以使配对正确？

解决方法

select      user_id,"start","end","action"

from       (select      user_id,timestamp                 as "start",lead (event_num)   over w as "action",lead ("timestamp") over w as "end",event_num

            from        tracks t

            window      w as (partition by user_id order by "timestamp",event_num desc)
            ) t

where       t.event_num = 0
;

（编辑：李大同）

【声明】本站内容均来自网络，其相关言论仅代表作者个人观点，不代表本站立场。若无意侵犯到您的权利，请及时与联系站长删除相关内容!