加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 站长学院 > MsSql教程 > 正文

sql – 获取envelope.i.e重叠时间跨度

发布时间:2020-12-12 16:22:44 所属栏目:MsSql教程 来源:网络整理
导读:我有一个像这样的在线会话的表(空行只是为了更好的可见性): ip_address | start_time | stop_time------------|------------------|------------------10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:1210.10.10.10 | 2016-04-02 08:11 | 2016-04-02 08:20
我有一个像这样的在线会话的表(空行只是为了更好的可见性):
ip_address  | start_time       | stop_time
------------|------------------|------------------
10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:12
10.10.10.10 | 2016-04-02 08:11 | 2016-04-02 08:20

10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:10
10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:08
10.10.10.10 | 2016-04-02 09:05 | 2016-04-02 09:11
10.10.10.10 | 2016-04-02 09:02 | 2016-04-02 09:15
10.10.10.10 | 2016-04-02 09:10 | 2016-04-02 09:12

10.66.44.22 | 2016-04-02 08:05 | 2016-04-02 08:07
10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11

我需要“信封”在线时间跨度:

ip_address  | full_start_time  | full_stop_time
------------|------------------|------------------
10.10.10.10 | 2016-04-02 08:00 | 2016-04-02 08:20
10.10.10.10 | 2016-04-02 09:00 | 2016-04-02 09:15
10.66.44.22 | 2016-04-02 08:03 | 2016-04-02 08:11

我有这个查询返回所需的结果:

WITH t AS 
    -- Determine full time-range of each IP
    (SELECT ip_address,MIN(start_time) AS min_start_time,MAX(stop_time) AS max_stop_time FROM IP_SESSIONS GROUP BY ip_address),t2 AS
    -- compose ticks
    (SELECT DISTINCT ip_address,min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE AS ts
    FROM t
    CONNECT BY min_start_time + (LEVEL-1) * INTERVAL '1' MINUTE <= max_stop_time),t3 AS 
    -- get all "online" ticks
    (SELECT DISTINCT ip_address,ts
    FROM t2
        JOIN IP_SESSIONS USING (ip_address)
    WHERE ts BETWEEN start_time AND stop_time),t4 AS
    (SELECT ip_address,ts,LAG(ts) OVER (PARTITION BY ip_address ORDER BY ts) AS previous_ts
    FROM t3),t5 AS 
    (SELECT ip_address,SUM(DECODE(previous_ts,NULL,1,0 + (CASE WHEN previous_ts + INTERVAL '1' MINUTE <> ts THEN 1 ELSE 0 END))) 
            OVER (PARTITION BY ip_address ORDER BY ts ROWS UNBOUNDED PRECEDING) session_no
    FROM t4)
SELECT ip_address,MIN(ts) AS full_start_time,MAX(ts) AS full_stop_time
FROM t5
GROUP BY ip_address,session_no
ORDER BY 1,2;

但是,我对表现感到担忧.该表有数亿行,时间分辨率为毫秒(不是示例中给出的一分钟).因此,CTE t3将是巨大的.有没有人有避免自我加入和“连接”的解决方案?

单个智能Analytic Function会很棒.

解决方法

试试这个吧.我尽我所能地测试了它,我相信它涵盖了所有的可能性,包括合并相邻的间隔(10:15到10:30和10:30到10:40结合成一个间隔,10:15到10:40 ).它应该也很快,它使用不多.
with m as
        (
         select ip_address,start_time,max(stop_time) over (partition by ip_address order by start_time 
                             rows between unbounded preceding and 1 preceding) as m_time
         from ip_sessions
         union all
         select ip_address,max(stop_time) from ip_sessions group by ip_address
        ),n as
        (
         select ip_address,m_time 
         from m 
         where start_time > m_time or start_time is null or m_time is null
        ),f as
        (
         select ip_address,lead(m_time) over (partition by ip_address order by start_time) as stop_time
         from n
        )
select * from f where start_time is not null
/

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读