
Counting new values that have not occurred before, and values that did not occur in the previous group

Published: 2020-12-13 22:31:05 | Category: Windows | Source: compiled from the web
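I am trying to count the number of unique "new" users per month. A new user is one who has not appeared in any earlier row, counting from the start of the data. I am also trying to count the number of unique users in each month who did not appear in the previous month.

The raw data looks like this: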

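    library(dplyr)

    # Nine observations; all three vectors must have the same length
    date <- c("2010-01-10", "2010-02-13", "2010-03-22",
              "2010-01-11", "2010-02-14", "2010-03-23",
              "2010-01-12", "2010-02-14", "2010-03-24")
    mth  <- rep(c("2010-01", "2010-02", "2010-03"), 3)
    user <- c("123", "129", "145", "123", "129", "180", "180", "184", "145")

    dt <- data.frame(date, mth, user)

    # Sort chronologically so that "first appearance" is well defined
    dt <- dt %>% arrange(date)

    dt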

       date     mth user
1 2010-01-10 2010-01  123
2 2010-01-11 2010-01  123
3 2010-01-12 2010-01  180
4 2010-02-13 2010-02  129
5 2010-02-14 2010-02  129
6 2010-02-14 2010-02  184
7 2010-03-22 2010-03  145
8 2010-03-23 2010-03  180
9 2010-03-24 2010-03  145

The answer should look like this:

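    # One value per row of dt (nine rows), constant within each month
    new          <- c(2, 2, 2, 2, 2, 2, 1, 1, 1)
    totNew       <- c(2, 2, 2, 4, 4, 4, 5, 5, 5)
    notLastMonth <- c(2, 2, 2, 2, 2, 2, 2, 2, 2)

    tmp <- cbind(dt, new, totNew, notLastMonth)
    tmp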

        date     mth user new totNew notLastMonth
1 2010-01-10 2010-01  123   2      2            2
2 2010-01-11 2010-01  123   2      2            2
3 2010-01-12 2010-01  180   2      2            2
4 2010-02-13 2010-02  129   2      4            2
5 2010-02-14 2010-02  129   2      4            2
6 2010-02-14 2010-02  184   2      4            2
7 2010-03-22 2010-03  145   1      5            2
8 2010-03-23 2010-03  180   1      5            2
9 2010-03-24 2010-03  145   1      5            2

Solution

Here is one attempt (explanations are in the code comments):

dt %>%
  group_by(user) %>%
  mutate(Count = row_number()) %>% # Number each user's appearances (data is sorted by date)
  group_by(mth) %>%
  mutate(new = sum(Count == 1)) %>% # Count first appearances per month
  summarise(new = first(new), # Summarise new users per month (for cumsum)
            users = list(unique(user))) %>% # Create a list of unique users per month (for notLastMonth)
  mutate(totNew = cumsum(new), # Calculate the overall cumulative sum of new users
         notLastMonth = lengths(Map(setdiff, users, lag(users)))) %>% # Users in this month who were absent from the previous month
  select(-users) %>%
  right_join(dt, by = "mth") # Join the monthly figures back to the row-level data

# A tibble: 9 × 6
#       mth   new totNew notLastMonth       date   user
#    <fctr> <int>  <int>        <int>     <fctr> <fctr>
# 1 2010-01     2      2            2 2010-01-10    123
# 2 2010-01     2      2            2 2010-01-11    123
# 3 2010-01     2      2            2 2010-01-12    180
# 4 2010-02     2      4            2 2010-02-13    129
# 5 2010-02     2      4            2 2010-02-14    129
# 6 2010-02     2      4            2 2010-02-14    184
# 7 2010-03     1      5            2 2010-03-22    145
# 8 2010-03     1      5            2 2010-03-23    180
# 9 2010-03     1      5            2 2010-03-24    145
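
As a cross-check on the pipeline above, the same three figures can be sketched in base R without dplyr. This is only an illustrative sketch, not the answerer's method: the names `first_seen`, `users_by_month`, `previous` and `monthly` are made up for this example, and it assumes the rows are already sorted by date and that the first month has no predecessor (so all of its users count as "not in last month").

```r
# Rebuild the sample data from the question, already sorted by date.
dt <- data.frame(
  date = c("2010-01-10", "2010-01-11", "2010-01-12",
           "2010-02-13", "2010-02-14", "2010-02-14",
           "2010-03-22", "2010-03-23", "2010-03-24"),
  mth  = rep(c("2010-01", "2010-02", "2010-03"), each = 3),
  user = c("123", "123", "180", "129", "129", "184", "145", "180", "145"),
  stringsAsFactors = FALSE
)

# A row is a first appearance if its user has not occurred in any earlier row.
# This relies on the rows being in chronological order.
first_seen <- !duplicated(dt$user)

# New users per month, and the running total across months.
new_per_month <- tapply(first_seen, dt$mth, sum)
tot_new <- cumsum(new_per_month)

# Users per month, each compared with the month before.
# The first month is compared with an empty set (assumption stated above).
users_by_month <- split(dt$user, dt$mth)
previous <- c(list(character(0)), head(users_by_month, -1))
not_last_month <- lengths(Map(setdiff, users_by_month, previous))

# One row per month, then merged back onto the row-level data.
monthly <- data.frame(
  mth          = names(users_by_month),
  new          = as.vector(new_per_month),
  totNew       = as.vector(tot_new),
  notLastMonth = as.vector(not_last_month),
  stringsAsFactors = FALSE
)
result <- merge(dt, monthly, by = "mth")
print(monthly)
```

`duplicated()` plays the role of `Count == 1` in the dplyr version, and `setdiff()` removes duplicates on its own, so no explicit `unique()` call is needed before counting.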
