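Counting new values that have not occurred before, and values that did not occur in the previous group
Published: 2020-12-13 22:31:05  Category: Windows  Source: compiled from the web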
I am trying to count the number of unique "new" users per month. A new user is one who has not appeared before (since the beginning of the data). I am also trying to count the number of unique users who did not appear in the previous month.

The raw data looks like this:

    library(dplyr)

    date <- c("2010-01-10", "2010-02-13", "2010-03-22",
              "2010-01-11", "2010-02-14", "2010-03-23",
              "2010-01-12", "2010-02-14", "2010-03-24")
    mth  <- rep(c("2010-01", "2010-02", "2010-03"), 3)
    user <- c("123", "129", "145",
              "123", "129", "180",
              "180", "184", "145")
    dt <- data.frame(date, mth, user)
    dt <- dt %>% arrange(date)
    dt
            date     mth user
    1 2010-01-10 2010-01  123
    2 2010-01-11 2010-01  123
    3 2010-01-12 2010-01  180
    4 2010-02-13 2010-02  129
    5 2010-02-14 2010-02  129
    6 2010-02-14 2010-02  184
    7 2010-03-22 2010-03  145
    8 2010-03-23 2010-03  180
    9 2010-03-24 2010-03  145

The answer should look like this:

    new          <- c(2, 2, 2, 2, 2, 2, 1, 1, 1)
    totNew       <- c(2, 2, 2, 4, 4, 4, 5, 5, 5)
    notLastMonth <- rep(2, 9)
    tmp <- cbind(dt, new, totNew, notLastMonth)
    tmp
            date     mth user new totNew notLastMonth
    1 2010-01-10 2010-01  123   2      2            2
    2 2010-01-11 2010-01  123   2      2            2
    3 2010-01-12 2010-01  180   2      2            2
    4 2010-02-13 2010-02  129   2      4            2
    5 2010-02-14 2010-02  129   2      4            2
    6 2010-02-14 2010-02  184   2      4            2
    7 2010-03-22 2010-03  145   1      5            2
    8 2010-03-23 2010-03  180   1      5            2
    9 2010-03-24 2010-03  145   1      5            2

Solution
Here is one attempt (explanations are in the code comments). It relies on dt already being sorted by date, so that each user's first row is their first appearance.

    dt %>%
      group_by(user) %>%
      mutate(Count = row_number()) %>%          # Number each user's appearances
      group_by(mth) %>%
      mutate(new = sum(Count == 1)) %>%         # Count first appearances per month
      summarise(new = first(new),               # One row per month (needed for cumsum)
                users = list(unique(user))) %>% # Unique users per month (for notLastMonth)
      mutate(totNew = cumsum(new),              # Cumulative number of unique users so far
             notLastMonth = lengths(Map(setdiff, users, lag(users)))) %>% # Users absent from the previous month
      select(-users) %>%
      right_join(dt, by = "mth")                # Join back to the original rows

    # A tibble: 9 × 6
    #       mth   new totNew notLastMonth       date   user
    #    <fctr> <int>  <int>        <int>     <fctr> <fctr>
    # 1 2010-01     2      2            2 2010-01-10    123
    # 2 2010-01     2      2            2 2010-01-11    123
    # 3 2010-01     2      2            2 2010-01-12    180
    # 4 2010-02     2      4            2 2010-02-13    129
    # 5 2010-02     2      4            2 2010-02-14    129
    # 6 2010-02     2      4            2 2010-02-14    184
    # 7 2010-03     1      5            2 2010-03-22    145
    # 8 2010-03     1      5            2 2010-03-23    180
    # 9 2010-03     1      5            2 2010-03-24    145
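As a cross-check on the pipeline above, the same three per-month numbers can be sketched in base R with plain set operations. This is only an illustration, not the answer's method: it assumes the month labels sort chronologically as strings ("YYYY-MM"), and the names users_by_month, seen, prev and res are made up for this example.

```r
# Same data as in the question, already sorted by date
dt <- data.frame(
  date = c("2010-01-10", "2010-01-11", "2010-01-12",
           "2010-02-13", "2010-02-14", "2010-02-14",
           "2010-03-22", "2010-03-23", "2010-03-24"),
  mth  = rep(c("2010-01", "2010-02", "2010-03"), each = 3),
  user = c("123", "123", "180", "129", "129", "184", "145", "180", "145"),
  stringsAsFactors = FALSE
)

# Unique users per month; split() orders the groups by sorted month label
users_by_month <- lapply(split(dt$user, dt$mth), unique)

n   <- length(users_by_month)
res <- data.frame(mth = names(users_by_month),
                  new = integer(n),
                  notLastMonth = integer(n),
                  stringsAsFactors = FALSE)

seen <- character(0)  # users seen in any earlier month
prev <- character(0)  # users seen in the immediately preceding month

for (i in seq_len(n)) {
  cur <- users_by_month[[i]]
  res$new[i]          <- length(setdiff(cur, seen))  # never seen before
  res$notLastMonth[i] <- length(setdiff(cur, prev))  # absent last month
  seen <- union(seen, cur)
  prev <- cur
}

res$totNew <- cumsum(res$new)  # running total of distinct users

# Attach the per-month figures to every original row
out <- merge(dt, res, by = "mth")
print(res)
```

For the sample data this should agree with the expected table in the question. Keeping seen and prev as explicit sets makes the difference between "never seen before" and "not seen last month" easy to inspect month by month.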