
Vectorizing a complex dplyr statement in R

Published: 2020-12-15 05:21:58 | Category: Java | Source: compiled from the web
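I'm trying to work out what share of students take a course, out of the students who were able to take it. Not all schools offer computing, a different set of schools offers English, and so the students who could take computing differ from those who could take English. For example, using the test data below, we have: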

df <- read.csv(text="school,student,course,result
      URN1,stu1,comp,A
      URN1,stu2,B
      URN1,stu3,C
      URN1,Eng,D
      URN1,ICT,E
      URN2,stu4,B
      URN2,stu5,C
      URN3,stu6,D
      URN3,E
      URN4,stu7,stu8,E
      URN5,stu9,stu10,E")

[1] "comp taken by 58.3333333333333 % of possible students"

[1] "Eng taken by 33.3333333333333 % of possible students"

[1] "ICT taken by 38.4615384615385 % of possible students"

I have the following loop (boo!) to do this:

library(magrittr)
library(dplyr)

for(c in unique(df$course)){
  # c <- "comp"
  #get URNs of schools offering each course
  URNs <- df %>% filter(course == c) %>% distinct(school) %$% school
  #get number of students in each school offering course c
  num_possible <- df %>% filter(school %in% URNs) %>% summarise(n = n()) %$% n
  #get number of students taking course c 
  num_actual <- df %>% filter(course == c) %>% summarise(n = n()) %$% n

  # get % of students taking course from those who could theoretically take c
  print(paste(c,"taken by",(100 * num_actual/num_possible),"% of possible students"))
}

I'd like to vectorize all of this, but I can't get num_possible into the same call as num_actual:

df %>% group_by(course) %>% summarise(num_possible = somesubfunction(),num_actual = n())

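somesubfunction() should return the number of students who could have taken course c, i.e. the number of rows belonging to schools that offer c.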

Solution

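You can first build a helper data frame with one row per school, then map over the courses to look up the number of possible students. Consider: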

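library(dplyr)
library(purrr)    # map_int()
library(stringr)  # str_detect()

# one row per school: its row count and a comma-separated list of its courses
school_students <- df %>% 
  group_by(school) %>% 
  summarise(students = n(),courses = paste0(unique(course),collapse = ","))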

df %>% 
  count(course) %>%
  mutate(possible = map_int(as.character(course),~sum(school_students[str_detect(school_students$courses,.),"students"]))) %>%
  mutate(pct = n / possible * 100)

# A tibble: 3 x 4
  course     n possible   pct
  <fct>  <int>    <int> <dbl>
1 comp       7       12  58.3
2 Eng        3        9  33.3
3 ICT        5       13  38.5
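
The same idea can also be sketched with a join instead of string matching. This avoids one pitfall of pasting course names into a single string: str_detect() treats its pattern as a regular expression and matches substrings, so a course name that is contained in another course name would be counted for both. The following is a minimal sketch, not the answer's own method. The data set `toy` and the names `school_sizes`, `possible` and `result` are hypothetical and chosen only for illustration. As in the loop above, "students" are counted as rows.

```r
library(dplyr)

# Hypothetical toy data (NOT the question's data set): one row per course entry
toy <- tibble::tribble(
  ~school, ~student, ~course,
  "URN1",  "stu1",   "comp",
  "URN1",  "stu2",   "comp",
  "URN1",  "stu2",   "Eng",
  "URN2",  "stu3",   "comp",
  "URN2",  "stu4",   "ICT",
  "URN3",  "stu5",   "Eng"
)

# Rows per school (URN1 = 3, URN2 = 2, URN3 = 1)
school_sizes <- toy %>% count(school, name = "students")

# For each course, sum the sizes of the schools that offer it
possible <- toy %>%
  distinct(school, course) %>%
  inner_join(school_sizes, by = "school") %>%
  group_by(course) %>%
  summarise(possible = sum(students))

# Rows actually taking each course, joined to the possible counts
result <- toy %>%
  count(course, name = "actual") %>%
  inner_join(possible, by = "course") %>%
  mutate(pct = 100 * actual / possible)

# comp: 3 of 5 (60 %), Eng: 2 of 4 (50 %), ICT: 1 of 2 (50 %)
print(result)
```

Because schools and courses are matched through exact join keys, no regular expression is involved, and the helper table stays in long form rather than as a pasted string.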
