Vectorising complex dplyr statements in R
Published: 2020-12-15 05:21:58 | Category: Java | Source: compiled from the web
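I'm trying to count the number of students taking each course, as a share of the students who were able to take it. Not every school offers computing, and a different set of schools offers English, so the students who could take computing differ from those who could take English. For example, using the test data below, we have: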
df <- read.csv(text="school,student,course,result
URN1,stu1,comp,A
URN1,stu2,B
URN1,stu3,C
URN1,Eng,D
URN1,ICT,E
URN2,stu4,B
URN2,stu5,C
URN3,stu6,D
URN3,E
URN4,stu7,stu8,E
URN5,stu9,stu10,E")
I have the following loop (boo!) to do this:

library(magrittr)
library(dplyr)

for(c in unique(df$course)){
  # c <- "comp"
  # get URNs of schools offering each course
  URNs <- df %>% filter(course == c) %>% distinct(school) %$% school
  # get number of students in each school offering course c
  num_possible <- df %>% filter(school %in% URNs) %>% summarise(n = n()) %$% n
  # get number of students taking course c
  num_actual <- df %>% filter(course == c) %>% summarise(n = n()) %$% n
  # get % of students taking course from those who could theoretically take c
  print(paste(c, "taken by", (100 * num_actual/num_possible), "% of possible students"))
}

I would like to vectorise all of this, but I can't work out how to compute num_possible in the same call as num_actual:

df %>% group_by(course) %>% summarise(num_possible = somesubfunction(), num_actual = n())

Here somesubfunction() should return the number of students who could possibly take course c.

Solution
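You can first build a helper data frame with one row per school, then map over the courses to look up how many students could have taken each one. Consider: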
library(dplyr)
library(purrr)    # map_int()
library(stringr)  # str_detect()

# helper: one row per school, with its row count and a comma-separated list of its courses
school_students <- df %>%
  group_by(school) %>%
  summarise(students = n(), courses = paste0(unique(course), collapse = ","))

# for each course, sum the students of every school whose course list mentions it
df %>%
  count(course) %>%
  mutate(possible = map_int(as.character(course),
                            ~sum(school_students[str_detect(school_students$courses, .), "students"]))) %>%
  mutate(pct = n / possible * 100)

# A tibble: 3 x 4
  course     n possible   pct
  <fct>  <int>    <int> <dbl>
1 comp       7       12  58.3
2 Eng        3        9  33.3
3 ICT        5       13  38.5
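The same idea (school sizes first, then a per-course lookup) can also be written with a join instead of matching course names inside a pasted string, which avoids accidental partial matches when one course name is contained in another. This is only a sketch under stated assumptions: the `toy` data frame and the names `school_sizes`, `offered`, `possible` and `result` are made up for illustration and are not the data or code from the question. Like the loop in the question, it counts rows rather than distinct students.

```r
library(dplyr)

# Hypothetical toy data (NOT the data from the question): one row per enrolment
toy <- data.frame(
  school  = c("URN1", "URN1", "URN1", "URN2", "URN2", "URN3"),
  student = c("stu1", "stu2", "stu1", "stu3", "stu4", "stu5"),
  course  = c("comp", "comp", "Eng",  "comp", "ICT",  "Eng"),
  stringsAsFactors = FALSE
)

# Number of rows per school (same counting rule as the loop in the question)
school_sizes <- toy %>% count(school, name = "students")

# One row per (course, school) pair in which the course is offered
offered <- toy %>% distinct(course, school)

# For each course, add up the sizes of all schools offering it
possible <- offered %>%
  inner_join(school_sizes, by = "school") %>%
  group_by(course) %>%
  summarise(possible = sum(students))

# Actual takers per course, joined to the possible count
result <- toy %>%
  count(course, name = "n") %>%
  inner_join(possible, by = "course") %>%
  mutate(pct = 100 * n / possible)

print(result)
```

On this toy data, comp is taken in 3 of the 5 rows belonging to schools that offer it, so its pct comes out at 60. The join version keeps everything in data frames, so it should also extend naturally if you later switch to counting distinct students with n_distinct(student).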