regex – Regular expression matching in dplyr
Published: 2020-12-14 06:03:28 Category: Encyclopedia Source: compiled from the web
While answering this question, I wrote the following code:
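    library(stringr)

    df <- data.frame(Call_Num = c("HV5822.H4 C47 Circulating Collection,3rd Floor",
                                  "QE511.4 .G53 1982 Circulating Collection,3rd Floor",
                                  "TL515 .M63 Circulating Collection,3rd Floor",
                                  "D753 .F4 Circulating Collection,3rd Floor",
                                  "DB89.F7 D4 Circulating Collection,3rd Floor"))
    # str_match() returns a character matrix: column 1 is the full match,
    # columns 2 and 3 are the two capture groups
    matches <- str_match(df$Call_Num, "([A-Z]+)(\\d+)\\s*\\.")
    df2 <- data.frame(df, letter = matches[, 2], number = matches[, 3])

Now my question is: is there a simple way to combine the last two lines into a single dplyr call, presumably using mutate()? Alternatively, I would also be interested in a do() solution. For the mutate() approach, since we are extracting two groups, I would accept a solution that calls str_match() twice with different regular expressions, one for each desired group.

Edit: To clarify, the main challenge I see here is that str_match returns a matrix, and I am wondering how to handle that inside mutate() or do(). I am not interested in solving the original problem with other ways of extracting the information; plenty of such solutions have already been given here.

Solution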
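A minimal sketch of the mutate() variant the question says it would accept, assuming dplyr and stringr are installed. Because str_match() returns a matrix (column 1 is the whole match, columns 2 and 3 are the capture groups), each of the two calls is reduced to the single column it is responsible for. For simplicity this reuses the question's pattern and picks a different column in each call instead of writing two separate regexes; the data frame is rebuilt with stringsAsFactors = FALSE so the example behaves the same on R versions before 4.0.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection,3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection,3rd Floor",
               "TL515 .M63 Circulating Collection,3rd Floor",
               "D753 .F4 Circulating Collection,3rd Floor",
               "DB89.F7 D4 Circulating Collection,3rd Floor"),
  stringsAsFactors = FALSE
)

pattern <- "([A-Z]+)(\\d+)\\s*\\."

df2 <- df %>%
  mutate(
    # str_match() returns a character matrix; [, 2] is the first capture group
    letter = str_match(Call_Num, pattern)[, 2],
    # a second str_match() call; [, 3] is the second capture group
    number = str_match(Call_Num, pattern)[, 3]
  )

print(df2)
```

Calling str_match() twice runs the regex twice per row, which is harmless at this size but is the price of keeping each mutate() argument a plain vector.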
You could try:
    library(dplyr)
    library(stringr)

    df %>%
      do(data.frame(., str_match(.$Call_Num, "([A-Z]+)(\\d+)\\s*\\.")[, -1],
                    stringsAsFactors = FALSE)) %>%
      # rename the two unnamed matrix columns (rename_() is deprecated in newer dplyr releases)
      rename_(.dots = setNames(names(.)[-1], c('letter', 'number')))
    #                                             Call_Num letter number
    # 1     HV5822.H4 C47 Circulating Collection,3rd Floor     HV   5822
    # 2 QE511.4 .G53 1982 Circulating Collection,3rd Floor     QE    511
    # 3        TL515 .M63 Circulating Collection,3rd Floor     TL    515
    # 4          D753 .F4 Circulating Collection,3rd Floor      D    753
    # 5        DB89.F7 D4 Circulating Collection,3rd Floor     DB     89

Alternatively, as @SamFirke pointed out in a comment, the columns can also be renamed with:

    ... %>% setNames(., c(names(.)[1], "letter", "number"))

(Editor: 李大同)
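Assuming dplyr 1.0.0 or later (the answer above predates it), a single mutate() call can consume the whole str_match() matrix, because an unnamed data frame returned inside mutate() is unpacked into separate columns. This sketch wraps the matrix-to-data-frame conversion in a hypothetical helper, split_call_num(), which is not part of dplyr or stringr, and avoids the deprecated rename_() entirely.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Call_Num = c("HV5822.H4 C47 Circulating Collection,3rd Floor",
               "QE511.4 .G53 1982 Circulating Collection,3rd Floor",
               "TL515 .M63 Circulating Collection,3rd Floor",
               "D753 .F4 Circulating Collection,3rd Floor",
               "DB89.F7 D4 Circulating Collection,3rd Floor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (illustration only): turns the str_match() matrix
# into a data frame holding just the two capture groups, already named
split_call_num <- function(x) {
  m <- str_match(x, "([A-Z]+)(\\d+)\\s*\\.")
  data.frame(letter = m[, 2], number = m[, 3], stringsAsFactors = FALSE)
}

# The helper's result is passed unnamed, so mutate() (dplyr >= 1.0.0)
# splices its columns `letter` and `number` into df
df2 <- df %>% mutate(split_call_num(Call_Num))

print(df2)
```

The regex runs only once per row here, and the column names are set where the matrix is converted, so no separate renaming step is needed.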