
Regular expressions – How to create a new column in a data frame in R based on partial string matches in another column

Published: 2020-12-13 22:55:58 | Category: Encyclopedia | Source: compiled from the web
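I have a data frame in R with two columns, GL and GLDESC, and I want to add a third column called KIND based on some of the data in the GLDESC column.

The data frame looks like this: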

      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables

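For each row of the data frame:

>If GLDESC contains the word Payroll anywhere in the string, I want KIND to be Payroll
>If GLDESC contains the word Gas anywhere in the string, I want KIND to be Materials
>In all other cases, I want KIND to be Other

I searched Stack Overflow for similar examples but could not find any. I also looked into switch, grep, apply, and regular expressions in R, trying to match only part of the GLDESC column and then fill the KIND column with the corresponding kind, but I could not get it to work.

Any help is much appreciated.

Thanks :D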

Since you only have two conditions, you can use a nested ifelse:
#random data; it wasn't easy to copy-paste yours
DF <- data.frame(GL = sample(10),
                 GLDESC = paste(sample(letters, 10),
                                c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", "asdfg", "GAS--2", "fghfgh", "qweee"),
                                sample(letters, 10), sep = " "))

DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = TRUE), "Materials",
           ifelse(grepl("payroll", DF$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

DF
#   GL         GLDESC      KIND
#1   8        e gas l Materials
#2   1  c payroll12 y   Payroll
#3  10      m GaSer v Materials
#4   6       t asdf n     Other
#5   2      w qweaa t     Other
#6   4 r PayROll-12 q   Payroll
#7   9      n asdfg a     Other
#8   5     d GAS--2 w Materials
#9   7     s fghfgh e     Other
#10  3      g qweee k     Other
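
The same nested ifelse can be tried on the rows printed in the question. This is a minimal sketch: the data frame is retyped by hand from the question's output, and the name `DF2` is only an illustrative assumption, not something from the original post.

```r
# Data frame retyped from the rows printed in the question (DF2 is a hypothetical name)
DF2 <- data.frame(
  GL = c(515100, 515900, 532300, 539991, 551000, 551100, 551300),
  GLDESC = c("Payroll-Indir Salary Labor",
             "Payroll-Indir Compensated Absences",
             "Bulk Gas",
             "Area Charge In",
             "Repairs & Maint-Spare Parts",
             "Supplies-Operating",
             "Consumables"),
  stringsAsFactors = FALSE
)

# "gas" is tested first, then "payroll"; anything else becomes "Other"
DF2$KIND <- ifelse(grepl("gas", DF2$GLDESC, ignore.case = TRUE), "Materials",
            ifelse(grepl("payroll", DF2$GLDESC, ignore.case = TRUE), "Payroll", "Other"))

DF2$KIND
# [1] "Payroll"   "Payroll"   "Materials" "Other"     "Other"     "Other"     "Other"
```

Because `grepl` matches substrings, a description such as "Gasket" would also be labelled Materials. If only the whole word should count, a pattern like `"\\bgas\\b"` is one option.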

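EDIT 10/3/2016 (..received more attention than expected)

A possible solution for handling more patterns is to iterate over all the patterns and, whenever there is a match, progressively reduce the number of comparisons: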

ff = function(x,patterns,replacements = patterns,fill = NA,...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill),length(x))    
    empty = seq_along(x)

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]],x[empty],...)
        ans[empty[greps]] = replacements[[i]]  
        empty = empty[!greps]
    }

    return(ans)
}

ff(DF$GLDESC,c("gas","payroll"),c("Materials","Payroll"),"Other",ignore.case = TRUE)
# [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

ff(c("pat1a pat2","pat1a pat1b","pat3","pat4"),c("pat1a|pat1b","pat2","pat3"),c("1","2","3"),fill = "empty")
#[1] "1"     "1"     "3"     "empty"

ff(c("pat1a pat2","pat1a pat1b","pat3","pat4"),c("pat2","pat1a|pat1b","pat3"),c("2","1","3"),fill = "empty")
#[1] "2"     "1"     "3"     "empty"
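
The two results above differ only in the first element, which suggests that the order of the patterns decides which replacement wins when an element matches more than one. A minimal sketch of the same first-match-wins behaviour follows, written as a plain loop that walks the patterns in reverse so that earlier patterns overwrite later ones. The name `first_match` is hypothetical and is not part of the answer. Unlike `ff`, it rescans every element for every pattern, so it trades the reduced comparisons for simpler bookkeeping.

```r
# Hypothetical helper: first matching pattern wins, implemented by overwriting
first_match <- function(x, patterns, replacements, fill = NA, ...) {
  stopifnot(length(patterns) == length(replacements))
  ans <- rep_len(as.character(fill), length(x))
  # Walk patterns last-to-first so earlier patterns overwrite later ones
  for (i in rev(seq_along(patterns))) {
    ans[grepl(patterns[[i]], x, ...)] <- replacements[[i]]
  }
  ans
}

x <- c("pat1a pat2", "pat1a pat1b", "pat3", "pat4")

first_match(x, c("pat1a|pat1b", "pat2", "pat3"), c("1", "2", "3"), fill = "empty")
# [1] "1"     "1"     "3"     "empty"

first_match(x, c("pat2", "pat1a|pat1b", "pat3"), c("2", "1", "3"), fill = "empty")
# [1] "2"     "1"     "3"     "empty"
```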

(Editor: 李大同)
