Regular expressions – Combining lines of a character vector in R

Published: 2020-12-14 05:37:38


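I have a character vector (content) of about 50,000 lines in R. However, some lines that were read in from a text file ended up split across separate elements when they shouldn't be. Specifically, the lines look like this: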

[1] hello,
[2] world
[3] ""
[4] how
[5] are
[6] you
[7] ""

I would like to combine these lines so that I end up with something like this:

[1] hello, world
[2] how are you

I tried writing a for loop:

for(i in 1:length(content)){
    if(content[i+1] != ""){
        content[i+1] <- c(content[i],content[i+1])
    }
}

But when I run the loop, I get the error: missing value where TRUE/FALSE needed.
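The error comes from the last iteration: when `i` equals `length(content)`, `content[i+1]` indexes past the end of the vector and returns `NA`, so `NA != ""` is `NA` and `if()` cannot decide. The loop has a second problem as well: `c(content[i], content[i+1])` is a length-2 vector, and assigning it into the single element `content[i+1]` keeps only the first value (with a warning). A minimal loop-based sketch that avoids both issues accumulates pieces in a buffer and flushes it at each `""`. It assumes that an empty string marks the end of each logical line and that `content` contains no `NA` values; the function name `combine_lines` is hypothetical.

```r
# Hypothetical helper: join consecutive non-empty elements with a space,
# treating "" as the end of a logical line (assumption from the example data).
# Assumes `content` contains no NA values.
combine_lines <- function(content) {
  result <- character(0)  # finished, combined lines
  buffer <- character(0)  # pieces of the line currently being built
  for (line in content) {
    if (line == "") {
      # An empty element ends the current logical line
      if (length(buffer) > 0) {
        result <- c(result, paste(buffer, collapse = " "))
        buffer <- character(0)
      }
    } else {
      buffer <- c(buffer, line)
    }
  }
  # Flush a final line that is not followed by ""
  if (length(buffer) > 0) {
    result <- c(result, paste(buffer, collapse = " "))
  }
  result
}

content <- c("hello,", "world", "", "how", "are", "you", "")
combine_lines(content)
#> [1] "hello, world" "how are you"
```

Because `result` is copied on every append, this will be slower than a vectorised approach on a 50,000-element vector; it is mainly useful to show why the original loop fails.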

Can anyone suggest a better way to do this, perhaps even without using a loop?

Thanks!

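Edit:
I'm actually trying to apply this to a corpus of documents, each with thousands of lines. Any ideas on how to turn these solutions into a function that can be applied to the content of each document?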

Solution

There are probably more elegant solutions, but this might work for you:

chars <- c("hello,","world","","how","are","you","")
# Identify groups that belong together (the id increases each time a "" is found)
ids <- cumsum(chars=="")

# Split the vector (and filter out "" by using the select vector)
select <- chars!=""
splitted <- split(chars[select],ids[select])

# Paste the groups together
res <- sapply(splitted,paste,collapse=" ")

# Remove names (if necessary, probably not)
res <- unname(res) # thanks @Roland

> res
[1] "hello, world" "how are you"
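For the edit in the question (applying this to every document in a corpus), one option is to wrap the same `cumsum`/`split`/`paste` steps in a function and call it on each document with `lapply`. This is a sketch that assumes each document is stored as a character vector in a plain R list with no `NA` values; the names `combine_paragraphs` and `docs` are hypothetical, and a corpus object from a text-mining package would need its text extracted into such vectors first. It uses `vapply` instead of `sapply` so the result is always a character vector, even for an empty document.

```r
# Hypothetical wrapper around the cumsum/split/paste approach above.
# Assumes `chars` is a character vector without NA values.
combine_paragraphs <- function(chars) {
  # Group id increases each time an empty string is found
  ids <- cumsum(chars == "")
  # Drop the empty separators before splitting
  select <- chars != ""
  splitted <- split(chars[select], ids[select])
  # vapply guarantees a character result (character(0) for empty input)
  unname(vapply(splitted, paste, character(1), collapse = " "))
}

# Hypothetical corpus: a list of documents, each a character vector of lines
docs <- list(
  doc1 = c("hello,", "world", "", "how", "are", "you", ""),
  doc2 = c("first", "line", "", "second")
)
combined <- lapply(docs, combine_paragraphs)
combined$doc1  # "hello, world" "how are you"
combined$doc2  # "first line" "second"
```

`lapply` keeps the document names, so `combined` lines up one-to-one with `docs`.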

(Editor: Li Datong)
