Error in regex pattern matching when retrieving text into two columns of a data frame
Consider the following hypothetical data:
    x <- "There is a horror movie running in the iNox theater. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    y <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken. To specify the row names and not a column. By name or number. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    z <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one. : And the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    df <- data.frame(Text = c(x, y, z), row.names = NULL, stringsAsFactors = FALSE)

Notice that ":" appears at different positions in each string. For example:

- In 'x' it (":") comes after the first sentence.

What I want to do is create two columns such that:

- Only the first ":" is considered, not the last one.

Desired output for 'x':

    Col1: There is a horror movie running in the iNox theater.
    Col2: If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please

Desired output for 'y' (Col1 is NA because ":" is not found within the first three sentences):

    Col1: NA
    Col2: There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken. To specify the row names and not a column. By name or number. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please

As with 'y', the desired output for 'z' is:

    Col1: NA
    Col2: all of the text from 'z'

What I have tried:

    resX <- data.frame(Col1 = gsub("\\s:.*$", "\\1", df$Text[[1]]), Col2 = gsub("^[^:]+(?:).\\s", "\\1", df$Text[[1]]))
    resY <- data.frame(Col1 = gsub("\\s:.*$", "\\1", df$Text[[2]]), Col2 = gsub("^[^:]+(?:).\\s", "\\1", df$Text[[2]]))
    resZ <- data.frame(Col1 = gsub("\\s:.*$", "\\1", df$Text[[3]]), Col2 = gsub("^[^:]+(?:).\\s", "\\1", df$Text[[3]]))

and then combining the results into a data frame "resDF" with rbind.

Question:

- Can the above be done with a for() loop, or any other approach, so that the code is simpler?

Solution
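You can try this regex, which uses a negative lookahead:

    ^(?s)(?!(?:(?:[^:]*?\.){3,}))(.*?):(.*)$

The lookahead rejects any string in which three or more periods occur before the first ":". Otherwise the first group captures the text before the first ":" and the second group captures everything after it.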
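To see how the pattern behaves, here is a minimal sketch in base R. The two short input strings are made up for this illustration and are not part of the original data; the pattern is written as an R string, so the backslash before the period is doubled.

```r
# Pattern from the answer, as an R string literal.
patt <- "^(?s)(?!(?:(?:[^:]*?\\.){3,}))(.*?):(.*)$"

# Hypothetical inputs, invented for this sketch.
early <- "One. : Two. Three. Four : Five"  # first ":" follows one period
late  <- "One. Two. Three. : Four : Five"  # first ":" follows three periods

# regexec() returns the full match plus capture groups;
# regmatches() extracts them as a character vector.
m_early <- regmatches(early, regexec(patt, early, perl = TRUE))[[1]]
m_late  <- regmatches(late,  regexec(patt, late,  perl = TRUE))[[1]]

print(m_early[2])      # text before the first ":"
print(m_early[3])      # text after the first ":"
print(length(m_late))  # 0: the lookahead rejects this string, so nothing matches
```

Note that the lookahead counts literal periods rather than sentences, so a token such as row.names in the real data contributes one period to the count.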
If your condition is satisfied, the regex matches and you get two groups: group 1 contains the text before the first ":" and group 2 contains the text after it. If the condition is not satisfied, copy the whole string into column 2 and put whatever you need into column 1.

The updated sample snippet below contains a function named processData that does this. If the condition is satisfied, it splits the data and puts the parts into Col1 and Col2. If the condition is not satisfied, as for inputs y and z, it puts NA in Col1 and the whole value in Col2.

Run the sample source on ideone:

    library(stringr)

    x <- "There is a horror movie running in the iNox theater. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    y <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken. To specify the row names and not a column. By name or number. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    z <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one. : And the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please"
    df <- data.frame(Text = c(x, y, z), row.names = NULL, stringsAsFactors = FALSE)

    # Empty result frame with two character columns
    resDF <- data.frame("Col1" = character(), "Col2" = character(), stringsAsFactors = FALSE)

    processData <- function(a) {
      # The period is escaped so the lookahead counts literal "." characters
      patt <- "^(?s)(?!(?:(?:[^:]*?\\.){3,}))(.*?):(.*)$"
      if (grepl(patt, a, perl = TRUE)) {
        # str_match returns the full match followed by the capture groups
        result <- str_match(a, patt)
        col1 <- result[2]
        col2 <- result[3]
      } else {
        # The string "NA", as in the original; use NA_character_ for a true missing value
        col1 <- "NA"
        col2 <- a
      }
      return(c(col1, col2))
    }

    # Append one result row per input row
    for (i in seq_len(nrow(df))) {
      tmp <- df[i, ]
      resDF[nrow(resDF) + 1, ] <- processData(tmp)
    }

    print(resDF)

Sample output:

    Col1
    1 There is a horror movie running in the iNox theater.
    2 NA
    3 NA
    Col2
    1 If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please
    2 There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken. To specify the row names and not a column. By name or number. : If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please
    3 There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one. : And the data frame has a single row,the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please
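The question also asks whether the loop can be avoided. The following is a minimal sketch of a vectorised variant that uses base R only. The function name split_first_colon and the two sample strings are hypothetical and introduced just for this illustration. Unlike the snippet in the answer, this sketch uses a true missing value (NA_character_) instead of the string "NA" and trims surrounding whitespace with trimws().

```r
# Pattern from the answer, as an R string literal.
patt <- "^(?s)(?!(?:(?:[^:]*?\\.){3,}))(.*?):(.*)$"

# Hypothetical helper: splits each element of a character vector at its
# first ":" when the pattern matches, otherwise returns NA and the full text.
split_first_colon <- function(text, patt) {
  hit <- grepl(patt, text, perl = TRUE)
  data.frame(
    # sub() replaces the whole (anchored) match with one capture group
    Col1 = trimws(ifelse(hit, sub(patt, "\\1", text, perl = TRUE), NA_character_)),
    Col2 = trimws(ifelse(hit, sub(patt, "\\2", text, perl = TRUE), text)),
    stringsAsFactors = FALSE
  )
}

# Made-up inputs: one with an early ":", one with three periods before it.
texts <- c("One. : Two. Three. Four : Five",
           "One. Two. Three. : Four : Five")

res <- split_first_colon(texts, patt)
print(res)
```

Applied to the data from the question, the call would be split_first_colon(df$Text, patt), which builds the whole result in one step without rbind or a for() loop.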