How do I read information from text files?

Published: 2020-12-14 18:45:36 | Category: Resources | Source: collected from the web


I have hundreds of text files, each containing the following information:

*****Auto-Corelation Results******
1     .09    -.19     .18     non-Significant

*****STATISTICS FOR MANN-KENDELL TEST******
S=  609
VAR(S)=      162409.70
Z=           1.51
Random : No trend at 95%

*****SENs STATISTICS ******
SEN SLOPE =  .24

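Now I want to read all of these files, "collect" the Sen's statistic from each one (e.g. .24), and compile the values into a single file together with the corresponding file names. I have to do this in R.

I have worked with CSV files, but I don't know how to handle plain text files.

This is the code I am using now: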

require(gtools)
GG <- grep("*.txt",list.files(),value = TRUE)
GG<-mixedsort(GG)
S <- sapply(seq(GG),function(i){
X <- readLines(GG[i])
grep("SEN SLOPE",X,value = TRUE)
})
spl <- unlist(strsplit(S,".*[^.0-9]"))
SenStat <- as.numeric(spl[nzchar(spl)])
SenStat<-data.frame( SenStat,file = GG)
write.table(SenStat,"sen.csv",sep = ",",row.names = FALSE)

The current code does not read all the values correctly and gives this warning:

Warning message:
NAs introduced by coercion

Also, I am not getting the file names as a second column of the output. Please help!

Diagnosis 1

The code is also picking up the = sign. This is the output of print(spl):

[1] ""       "5.55"   ""       "-.18"   ""       "3.08"   ""       "3.05"   ""       "1.19"   ""       "-.32"  
[13] ""       ".22"    ""       "-.22"   ""       ".65"    ""       "1.64"   ""       "2.68"   ""       ".10"   
[25] ""       ".42"    ""       "-.44"   ""       ".49"    ""       "1.44"   ""       "=-1.07" ""       ".38"   
[37] ""       ".14"    ""       "=-2.33" ""       "4.76"   ""       ".45"    ""       ".02"    ""       "-.11"  
[49] ""       "=-2.64" ""       "-.63"   ""       "=-3.44" ""       "2.77"   ""       "2.35"   ""       "6.29"  
[61] ""       "1.20"   ""       "=-1.80" ""       "-.63"   ""       "5.83"   ""       "6.33"   ""       "5.42"  
[73] ""       ".72"    ""       "-.57"   ""       "3.52"   ""       "=-2.44" ""       "3.92"   ""       "1.99"  
[85] ""       ".77"    ""       "3.01"

Diagnosis 2

I think I found the problem. The negative sign is a bit tricky. In some files the line looks like one of these:

SEN SLOPE =-1.07
SEN SLOPE = -.11

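Because of the difference in spacing after the =, I get NAs for the first form, while the code reads the second. How can I modify the regular expression to handle this? Thanks!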

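Solution

Suppose "text.txt" is one of your text files. Read it into R with readLines, then use grep to find the line that contains SEN SLOPE. With no other arguments, grep returns the index of each element that matches the regular expression; here that is line 11. Add value = TRUE to get the matching line itself.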

x <- readLines("text.txt")
grep("SEN SLOPE",x)
## [1] 11
( gg <- grep("SEN SLOPE",x,value = TRUE) )
## [1] "SEN SLOPE =  .24"
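Once the matching line is in hand, the number still has to be separated from the label. The following is a minimal sketch, not the answer's own method: it assumes the value is always whatever follows the "=" sign, and the three sample strings are made up to cover the spacing variants mentioned in the question.

```r
# Hypothetical sample lines covering the spacing variants from the question
lines <- c("SEN SLOPE =  .24", "SEN SLOPE =-1.07", "SEN SLOPE = -.11")

# Remove everything up to and including "=" plus any whitespace after it,
# then convert what is left to a number
values <- as.numeric(sub(".*=[[:space:]]*", "", lines))
print(values)
```

Because the whitespace after "=" is optional in this pattern, the minus sign stays attached to the number whether or not a space precedes it.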

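To find all the .txt files in the working directory, we can use list.files with a regular expression. Note that pattern is a regular expression, not a shell glob, so the extension is matched with an escaped dot and an end-of-string anchor.

list.files(pattern = "\\.txt$")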
## [1] "text.txt"
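Since the pattern argument of list.files is a regular expression rather than a shell wildcard, it can help to see both spellings side by side. This sketch works in a throwaway directory; the directory name and the file names are invented for the illustration. glob2rx (from the utils package) converts a wildcard into the equivalent regular expression.

```r
# Work in a temporary directory so no real files are touched
d <- file.path(tempdir(), "pattern_demo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, c("text.txt", "text2.txt", "notes_txt.csv")))

# Option 1: translate the wildcard "*.txt" into a regular expression
rx <- glob2rx("*.txt")
found_glob <- list.files(d, pattern = rx)

# Option 2: write the regular expression directly
found_regex <- list.files(d, pattern = "\\.txt$")

print(rx)
print(found_glob)
print(found_regex)
```

Both calls should list only the two .txt files; the anchored pattern leaves out notes_txt.csv, which merely contains the letters "txt".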

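Looping over multiple files

I created a second text file, text2.txt, with a different SEN SLOPE value, to show how this method applies to multiple files. We can use sapply and then strsplit to get the desired spl values.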

GG <- list.files(pattern = "\\.txt$")
S <- sapply(seq_along(GG),function(i){
    X <- readLines(GG[i])
    ## NA for an empty file, otherwise the line containing SEN SLOPE
    ifelse(length(X) > 0,grep("SEN SLOPE",X,value = TRUE),NA)
    ## added 04/23/14 to account for empty files (as per comment)
})
spl <- unlist(strsplit(S,split = ".*((=|(\\s=))|(=\\s|\\s=\\s))"))
## above regex changed to capture up to and including "=" and 
## surrounding space,if any - 04/23/14 (as per comment)
SenStat <- as.numeric(spl[nzchar(spl)])
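The same loop can also be wrapped in a small function that returns exactly one value per file, which keeps the values aligned with the file names. This is a sketch under assumptions, not the answer's code: read_sen_slope is a hypothetical helper name, the demo files are created in a temporary directory, and the value is assumed to be whatever follows "=" on the first SEN SLOPE line.

```r
# Hypothetical helper: the SEN SLOPE value from one file, or NA if absent
read_sen_slope <- function(path) {
    x <- readLines(path, warn = FALSE)
    hit <- grep("SEN SLOPE", x, value = TRUE, fixed = TRUE)
    if (length(hit) == 0) return(NA_real_)   # empty file or no such line
    as.numeric(sub(".*=[[:space:]]*", "", hit[1]))
}

# Demo files in a temporary directory (names invented for the example)
d <- file.path(tempdir(), "sen_demo")
dir.create(d, showWarnings = FALSE)
writeLines(c("*****SENs STATISTICS ******", "SEN SLOPE =  .24"),
           file.path(d, "a.txt"))
writeLines(c("*****SENs STATISTICS ******", "SEN SLOPE =-1.07"),
           file.path(d, "b.txt"))
file.create(file.path(d, "empty.txt"))

files <- list.files(d, pattern = "\\.txt$", full.names = TRUE)
# vapply guarantees one numeric value per file
slopes <- vapply(files, read_sen_slope, numeric(1), USE.NAMES = FALSE)
result <- data.frame(SenStat = slopes, file = basename(files))
print(result)
```

Returning NA for a file without a SEN SLOPE line means the result always has as many rows as there are files, so data.frame never fails on mismatched lengths.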

Then we can put the results into a data frame, ready to be sent to a file with write.table:

( SenStatDf <- data.frame(SenStat,file = GG) )
##   SenStat      file
## 1    0.46 text2.txt
## 2    0.24  text.txt

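We can then write it to a file. Since the file is named .csv, sep = "," is set so that the contents are actually comma-separated:

write.table(SenStatDf,"myFile.csv",sep = ",",row.names = FALSE)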

Update, July 21, 2014:

Since the results are being written to a file anyway, this can be done more simply (and faster):

( SenStatDf <- cbind(
      SenSlope = c(lapply(GG,function(x){
          y <- readLines(x)
          z <- y[grepl("SEN SLOPE",y)]
          ## split at "=" and any whitespace after it; keep the value part
          unlist(strsplit(z,split = ".*=\\s*"))[-1]
          }),recursive = TRUE),file = GG
 ) )
#      SenSlope file       
# [1,] ".46"    "text2.txt"
# [2,] ".24"    "text.txt" 

Then write it out and read it back into R:

write.table(SenStatDf,"myFile.txt",row.names = FALSE)
read.table("myFile.txt",header = TRUE)
#   SenSlope      file
# 1     0.46 text2.txt
# 2     0.24  text.txt
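If the goal is a true comma-separated file, as in the question, write.csv and read.csv make the round trip without setting a separator by hand. A minimal sketch, assuming the two values shown earlier and a temporary output path chosen for the example:

```r
# Example data matching the two files used in this answer
SenStatDf <- data.frame(SenStat = c(0.46, 0.24),
                        file = c("text2.txt", "text.txt"))

# Write to a temporary CSV file, then read it back
out <- file.path(tempdir(), "sen.csv")
write.csv(SenStatDf, out, row.names = FALSE)
back <- read.csv(out, stringsAsFactors = FALSE)
str(back)
```

Reading the file back is a quick way to confirm that the slopes come through as numbers and the file names as text.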

(Editor: 李大同)
