Using dcast.data.table with date values and aggregation
I'm trying to figure this out. Suppose you have a data.table:
    library(data.table)

    dt <- data.table(
      person = c('bob', 'bob', 'bob'),
      door   = c('front door', 'front door', 'front door'),
      type   = c('timeIn', 'timeIn', 'timeOut'),
      time   = c(
        as.POSIXct('2016 12 02 06 05 01', format = '%Y %m %d %H %M %S'),
        as.POSIXct('2016 12 02 06 05 02', format = '%Y %m %d %H %M %S'),
        as.POSIXct('2016 12 02 06 05 03', format = '%Y %m %d %H %M %S')
      )
    )

I want to pivot it to look like this:

    person  door        timeIn            timeOut
    bob     front door  min(<date/time>)  max(<date/time>)

I can't seem to get the syntax right for dcast.data.table. I tried

    dcast.data.table(
      dt, person + door ~ type,
      value.var = 'time',
      fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
    )

This throws an error.
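I also tried:

    dcast.data.table(dt, person + door ~ type, value.var = 'time')

But the result throws away my dates and returns counts instead:

       person       door timeIn timeOut
    1:    bob front door      2       1

Any suggestions would be greatly appreciated. TIA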
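There are several ways to achieve the desired result using dcast. jazzurro's solution aggregates before reshaping the result. The approaches here use dcast directly but may require some post-processing. We are using jazzurro's data, adjusted to use the UTC time zone, together with CRAN version 1.10.0 of data.table.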
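The aggregate-first idea mentioned above can be sketched as follows. This is only an illustration of the general pattern, not jazzurro's actual code; the intermediate name `agg` is hypothetical, and the sample data is rebuilt inline (in UTC) so the block runs on its own.

```r
library(data.table)

# Sample data mirroring the answer's data set (assumed to be in UTC)
dt <- data.table(
  person = rep(c("bob", "ana"), each = 4),
  door   = "front door",
  type   = rep(c("timeIn", "timeIn", "timeOut", "timeOut"), times = 2),
  time   = as.POSIXct(
    c("2016-12-02 06:05:01", "2016-12-02 06:05:02",
      "2016-12-02 06:05:03", "2016-12-02 06:05:05",
      "2016-12-02 07:06:01", "2016-12-02 07:06:02",
      "2016-12-02 07:06:03", "2016-12-02 07:06:05"),
    tz = "UTC"
  )
)

# Step 1: aggregate in long form. Inside j, the grouping column `type`
# has length 1, so a plain if/else (which keeps the POSIXct class) is enough.
agg <- dt[, .(time = if (type == "timeIn") min(time) else max(time)),
          by = .(person, door, type)]

# Step 2: reshape. Each cell now holds exactly one value,
# so no fun.aggregate is needed.
res <- dcast(agg, person + door ~ type, value.var = "time")
print(res)
```

Because the aggregation happens before the reshape, dcast never has to fall back on its default aggregation function, which is what produced the counts in the question.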
1. Getting ifelse to work

As reported in the question,

    dcast(
      dt, person + door ~ type,
      value.var = 'time',
      fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
    )

returns an error message. The full text of the error message includes a hint to use the fill parameter. Unfortunately, ifelse() does not respect the POSIXct class (see ?ifelse for details), so the result has to be coerced back.

With

    dcast(
      dt, person + door ~ type,
      value.var = 'time',
      fun.aggregate = function(x) lubridate::as_datetime(ifelse(type == 'timeIn', min(x), max(x))),
      fill = 0
    )

we get

    #   person       door              timeIn             timeOut
    #1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
    #2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05

2. Alternative to ifelse

The help page for ifelse suggests

    (tmp <- yes; tmp[!test] <- no[!test]; tmp)

as an alternative. Following this advice,

    dcast(
      dt, person + door ~ type,
      value.var = 'time',
      fun.aggregate = function(x) {
        test <- type == "timeIn"; tmp <- min(x); tmp[!test] = max(x)[!test]; tmp
      }
    )

returns

    #   person       door              timeIn             timeOut
    #1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
    #2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05

Note that neither the fill parameter nor coercion to POSIXct is needed.

3. Using enhanced dcast

With recent versions of dcast.data.table, we can provide a list of functions to fun.aggregate:

    dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))

returns

    #   person       door     time_min_timeIn    time_min_timeOut     time_max_timeIn    time_max_timeOut
    #1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:03 2016-12-02 07:06:02 2016-12-02 07:06:05
    #2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:03 2016-12-02 06:05:02 2016-12-02 06:05:05

We can remove the unwanted columns and rename the others with

    dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))[
      , .(person, door, timeIn = time_min_timeIn, timeOut = time_max_timeOut)]

which gives us

    #   person       door              timeIn             timeOut
    #1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
    #2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05

Data

As mentioned above, we are using jazzurro's data

    dt <- structure(list(
      person = c("bob", "bob", "bob", "bob", "ana", "ana", "ana", "ana"),
      door = rep("front door", 8),
      type = c("timeIn", "timeIn", "timeOut", "timeOut",
               "timeIn", "timeIn", "timeOut", "timeOut"),
      time = structure(c(1480658701, 1480658702, 1480658703, 1480658705,
                         1480662361, 1480662362, 1480662363, 1480662365),
                       class = c("POSIXct", "POSIXt"))),
      .Names = c("person", "door", "type", "time"),
      row.names = c(NA, -8L),
      class = c("data.table", "data.frame"))

but with the time zone coerced to UTC.
With

    dt[, time := lubridate::with_tz(time, "UTC")]

we have

    dt
    #   person       door    type                time
    #1:    bob front door  timeIn 2016-12-02 06:05:01
    #2:    bob front door  timeIn 2016-12-02 06:05:02
    #3:    bob front door timeOut 2016-12-02 06:05:03
    #4:    bob front door timeOut 2016-12-02 06:05:05
    #5:    ana front door  timeIn 2016-12-02 07:06:01
    #6:    ana front door  timeIn 2016-12-02 07:06:02
    #7:    ana front door timeOut 2016-12-02 07:06:03
    #8:    ana front door timeOut 2016-12-02 07:06:05

independent of the local time zone.
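The remark in approach 1 that ifelse() does not respect the POSIXct class can be checked in isolation. The following base R sketch uses two hypothetical timestamps, `t1` and `t2`, and restores the class with as.POSIXct() instead of lubridate::as_datetime(), purely to avoid an extra dependency here.

```r
# Two illustrative timestamps (hypothetical values, in UTC)
t1 <- as.POSIXct("2016-12-02 06:05:01", tz = "UTC")
t2 <- as.POSIXct("2016-12-02 06:05:05", tz = "UTC")

# ifelse() builds its result from the logical test vector,
# so the attributes of yes/no (including the class) are dropped
x <- ifelse(TRUE, t1, t2)
class(x)  # "numeric": seconds since the epoch

# Restoring the class explicitly
y <- as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

# A plain if/else on a length-1 condition keeps the class
z <- if (TRUE) t1 else t2
class(z)
```

This is why approach 1 needs an explicit coercion step, while approach 2, which assigns into an existing POSIXct vector, does not.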