
Using dcast.data.table with date values and aggregation

Published: 2020-12-13 20:46:19 | Category: Encyclopedia | Source: compiled from the web
I am trying to figure this out. Suppose you have a data.table:

library(data.table)

dt <- data.table(
  person = c('bob', 'bob', 'bob'),
  door = c('front door', 'front door', 'front door'),
  type = c('timeIn', 'timeIn', 'timeOut'),
  time = c(
    as.POSIXct('2016 12 02 06 05 01', format = '%Y %m %d %H %M %S'),
    as.POSIXct('2016 12 02 06 05 02', format = '%Y %m %d %H %M %S'),
    as.POSIXct('2016 12 02 06 05 03', format = '%Y %m %d %H %M %S')
  )
)

I would like to pivot it into something like this:

person        door        timeIn             timeOut

bob           front door  min(<date/time>) max(<date/time>)

I can't seem to get the syntax right for dcast.data.table. I tried

dcast.data.table(
  dt,person + door ~ type,value.var = 'time',fun.aggregate = function(x) ifelse(type == 'timeIn',min(x),max(x))
)

This throws an error:

Aggregating function(s) should take vector inputs and return a single value (length=1).

I also tried:

dcast.data.table(dt, person + door ~ type, value.var = 'time')

but the result drops my dates and shows counts instead:

person       door timeIn timeOut
1:    bob front door      2       1

Any suggestions would be greatly appreciated. TIA

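There are several ways to achieve the desired result with dcast. jazzurro's solution aggregates before reshaping. The approaches here use dcast directly but may require some post-processing. We use jazzurro's data, adjusted to respect the UTC time zone, together with CRAN version 1.10.0 of data.table.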
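For comparison, the aggregate-then-reshape idea can be sketched as follows. This is only an illustration of the general approach, not necessarily jazzurro's exact code; it uses the three-row table from the question, and the names agg and wide are made up for this example.

```r
library(data.table)

# Three-row example from the question, with times fixed to UTC
dt <- data.table(
  person = c("bob", "bob", "bob"),
  door   = c("front door", "front door", "front door"),
  type   = c("timeIn", "timeIn", "timeOut"),
  time   = as.POSIXct(
    c("2016-12-02 06:05:01", "2016-12-02 06:05:02", "2016-12-02 06:05:03"),
    tz = "UTC"
  )
)

# Step 1: aggregate per group. Inside j, a grouping column such as `type`
# has length 1, so a plain if/else is enough.
agg <- dt[, .(time = if (type == "timeIn") min(time) else max(time)),
          by = .(person, door, type)]

# Step 2: reshape. Each cell now holds exactly one value,
# so no fun.aggregate is needed.
wide <- dcast(agg, person + door ~ type, value.var = "time")
wide
```

Because the aggregation happens before dcast is called, the time column keeps its POSIXct class and no coercion step is needed.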

1. Getting ifelse to work

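As reported in the question,

dcast(
  dt, person + door ~ type, value.var = 'time',
  fun.aggregate = function(x) ifelse(type == 'timeIn', min(x), max(x))
)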

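returns an error message. The full text of the error message includes a hint to use the fill parameter. Unfortunately, ifelse() does not preserve the POSIXct class (see ?ifelse for details), so the result has to be coerced back to a date-time.

dcast(
  dt, person + door ~ type, value.var = 'time',
  fun.aggregate = function(x)
    lubridate::as_datetime(ifelse(type == 'timeIn', min(x), max(x))),
  fill = 0
)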

We get

#   person       door              timeIn             timeOut
#1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
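The class loss that makes the coercion necessary can be seen in isolation. The sketch below uses base R only; as.POSIXct() with an explicit origin stands in for lubridate::as_datetime(), and the variable names are made up for this example.

```r
# A single POSIXct value in UTC
x <- as.POSIXct("2016-12-02 06:05:01", tz = "UTC")

# ifelse() returns a bare number: the POSIXct class is dropped
y <- ifelse(TRUE, x, x)
class(y)
# [1] "numeric"

# Coerce the number of seconds since the epoch back to POSIXct
restored <- as.POSIXct(y, origin = "1970-01-01", tz = "UTC")
format(restored, "%Y-%m-%d %H:%M:%S")
# [1] "2016-12-02 06:05:01"
```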

2. An alternative to ifelse

The help page of ifelse suggests

(tmp <- yes; tmp[!test] <- no[!test]; tmp)

as an alternative. Following this advice,

dcast(
  dt, person + door ~ type, value.var = 'time',
  fun.aggregate = function(x) {
    test <- type == "timeIn"; tmp <- min(x); tmp[!test] = max(x)[!test]; tmp
  }
)

returns

#   person       door              timeIn             timeOut
#1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05

Note that neither the fill parameter nor coercion to POSIXct is needed.
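The reason no coercion is needed is that subassignment into a POSIXct vector keeps its class. A small standalone sketch of the pattern from ?ifelse, with made-up vectors yes, no and test:

```r
yes  <- as.POSIXct(c("2016-12-02 06:05:01", "2016-12-02 06:05:02"), tz = "UTC")
no   <- as.POSIXct(c("2016-12-02 06:05:03", "2016-12-02 06:05:05"), tz = "UTC")
test <- c(TRUE, FALSE)

# Start from `yes` and overwrite the positions where the test fails
tmp <- yes
tmp[!test] <- no[!test]

class(tmp)
# [1] "POSIXct" "POSIXt"
format(tmp, "%Y-%m-%d %H:%M:%S")
# [1] "2016-12-02 06:05:01" "2016-12-02 06:05:05"
```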

3. Using the enhanced dcast

With recent versions of dcast.data.table, we can supply a list of functions to fun.aggregate:

dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))

returns

#   person       door     time_min_timeIn    time_min_timeOut     time_max_timeIn    time_max_timeOut
#1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:03 2016-12-02 07:06:02 2016-12-02 07:06:05
#2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:03 2016-12-02 06:05:02 2016-12-02 06:05:05

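We can drop the unwanted columns and rename the others

dcast(dt, person + door ~ type, value.var = 'time', fun = list(min, max))[
  , .(person, door, timeIn = time_min_timeIn, timeOut = time_max_timeOut)]

which gives us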

#   person       door              timeIn             timeOut
#1:    ana front door 2016-12-02 07:06:01 2016-12-02 07:06:05
#2:    bob front door 2016-12-02 06:05:01 2016-12-02 06:05:05
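The same post-processing can also be done by reference instead of building a new table. This is a sketch under the assumption that the installed data.table names the cast columns as shown in the output above (time_min_timeIn and so on); the data is rebuilt inline so the example runs on its own, and res is a name made up for this example.

```r
library(data.table)

# Same eight rows as in the Data section, built directly in UTC
dt <- data.table(
  person = rep(c("bob", "ana"), each = 4),
  door   = "front door",
  type   = rep(c("timeIn", "timeIn", "timeOut", "timeOut"), times = 2),
  time   = as.POSIXct(
    c(1480658701, 1480658702, 1480658703, 1480658705,
      1480662361, 1480662362, 1480662363, 1480662365),
    origin = "1970-01-01", tz = "UTC"
  )
)

res <- dcast(dt, person + door ~ type, value.var = "time",
             fun.aggregate = list(min, max))

# Drop the two unwanted columns by reference
res[, c("time_min_timeOut", "time_max_timeIn") := NULL]

# Rename the remaining value columns by reference
setnames(res,
         old = c("time_min_timeIn", "time_max_timeOut"),
         new = c("timeIn", "timeOut"))
res
```

Whether to chain a `[, .( ... )]` selection or modify by reference is mostly a matter of taste; the by-reference form avoids copying, which may matter for large tables.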

Data

As noted above, we use jazzurro's data

dt <- structure(list(
  person = c("bob","bob","bob","bob","ana","ana","ana","ana"),
  door = c("front door","front door","front door","front door",
           "front door","front door","front door","front door"),
  type = c("timeIn","timeIn","timeOut","timeOut","timeIn","timeIn","timeOut","timeOut"),
  time = structure(c(1480658701,1480658702,1480658703,1480658705,
                     1480662361,1480662362,1480662363,1480662365),
                   class = c("POSIXct","POSIXt"))),
  .Names = c("person","door","type","time"),
  row.names = c(NA,-8L),
  class = c("data.table","data.frame"))

but coerce the time zone to UTC.

dt[,time := lubridate::with_tz(time,"UTC")]

This gives

dt
#   person       door    type                time
#1:    bob front door  timeIn 2016-12-02 06:05:01
#2:    bob front door  timeIn 2016-12-02 06:05:02
#3:    bob front door timeOut 2016-12-02 06:05:03
#4:    bob front door timeOut 2016-12-02 06:05:05
#5:    ana front door  timeIn 2016-12-02 07:06:01
#6:    ana front door  timeIn 2016-12-02 07:06:02
#7:    ana front door timeOut 2016-12-02 07:06:03
#8:    ana front door timeOut 2016-12-02 07:06:05

independent of the local time zone.
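Changing the time zone this way only changes how the instants are displayed, not the instants themselves. A base R sketch of the same idea, setting the tzone attribute directly instead of calling lubridate::with_tz(); the choice of "Asia/Tokyo" as the starting zone is arbitrary and assumes that zone name is available on the system.

```r
# One of the timestamps from the data, initially displayed in another zone
x <- as.POSIXct(1480658701, origin = "1970-01-01", tz = "Asia/Tokyo")

# Switch the display zone to UTC; the underlying number of seconds is untouched
y <- x
attr(y, "tzone") <- "UTC"

as.numeric(x) == as.numeric(y)
# [1] TRUE
format(y, "%Y-%m-%d %H:%M:%S")
# [1] "2016-12-02 06:05:01"
```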

