加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 百科 > 正文

R:如何在保留其他列的同时聚合某些列

发布时间:2020-12-14 04:48:04 所属栏目:百科 来源:网络整理
导读:我有类似于 here所述的类似问题,但我尝试过的解决方案都没有. 给出这样的表: Date Exercise Category Weight Reps EstMax RepxWeight Note4/2/16 Deadlift Legs 135 7 166.4685 7x135 easy4/2/16 Deadlift Legs 135 7 166.4685 7x135 kinda easy4/2/16 Dead
我有类似于 here所述的类似问题,但我尝试过的解决方案都没有.

给出这样的表:

Date    Exercise    Category    Weight  Reps    EstMax  RepxWeight  Note
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   easy
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   kinda easy
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   tired
4/2/16  Bench Press Chest   95  5   110.8175    5x95    hard
4/2/16  Bench Press Chest   135 2   143.991 2x135   not hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   a little hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   super tired
4/18/16 Deadlift    Legs    155 8   196.292 8x155   …
4/18/16 Deadlift    Legs    155 5   180.8075    5x155   bad day
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   good day
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   felt easy
5/8/16  Bench Press Chest   115 4   130.318 4x115   easy
5/8/16  Bench Press Chest   115 4   130.318 4x115   hard

我想聚合以基于多个其他列(例如日期和练习)获取具有特定列(例如,EstMax)的最大值的行,但是还保留行中的所有其他列.并且在具有相同最大值的多个条目的情况下,取第一个条目.

预期的输出看起来像这样:

Date    Exercise    Category    Weight  Reps    EstMax  RepxWeight  Note
4/2/16  Deadlift    Legs    135 7   166.4685    7x135   easy
4/2/16  Bench Press Chest   135 2   143.991 2x135   not hard
4/9/16  Bench Press Chest   135 2   143.991 2x135   a little hard
4/18/16 Deadlift    Legs    155 8   196.292 8x155   …
5/8/16  Deadlift    Legs    185 3   203.4815    3x185   good day
5/8/16  Bench Press Chest   115 4   130.318 4x115   hard

我试过的一些方法的例子;在每种情况下,’额外列’最终都被用作聚合的因素,这不是我想要的.

data <- structure(list(Date = structure(c(2L,2L,3L,1L,4L,4L),.Label = c("4/18/16","4/2/16","4/9/16","5/8/16"),class = "factor"),Exercise = structure(c(2L,1L),.Label = c("Bench Press","Deadlift"),Category = structure(c(2L,.Label = c("Chest","Legs"),Weight = c(135L,135L,95L,155L,185L,115L,115L),Reps = c(7L,7L,5L,8L,EstMax = c(166.4685,166.4685,110.8175,143.991,196.292,180.8075,203.4815,130.318,130.318),RepxWeight = structure(c(6L,6L,3L),.Label = c("2x135","3x185","4x115","5x155","5x95","7x135","8x155"),Note = structure(c(4L,11L,9L,10L,7L),.Label = c("…","a little hard","bad day","easy","felt easy","good day","hard","kinda easy","not hard","super tired","tired"),class = "factor")),.Names = c("Date","Exercise","Category","Weight","Reps","EstMax","RepxWeight","Note"),class = "data.frame",row.names = c(NA,-13L))

# base R
aggregate(EstMax ~ Date + Exercise,data = data,FUN = max)
# Date    Exercise   EstMax
# 1  4/2/16 Bench Press 143.9910
# 2  4/9/16 Bench Press 143.9910
# 3  5/8/16 Bench Press 130.3180
# 4 4/18/16    Deadlift 196.2920
# 5  4/2/16    Deadlift 166.4685
# 6  5/8/16    Deadlift 203.4815

aggregate(EstMax ~ Date + Exercise + RepxWeight + Note,FUN = max)
# Date    Exercise RepxWeight          Note   EstMax
# 1  4/18/16    Deadlift      8x155             … 196.2920
# 2   4/9/16 Bench Press      2x135 a little hard 143.9910
# 3  4/18/16    Deadlift      5x155       bad day 180.8075
# 4   5/8/16 Bench Press      4x115          easy 130.3180
# 5   4/2/16    Deadlift      7x135          easy 166.4685
# 6   5/8/16    Deadlift      3x185     felt easy 203.4815
# 7   5/8/16    Deadlift      3x185      good day 203.4815
# 8   5/8/16 Bench Press      4x115          hard 130.3180
# 9   4/2/16 Bench Press       5x95          hard 110.8175
# 10  4/2/16    Deadlift      7x135    kinda easy 166.4685
# 11  4/2/16 Bench Press      2x135      not hard 143.9910
# 12  4/9/16 Bench Press      2x135   super tired 143.9910
# 13  4/2/16    Deadlift      7x135         tired 166.4685


# data table
library("data.table")
data_dt <- data.table(data)
data_dt[,max(EstMax),by = c("Date","Exercise")]
# Date    Exercise       V1
# 1:  4/2/16    Deadlift 166.4685
# 2:  4/2/16 Bench Press 143.9910
# 3:  4/9/16 Bench Press 143.9910
# 4: 4/18/16    Deadlift 196.2920
# 5:  5/8/16    Deadlift 203.4815
# 6:  5/8/16 Bench Press 130.3180

data_dt[,.(Date,Exercise,Weight,Reps,RepxWeight,Note)]
# Date    Exercise Weight Reps RepxWeight          Note       V1
# 1:  4/2/16    Deadlift    135    7      7x135          easy 166.4685
# 2:  4/2/16    Deadlift    135    7      7x135    kinda easy 166.4685
# 3:  4/2/16    Deadlift    135    7      7x135         tired 166.4685
# 4:  4/2/16 Bench Press     95    5       5x95          hard 110.8175
# 5:  4/2/16 Bench Press    135    2      2x135      not hard 143.9910
# 6:  4/9/16 Bench Press    135    2      2x135 a little hard 143.9910
# 7:  4/9/16 Bench Press    135    2      2x135   super tired 143.9910
# 8: 4/18/16    Deadlift    155    8      8x155             … 196.2920
# 9: 4/18/16    Deadlift    155    5      5x155       bad day 180.8075
# 10:  5/8/16    Deadlift    185    3      3x185      good day 203.4815
# 11:  5/8/16    Deadlift    185    3      3x185     felt easy 203.4815
# 12:  5/8/16 Bench Press    115    4      4x115          easy 130.3180
# 13:  5/8/16 Bench Press    115    4      4x115          hard 130.3180

特别喜欢碱R溶液.还看到了which.max()函数可能有用,但无法弄清楚如何将其应用于此.

我看过的其他相关问题却没有解决这个问题:

Adding a non-aggregated column to an aggregated data set based on the aggregation of another column

Only keep min value for each factor level

How to select the row with the maximum value in each group

aggregating multiple columns in data.table

How to aggregate some columns while keeping other columns in R?

解决方法

我知道你寻求一个基本的R解决方案,但同时,这里有一个dplyr:

library(dplyr)

data %>% 
  group_by(Date,Exercise) %>% 
  slice(which.max(EstMax))

# # A tibble: 6 x 8
# # Groups:   Date,Exercise [6]
#      Date    Exercise Category Weight  Reps   EstMax RepxWeight          Note
#    <fctr>      <fctr>   <fctr>  <int> <int>    <dbl>     <fctr>        <fctr>
# 1 4/18/16    Deadlift     Legs    155     8 196.2920      8x155             …
# 2  4/2/16 Bench Press    Chest    135     2 143.9910      2x135      not hard
# 3  4/2/16    Deadlift     Legs    135     7 166.4685      7x135          easy
# 4  4/9/16 Bench Press    Chest    135     2 143.9910      2x135 a little hard
# 5  5/8/16 Bench Press    Chest    115     4 130.3180      4x115          easy
# 6  5/8/16    Deadlift     Legs    185     3 203.4815      3x185      good day

编辑

data.table不是我的强项,但为了完整起见,这是我的尝试:

library(data.table)

setDT(data)[,.SD[which.max(EstMax)],by = .(Date,Exercise)]

#       Date    Exercise Category Weight Reps   EstMax RepxWeight          Note
# 1:  4/2/16    Deadlift     Legs    135    7 166.4685      7x135          easy
# 2:  4/2/16 Bench Press    Chest    135    2 143.9910      2x135      not hard
# 3:  4/9/16 Bench Press    Chest    135    2 143.9910      2x135 a little hard
# 4: 4/18/16    Deadlift     Legs    155    8 196.2920      8x155             …
# 5:  5/8/16    Deadlift     Legs    185    3 203.4815      3x185      good day
# 6:  5/8/16 Bench Press    Chest    115    4 130.3180      4x115          easy

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读