
How do I find the difference between two datetimes in pandas?

Published: 2020-12-20 13:17:11 | Category: Python | Source: compiled from the web
I have the following data:

id = ["Train A","Train A","Train A","Train B","Train B","Train B"]
arrival_time = ["0","2016-05-19 13:50:00","2016-05-19 21:25:00","0","2016-05-24 18:30:00","2016-05-26 12:15:00"]
departure_time = ["2016-05-19 08:25:00","2016-05-19 16:00:00","2016-05-20 07:45:00","2016-05-24 12:50:00","2016-05-25 23:00:00","2016-05-26 19:45:00"]

which gives the following table:

id         arrival_time           departure_time
Train A    0                      2016-05-19 08:25:00
Train A    2016-05-19 13:50:00    2016-05-19 16:00:00
Train A    2016-05-19 21:25:00    2016-05-20 07:45:00
Train B    0                      2016-05-24 12:50:00
Train B    2016-05-24 18:30:00    2016-05-25 23:00:00
Train B    2016-05-26 12:15:00    2016-05-26 19:45:00

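The dtype of the departure and arrival times is datetime64[ns].

How can I find the time difference between the departure time in one row and the arrival time in the next row? For example, the difference between [2016-05-19 08:25:00] and [2016-05-19 13:50:00]. I tried the following code, but it did not work.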

df['Duration'] = df.departure_time.iloc[i+1] - df.arrival_time.iloc[i]

Solution

I think you first need to convert the date strings with to_datetime, and the "0" placeholders must be converted to NaN as well:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': id,'arrival_time':arrival_time,'departure_time':departure_time})

# replace the "0" placeholder with NaN so it parses to NaT
df['arrival_time'] = pd.to_datetime(df['arrival_time'].replace('0',np.nan))
# alternative: coerce every value that is not a valid date to NaT
#df['arrival_time'] = pd.to_datetime(df['arrival_time'],errors='coerce')
df['departure_time'] = pd.to_datetime(df['departure_time'])
print (df)
         arrival_time      departure_time       id
0                 NaT 2016-05-19 08:25:00  Train A
1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A
2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A
3                 NaT 2016-05-24 12:50:00  Train B
4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B
5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B
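
The two conversion options above behave differently when the column holds bad values other than "0". The following is a minimal sketch of that difference. The sample series, including the "not a date" entry, is an assumption made up for illustration and is not part of the original data.

```python
import numpy as np
import pandas as pd

# Illustrative sample: one valid timestamp, the "0" placeholder,
# and one other malformed value (hypothetical, for demonstration only).
raw = pd.Series(["2016-05-19 13:50:00", "0", "not a date"])

# Option 1: replace only the known placeholder, then parse strictly.
# The remaining malformed value still makes to_datetime raise.
replaced = raw.replace("0", np.nan)
try:
    pd.to_datetime(replaced)
    strict_ok = True
except ValueError:
    strict_ok = False

# Option 2: errors='coerce' turns every unparseable value into NaT.
coerced = pd.to_datetime(raw, errors="coerce")

print(strict_ok)
print(coerced)
```

Replacing the placeholder explicitly is the stricter choice, since unexpected garbage still surfaces as an error. Coercing is more forgiving but can silently hide bad input.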

Then shift the departure_time column within each id group using groupby, and subtract the arrival_time column:

df['Duration'] = df.groupby('id')['departure_time'].shift() - df['arrival_time']
print (df)
         arrival_time      departure_time       id          Duration
0                 NaT 2016-05-19 08:25:00  Train A               NaT
1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A -1 days +18:35:00
2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A -1 days +18:35:00
3                 NaT 2016-05-24 12:50:00  Train B               NaT
4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B -1 days +18:20:00
5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B -1 days +10:45:00

Or you may need to swap the operands to get a positive timedelta:

df['Duration'] = df['arrival_time'] - df.groupby('id')['departure_time'].shift()
print (df)
         arrival_time      departure_time       id  Duration
0                 NaT 2016-05-19 08:25:00  Train A       NaT
1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A  05:25:00
2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A  05:25:00
3                 NaT 2016-05-24 12:50:00  Train B       NaT
4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B  05:40:00
5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B  13:15:00
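
To illustrate why the shift is done per group rather than on the whole column, here is a small sketch comparing a plain shift() with groupby('id').shift(). The reduced frame df_demo is an assumption for demonstration, built from a subset of the departure times above.

```python
import pandas as pd

# Reduced, illustrative frame: two rows per train.
df_demo = pd.DataFrame({
    "id": ["Train A", "Train A", "Train B", "Train B"],
    "departure_time": pd.to_datetime([
        "2016-05-19 08:25:00", "2016-05-19 16:00:00",
        "2016-05-24 12:50:00", "2016-05-25 23:00:00",
    ]),
})

# Plain shift: the first Train B row receives Train A's last departure.
plain = df_demo["departure_time"].shift()

# Grouped shift: the first row of every group gets NaT instead,
# so no value leaks across train boundaries.
grouped = df_demo.groupby("id")["departure_time"].shift()

print(pd.DataFrame({"id": df_demo["id"], "plain": plain, "grouped": grouped}))
```

With the plain shift, the first Train B row would be compared against a Train A departure, which would produce a meaningless duration.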

Finally, the timedelta can be converted to seconds with total_seconds:

df['Duration'] = (df['arrival_time'] - df.groupby('id')['departure_time'].shift()).dt.total_seconds()
print (df)
         arrival_time      departure_time       id  Duration
0                 NaT 2016-05-19 08:25:00  Train A       NaN
1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A   19500.0
2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A   19500.0
3                 NaT 2016-05-24 12:50:00  Train B       NaN
4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B   20400.0
5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B   47700.0
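
If minutes or hours are more convenient than seconds, one possible approach is to divide the timedelta column by a pd.Timedelta unit. The sketch below assumes a standalone series named durations, holding the three distinct durations from the output above.

```python
import pandas as pd

# Illustrative durations matching the results shown above.
durations = pd.Series(pd.to_timedelta(["05:25:00", "05:40:00", "13:15:00"]))

# Seconds, as in the answer.
seconds = durations.dt.total_seconds()

# Dividing by a Timedelta yields plain floats in that unit.
minutes = durations / pd.Timedelta(minutes=1)
hours = durations / pd.Timedelta(hours=1)

print(pd.DataFrame({"seconds": seconds, "minutes": minutes, "hours": hours}))
```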

(Editor: 李大同)
