How do I find the difference between two datetimes in pandas?
Published: 2020-12-20 13:17:11 Category: Python Source: compiled from the web
I have the following data:

    id = ["Train A", "Train A", "Train A", "Train B", "Train B", "Train B"]
    arrival_time = ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00", "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"]
    departure_time = ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00", "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"]

from which I want to get the following table:

    id       arrival_time         departure_time
    Train A  0                    2016-05-19 08:25:00
    Train A  2016-05-19 13:50:00  2016-05-19 16:00:00
    Train A  2016-05-19 21:25:00  2016-05-20 07:45:00
    Train B  0                    2016-05-24 12:50:00
    Train B  2016-05-24 18:30:00  2016-05-25 23:00:00
    Train B  2016-05-26 12:15:00  2016-05-26 19:45:00

The data type of the departure and arrival times is datetime64[ns].

How can I find the time difference between the departure time in the first row and the arrival time in the second row? For example, the difference between [2016-05-19 08:25:00] and [2016-05-19 13:50:00]. I tried the following code, but it did not work:

    df['Duration'] = df.departure_time.iloc[i+1] - df.arrival_time.iloc[i]

Solution
I think you first need to convert the date strings with to_datetime; the 0 values also have to be converted to NaN:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'id': id, 'arrival_time': arrival_time, 'departure_time': departure_time})
    # Replace the placeholder '0' with NaN so that it is parsed as NaT
    df['arrival_time'] = pd.to_datetime(df['arrival_time'].replace('0', np.nan))
    # Alternative: coerce every value that is not a date to NaT
    # df['arrival_time'] = pd.to_datetime(df['arrival_time'], errors='coerce')
    df['departure_time'] = pd.to_datetime(df['departure_time'])
    print(df)
             arrival_time      departure_time       id
    0                 NaT 2016-05-19 08:25:00  Train A
    1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A
    2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A
    3                 NaT 2016-05-24 12:50:00  Train B
    4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B
    5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B

Then subtract within each group, using shift to bring the previous row's departure time onto the current row:

    df['Duration'] = df.groupby('id')['departure_time'].shift() - df['arrival_time']
    print(df)
             arrival_time      departure_time       id          Duration
    0                 NaT 2016-05-19 08:25:00  Train A               NaT
    1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A -1 days +18:35:00
    2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A -1 days +18:35:00
    3                 NaT 2016-05-24 12:50:00  Train B               NaT
    4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B -1 days +18:20:00
    5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B -1 days +10:45:00

Or you may need to swap the operands to get a positive timedelta:

    df['Duration'] = df['arrival_time'] - df.groupby('id')['departure_time'].shift()
    print(df)
             arrival_time      departure_time       id Duration
    0                 NaT 2016-05-19 08:25:00  Train A      NaT
    1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A 05:25:00
    2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A 05:25:00
    3                 NaT 2016-05-24 12:50:00  Train B      NaT
    4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B 05:40:00
    5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B 13:15:00

Finally, the timedelta can be converted to seconds with dt.total_seconds():

    df['Duration'] = (df['arrival_time'] - df.groupby('id')['departure_time'].shift()).dt.total_seconds()
    print(df)
             arrival_time      departure_time       id  Duration
    0                 NaT 2016-05-19 08:25:00  Train A       NaN
    1 2016-05-19 13:50:00 2016-05-19 16:00:00  Train A   19500.0
    2 2016-05-19 21:25:00 2016-05-20 07:45:00  Train A   19500.0
    3                 NaT 2016-05-24 12:50:00  Train B       NaN
    4 2016-05-24 18:30:00 2016-05-25 23:00:00  Train B   20400.0
    5 2016-05-26 12:15:00 2016-05-26 19:45:00  Train B   47700.0

(Editor: 李大同)
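For reference, the steps of the answer can be collected into one self-contained sketch. The helper name add_gap_seconds, the column name gap_seconds and the explicit format string are assumptions made for this illustration and do not appear in the original answer. Passing an explicit format together with errors='coerce' is one way to turn the placeholder '0' into NaT without relying on format inference.

```python
import pandas as pd


def add_gap_seconds(df, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: seconds between the previous departure and this arrival."""
    out = df.copy()
    # Values that do not match fmt (such as the placeholder "0") become NaT.
    out["arrival_time"] = pd.to_datetime(out["arrival_time"], format=fmt, errors="coerce")
    out["departure_time"] = pd.to_datetime(out["departure_time"], format=fmt)
    # Previous row's departure within the same train; the first row per train is NaT.
    prev_departure = out.groupby("id")["departure_time"].shift()
    out["gap_seconds"] = (out["arrival_time"] - prev_departure).dt.total_seconds()
    return out


trains = pd.DataFrame({
    "id": ["Train A"] * 3 + ["Train B"] * 3,
    "arrival_time": ["0", "2016-05-19 13:50:00", "2016-05-19 21:25:00",
                     "0", "2016-05-24 18:30:00", "2016-05-26 12:15:00"],
    "departure_time": ["2016-05-19 08:25:00", "2016-05-19 16:00:00", "2016-05-20 07:45:00",
                       "2016-05-24 12:50:00", "2016-05-25 23:00:00", "2016-05-26 19:45:00"],
})

result = add_gap_seconds(trains)
print(result[["id", "gap_seconds"]])
```

This sketch assumes the rows are already ordered by time within each train; if they are not, sorting by id and departure_time before the shift would be needed. For the sample data the first row of each train has no previous departure, so its gap is NaN.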