python – Complex time operations in Pandas
Published: 2020-12-20 13:14:07 Category: Python Source: compiled from the web
Here is a small sample of my very large DataFrame:

In [38]: df
Out[38]:
          Send_Customer         Pay_Customer            Send_Time
0   1000000000284044644  1000000000251680999  2016-08-01 09:55:48
1   2000000000223021617  1000000000190078650  2016-08-01 02:44:23
2   2000000000289301033  1000000000309048473  2016-08-01 09:20:14
3   1000000000333893941  1000000000333956151  2016-08-01 09:20:14
4   1000000000340371553  2000000000103942022  2016-08-01 09:20:14
5   2000000000098132192  2000000000089264458  2016-08-01 09:21:27
6   1000000000007716594  2000000000144437513  2016-08-01 09:20:54
7   1000000000135884145  1000000000278399847  2016-08-01 09:21:43
8   2000000000141318366  2000000000151080468  2016-08-01 09:20:46
9   1000000000056842546  2000000000139908360  2016-08-01 09:20:55
10  1000000000275051425  2000000000254558241  2016-08-01 09:20:17
11  1000000000162362467  1000000000340653197  2016-08-01 09:23:45
12  1000000000039529533  1000000000072903285  2016-08-01 09:22:56
13  1000000000034147075  2000000000079408765  2016-08-01 09:20:17
14  1000000000319501203  1000000000337830072  2016-08-01 09:20:20
15  1000000000025289495  2000000000287368163  2016-08-01 09:20:31
16  1000000000043110429  1000000000209850047  2016-08-01 09:22:33

I need to find out how many non-unique or unique Pay_Customer values each Send_Customer has within a 10-hour period.

So this is the approach I used:

In [39]: df['time_diff'] = df.groupby('Send_Customer')['Send_Time'].apply(lambda x: x.diff().abs())

In [41]: df[df['time_diff'] <= dt.timedelta(seconds=36000)]
Out[41]:
             Send_Customer         Pay_Customer            Send_Time  time_diff
4361   1000000000284044644  1000000000326834813  2016-08-01 14:32:17   00:10:37
7530   2000000000223021617  1000000000340199555  2016-08-01 04:49:41   00:17:06
10937  2000000000148219588  1000000000312697109  2016-08-01 04:49:40   01:09:45
12876  1000000000339947901  2000000000218218239  2016-08-01 14:51:51   00:53:59
13553  1000000000248905073  1000000000248729812  2016-08-01 16:44:35   01:12:17
14281  2000000000270573223  1000000000341120021  2016-08-01 09:35:11   05:19:34

This approach only partly works: calling .diff() on ['Send_Time'] leaves the first row of each group without a difference, so the filter eliminates it. Any ideas on how to keep those rows?

Solution
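If I understand correctly: after the diff, the first row of each group is NaT. To keep those first rows, you can replace the NaT values with something your condition will not filter out, such as 0.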
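To make the NaT behaviour concrete, here is a minimal sketch on a made-up three-row frame (the customer IDs and timestamps are hypothetical, not taken from the question's data). It uses the built-in groupby diff() rather than apply with a lambda, which should give the same per-group differences:

```python
import pandas as pd

# Hypothetical toy data: two senders, timestamps made up for illustration.
toy = pd.DataFrame({
    "Send_Customer": [1, 1, 2],
    "Send_Time": pd.to_datetime([
        "2016-08-01 09:00:00",
        "2016-08-01 09:30:00",
        "2016-08-01 10:00:00",
    ]),
})

# Per-group difference between consecutive timestamps.
toy["time_diff"] = toy.groupby("Send_Customer")["Send_Time"].diff().abs()

# The first row of each group has no predecessor, so its diff is NaT.
first_rows_missing = toy["time_diff"].isna()

# Comparisons against NaT evaluate to False, so the boolean filter
# silently drops the first row of every group.
kept = toy[toy["time_diff"] <= pd.Timedelta(hours=10)]
print(kept)
```

Only the second row of sender 1 survives the filter here, which mirrors the symptom described in the question.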
Here, I simply add .fillna(0) at the end of the first line:

df['time_diff'] = df.groupby('Send_Customer')['Send_Time'].apply(
    lambda x: x.diff().abs()
).fillna(0)
df[df['time_diff'] <= dt.timedelta(seconds=36000)]
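As a self-contained variant of the same idea, the sketch below may be easier to run on recent pandas versions. It rests on several assumptions that are not in the original answer: the data is a hypothetical four-row sample, the fill value is pd.Timedelta(0) rather than the integer 0 (integer fills on timedelta columns have been deprecated in pandas), the rows are sorted by time first, and a final nunique() step is added as one possible reading of "how many unique Pay_Customer". Like the question's approach, it measures the gap to the sender's previous transfer, not a true rolling 10-hour window.

```python
import pandas as pd

# Hypothetical sample: sender "A" sends three times, sender "B" once.
df = pd.DataFrame({
    "Send_Customer": ["A", "A", "A", "B"],
    "Pay_Customer": ["p1", "p2", "p1", "p3"],
    "Send_Time": pd.to_datetime([
        "2016-08-01 02:00:00",
        "2016-08-01 09:00:00",
        "2016-08-02 00:00:00",
        "2016-08-01 09:20:00",
    ]),
})

# Sort so that diff() compares each row with the sender's previous transfer.
df = df.sort_values(["Send_Customer", "Send_Time"])

# Gap to the previous transfer by the same sender; the first row of each
# group gets a zero Timedelta so that it passes the filter below.
df["time_diff"] = (
    df.groupby("Send_Customer")["Send_Time"]
      .diff()
      .abs()
      .fillna(pd.Timedelta(0))
)

within_10h = df[df["time_diff"] <= pd.Timedelta(hours=10)]

# Assumed follow-up step (not in the original answer): distinct payees per sender.
unique_payees = within_10h.groupby("Send_Customer")["Pay_Customer"].nunique()
print(within_10h)
print(unique_payees)
```

In this sample the third transfer by "A" comes 15 hours after the previous one and is filtered out, while the first row of each sender is retained thanks to the zero fill.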