python - How to insert values at specific times of a DatetimeIndex
Published: 2020-12-20 11:08:53 Category: Python Source: compiled from the web
I have a reproducible example as follows:
    import pandas as pd
    import datetime

    foo = pd.read_csv("http://m.uploadedit.com/bbtc/1545406250692.txt",
                      header=None,
                      names=["Stock","Date","Time","Open","High","Low","Close","Volume","OI"],
                      dtype={"Stock":"category"},
                      parse_dates=[['Date','Time']],
                      index_col="Date_Time")
    foo.sort_index(inplace=True)

    # Drop pre- and post-market data, i.e. from index 15:31 - 16:35
    bar = foo.between_time('09:00:00','15:30:00')

    # Resample the data by 120 minutes (2 hours)
    twohour = (bar.loc["2018-11-22 09:08:00":]
                  .resample('120Min', closed='right', label='left', base=75)
                  .agg({'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
                  .dropna())

    twohour.head(7)

    Out[]:
                          Close    High    Open     Low
    Date_Time
    2018-11-22 07:15:00   321.3  321.30  321.30  321.30
    2018-11-22 09:15:00   324.5  326.90  320.10  320.00
    2018-11-22 11:15:00   323.2  324.85  324.60  322.20
    2018-11-22 13:15:00   319.9  324.35  323.20  319.50
    2018-11-22 15:15:00   320.0  320.35  319.85  319.15
    2018-11-26 07:15:00  324.90  324.90  324.90  324.90
    2018-11-26 09:15:00  311.35  324.40  323.10  309.60

I want every value in the Open column whose index time is 09:15:00 to be replaced with the value of the Close column at index time 07:15:00. In short, I need this output:

    Out[]:
                          Close    High    Open     Low
    Date_Time
    2018-11-22 07:15:00   321.3  321.30  321.30  321.30
    2018-11-22 09:15:00   324.5  326.90  321.30  320.00
    2018-11-22 11:15:00   323.2  324.85  324.60  322.20
    2018-11-22 13:15:00   319.9  324.35  323.20  319.50
    2018-11-22 15:15:00   320.0  320.35  319.85  319.15
    2018-11-26 07:15:00  324.90  324.90  324.90  324.90
    2018-11-26 09:15:00  311.35  324.40  324.90  309.60

I tried to use .loc by converting the DatetimeIndex to a dictionary and then replacing the values. But the dictionary is not sorted, so it has to be sorted first, and the code keeps getting uglier.

Solution
You can convert the index to a timedelta or to strings before comparing:
    # df is the resampled frame (twohour in the question)

    # timedelta option, vectorised & efficient
    mask_bool = (df.index - df.index.normalize()) == '09:15:00'

    # string alternative, inefficient
    mask_bool = df.index.strftime('%H:%M') == '09:15'

Then assign via loc or mask. Here shift(1) supplies the previous row's Close, which is the 07:15:00 value because each 09:15:00 row directly follows a 07:15:00 row:

    # Option 1: assign conditionally via loc
    df.loc[mask_bool, 'Open'] = df['Close'].shift(1)

    # Option 2: mask with pd.Series.mask
    df['Open'] = df['Open'].mask(mask_bool, df['Close'].shift(1))

Result:

    print(df)

                          Close    High    Open     Low
    Date_Time
    2018-11-22 07:15:00  321.30  321.30  321.30  321.30
    2018-11-22 09:15:00  324.50  326.90  321.30  320.00
    2018-11-22 11:15:00  323.20  324.85  324.60  322.20
    2018-11-22 13:15:00  319.90  324.35  323.20  319.50
    2018-11-22 15:15:00  320.00  320.35  319.85  319.15
    2018-11-26 07:15:00  324.90  324.90  324.90  324.90
    2018-11-26 09:15:00  311.35  324.40  324.90  309.60

Performance benchmark

For larger dataframes, the vectorised timedelta version should be the most efficient of the three options timed below (about 8.7 ms, versus 28 ms for index.time and 651 ms for strftime, on 70,000 rows), but note that this will depend on your system and setup:

    # Python 3.6.5, Pandas 0.23, NumPy 1.14.3
    import pandas as pd
    from datetime import time

    df = pd.DataFrame.from_dict({
        'Date_Time': ['2018-11-22 07:15:00','2018-11-22 09:15:00','2018-11-22 11:15:00',
                      '2018-11-22 13:15:00','2018-11-22 15:15:00','2018-11-26 07:15:00',
                      '2018-11-26 09:15:00'],
        'Close': [321.3,324.5,323.2,319.9,320.0,324.9,311.35],
        'High': [321.3,326.9,324.85,324.35,320.35,324.9,324.4],
        'Open': [321.3,321.3,324.6,323.2,319.85,324.9,324.9],
        'Low': [321.3,320.0,322.2,319.5,319.15,324.9,309.6]})

    df['Date_Time'] = pd.to_datetime(df['Date_Time'])
    df = df.set_index('Date_Time')

    # Repeat the 7 rows 10,000 times to get a 70,000-row frame
    df = pd.concat([df]*10**4)

    # The %timeit lines below are IPython magics
    %timeit (df.index - df.index.normalize()) == '09:15:00'  # 8.67 ms
    %timeit df.index.strftime('%H:%M') == '09:15'            # 651 ms
    %timeit df.index.time == time(9,15)                      # 28.3 ms

(Editor: 李大同)
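The benchmark times a third way of building the mask, comparing `df.index.time` with `time(9,15)`, but the answer never shows it used in an assignment. Below is a minimal sketch of that variant on a five-row subset of the question's data. The names `mask_0915` and `prev_close` are illustrative and not part of the original answer. As with the other options, `shift(1)` only gives the 07:15:00 Close if that row sits directly before the 09:15:00 row in the sorted index.

```python
from datetime import time

import pandas as pd

# Five rows taken from the question's resampled output (two trading days).
idx = pd.to_datetime([
    "2018-11-22 07:15:00", "2018-11-22 09:15:00", "2018-11-22 11:15:00",
    "2018-11-26 07:15:00", "2018-11-26 09:15:00",
])
df = pd.DataFrame(
    {
        "Close": [321.3, 324.5, 323.2, 324.9, 311.35],
        "Open": [321.3, 320.1, 324.6, 324.9, 323.1],
    },
    index=idx,
)

# Boolean mask: True for rows whose time of day is exactly 09:15.
mask_0915 = df.index.time == time(9, 15)

# Close of the previous row. For a 09:15 row this is the 07:15 Close
# only when the 07:15 row immediately precedes it (assumed here).
prev_close = df["Close"].shift(1)

# Overwrite Open on the 09:15 rows only; other rows are left untouched.
df.loc[mask_0915, "Open"] = prev_close[mask_0915]

print(df)
```

If some days might lack a 07:15:00 row, it would be worth checking which row precedes each 09:15:00 row before relying on `shift(1)`.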