Getting the mean in Pandas with GroupBy - DataError: No numeric types to aggregate

Published: 2020-12-14 04:59:30 | Category: Encyclopedia | Source: collected from the web

I know there are many similar questions, such as "Getting daily averages with pandas"
and "How get monthly mean in pandas using groupby", but I am getting a strange error.

I have a simple dataset with one index column (of type Timestamp) and one value column.
I want to get the monthly mean of the data.

In [76]: df.head()
Out[76]: 
                          A
2008-01-02                1
2008-01-03                2
2008-01-04                3
2008-01-07                4
2008-01-08                5

However, when I group, I only get groups of index labels, not the values:

In [74]: df.head().groupby(lambda x: x.month).groups
Out[74]: 
{1: [Timestamp('2008-01-02 00:00:00'),Timestamp('2008-01-03 00:00:00'),Timestamp('2008-01-04 00:00:00'),Timestamp('2008-01-07 00:00:00'),Timestamp('2008-01-08 00:00:00')]}
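
Note: as far as I understand, the output above is what `.groups` is meant to return, a mapping from each group key to the index labels in that group, so on its own it does not indicate a problem. A minimal sketch of the difference between `.groups` and `get_group`, assuming a sample frame rebuilt from the `df.head()` output with an integer column A (the dtype is my assumption, the question does not show it):

```python
import pandas as pd

# Sample frame rebuilt from the df.head() output in the question.
# Assumption: A holds real integers here.
idx = pd.to_datetime(
    ["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"]
)
df = pd.DataFrame({"A": [1, 2, 3, 4, 5]}, index=idx)

# Same grouping function as in the question: called on each index value.
grouped = df.groupby(lambda ts: ts.month)

# .groups maps each group key to the index labels belonging to that group.
print(grouped.groups)

# get_group returns the actual rows (index and values) for one key.
january = grouped.get_group(1)
print(january)
```

So the values are still attached to the groups; `.groups` just does not display them.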

Trying to take the mean raises an error.

I tried df.head().resample("M",how='mean') and df.head().groupby(lambda x: x.month).mean()

and both give the error: DataError: No numeric types to aggregate

In [75]: df.resample("M",how='mean')
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-75-79dc1a060ba4> in <module>()
----> 1 df.resample("M",how='mean')

/usr/local/lib/python2.7/site-packages/pandas/core/generic.pyc in resample(self,rule,how,axis,fill_method,closed,label,convention,kind,loffset,limit,base)
   2878                               fill_method=fill_method,convention=convention,
   2879                               limit=limit,base=base)
-> 2880         return sampler.resample(self).__finalize__(self)
   2881 
   2882     def first(self,offset):

/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in resample(self,obj)
     82 
     83         if isinstance(ax,DatetimeIndex):
---> 84             rs = self._resample_timestamps()
     85         elif isinstance(ax,PeriodIndex):
     86             offset = to_offset(self.freq)

/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _resample_timestamps(self)
    286             # Irregular data,have to use groupby
    287             grouped = obj.groupby(grouper,axis=self.axis)
--> 288             result = grouped.aggregate(self._agg_method)
    289 
    290             if self.fill_method is not None:

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self,arg,*args,**kwargs)
   2436     def aggregate(self,arg,*args,**kwargs):
   2437         if isinstance(arg,compat.string_types):
-> 2438             return getattr(self,arg)(*args,**kwargs)
   2439 
   2440         result = OrderedDict()

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    664         """
    665         try:
--> 666             return self._cython_agg_general('mean')
    667         except GroupByError:
    668             raise

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self,how,numeric_only)
   2356 
   2357     def _cython_agg_general(self,how,numeric_only=True):
-> 2358         new_items,new_blocks = self._cython_agg_blocks(how,numeric_only=numeric_only)
   2359         return self._wrap_agged_blocks(new_items,new_blocks)
   2360 

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self,how,numeric_only)
   2406 
   2407         if len(new_blocks) == 0:
-> 2408             raise DataError('No numeric types to aggregate')
   2409 
   2410         return data.items,new_blocks

DataError: No numeric types to aggregate

Solution

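Yes, you should try something like df['A'] = df['A'].astype(int) to force A to be numeric. It may also be worth checking whether something in the initial data read-in is causing the column to be loaded as object rather than as a number.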
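
A minimal sketch of that suggestion, under the assumption that column A was loaded as strings (object dtype). The question does not show df.dtypes, so the sample frame below is a hypothetical reconstruction, not the asker's actual data. The grouping lambda is the one from the question.

```python
import pandas as pd

# Hypothetical reconstruction: A holds the strings "1".."5" with object dtype,
# which is one way a column can end up non-numeric after loading.
idx = pd.to_datetime(
    ["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"]
)
df = pd.DataFrame({"A": ["1", "2", "3", "4", "5"]}, index=idx, dtype=object)

# Inspecting the dtypes is a quick way to confirm the suspicion:
# A is listed as object rather than an integer or float type.
print(df.dtypes)

# Force the column to a numeric type, as suggested in the answer.
df["A"] = df["A"].astype(int)

# With a numeric column, the monthly mean from the question can be computed.
monthly_mean = df.groupby(lambda ts: ts.month).mean()
print(monthly_mean)
```

If some entries cannot be parsed as integers, astype(int) raises an error. pd.to_numeric(df["A"], errors="coerce") is an alternative that turns such entries into NaN, which makes it easier to find whatever in the raw data caused the column to be read as object.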

(Editor: 李大同)
