python - pandas DataFrame: how to count the rows in each run of 1s in a binary column?

Published: 2020-12-16 23:43:45 | Category: Python | Source: compiled from the web

I have the following pandas DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({"first_column": [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]})

>>> df
    first_column
0              0
1              0
2              0
3              1
4              1
5              1
6              0
7              0
8              1
9              1
10             0
11             0
12             0
13             0
14             1
15             1
16             1
17             1
18             1
19             0
20             0

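first_column is a binary column of 0s and 1s. The 1s occur in consecutive "clusters", and each cluster is always at least two rows long.

My goal is to create a column that "counts" the number of rows in each cluster: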

>>> df
    first_column    counts
0              0        0
1              0        0
2              0        0
3              1        3
4              1        3
5              1        3
6              0        0
7              0        0
8              1        2
9              1        2
10             0        0
11             0        0
12             0        0
13             0        0
14             1        5
15             1        5
16             1        5
17             1        5
18             1        5
19             0        0
20             0        0

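This sounds like a job for df.loc(), e.g. df.loc[df.first_column == 1] ... something.

I'm just not sure how to account for each "cluster", and how to label each unique cluster with its "row count".

How would one do this?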

Solution

Here is one approach using NumPy's cumsum and bincount:

def cumsum_bincount(a):  
    # Prepend 0 & look for a [0,1] pattern. Form a binned array based off 1s groups
    ids = a*(np.diff(np.r_[0,a])==1).cumsum()

    # Get the bincount,index into the count with ids and finally mask out 0s
    return a*np.bincount(ids)[ids]
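
To make the two lines of the function easier to follow, here is a minimal step-by-step sketch on a short array. The names starts, run_id, sizes and counts are illustrative and do not appear in the answer; the array a is made up for the trace.

```python
import numpy as np

a = np.array([0, 1, 1, 0, 1, 1, 1, 0])

# Prepend a 0 so that a run starting at index 0 would still show a 0 -> 1 rise.
starts = np.diff(np.r_[0, a]) == 1   # True at the first row of each run of 1s
run_id = starts.cumsum()             # 1, 2, ... per run, carried forward over the 0s
ids = a * run_id                     # reset the rows that hold 0 back to id 0
sizes = np.bincount(ids)             # sizes[k] = number of rows carrying id k
counts = a * sizes[ids]              # look up each row's run size, keep 0 rows at 0

print(ids)     # [0 1 1 0 2 2 2 0]
print(sizes)   # [3 2 3]
print(counts)  # [0 2 2 0 3 3 3 0]
```

Note that sizes[0] counts all the 0 rows together, which is why the final multiplication by a is needed to mask them out.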

Sample run:

In [88]: df['counts'] = cumsum_bincount(df.first_column.values)

In [89]: df
Out[89]: 
    first_column  counts
0              0       0
1              0       0
2              0       0
3              1       3
4              1       3
5              1       3
6              0       0
7              0       0
8              1       2
9              1       2
10             0       0
11             0       0
12             0       0
13             0       0
14             1       5
15             1       5
16             1       5
17             1       5
18             1       5
19             0       0
20             0       0
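
One way to cross-check output like this is a slow, plain-Python reference built on itertools.groupby. This is only a sketch for verification, not part of the answer; run_counts_reference is a hypothetical helper name, and the list below is copied from the question's column.

```python
from itertools import groupby

def run_counts_reference(values):
    """Hypothetical cross-check helper: slow, plain-Python run counting."""
    out = []
    for value, run in groupby(values):
        n = len(list(run))
        # Rows in a run of 1s get the run length; rows in a run of 0s get 0.
        out.extend([n if value == 1 else 0] * n)
    return out

first_column = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
reference = run_counts_reference(first_column)
print(reference)
```

Comparing this list with the counts column produced by the vectorized function should give an exact match.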

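Set the first five elements to 1 (the sixth is already 1), so that the first six elements form a single run starting at row 0, then test again: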

In [101]: df.first_column.values[:5] = 1

In [102]: df['counts'] = cumsum_bincount(df.first_column.values)

In [103]: df
Out[103]: 
    first_column  counts
0              1       6
1              1       6
2              1       6
3              1       6
4              1       6
5              1       6
6              0       0
7              0       0
8              1       2
9              1       2
10             0       0
11             0       0
12             0       0
13             0       0
14             1       5
15             1       5
16             1       5
17             1       5
18             1       5
19             0       0
20             0       0
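
For readers who prefer to stay within pandas, a similar result can probably be obtained with shift, cumsum and groupby. This is an alternative sketch, not the method from the answer above; run_counts_pandas is a hypothetical name, and the data is the modified column from the second test (first six rows set to 1).

```python
import pandas as pd

def run_counts_pandas(s):
    """Hypothetical helper: run lengths of 1s using pandas only."""
    # A new run starts wherever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum()
    # Broadcast each run's size back to its rows, then zero out the runs of 0s.
    return s.groupby(run_id).transform("size") * s

df = pd.DataFrame(
    {"first_column": [1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]}
)
df["counts"] = run_counts_pandas(df["first_column"])
print(df)
```

The NumPy version avoids the groupby overhead and is likely faster on large columns; the pandas version keeps the index and may be easier to read.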

(Editor: 李大同)
