pandas / Python: multiply and sum certain columns based on their names
Published: 2020-12-20 11:04:45 | Category: Python | Source: compiled from the web
I have a small sample dataset:
    import pandas as pd

    d = {
        'measure1_x': [10, 12, 20, 30, 21],
        'measure2_x': [11, 12, 10, 3, 3],
        'measure3_x': [10, 0, 12, 1, 1],
        'measure1_y': [1, 2, 2, 3, 1],
        'measure2_y': [1, 1, 1, 3, 3],
        'measure3_y': [1, 0, 2, 1, 1],
    }
    df = pd.DataFrame(d)
    # reindex_axis is no longer available in current pandas;
    # reindex(columns=...) orders the columns the same way
    df = df.reindex(columns=[
        'measure1_x', 'measure2_x', 'measure3_x',
        'measure1_y', 'measure2_y', 'measure3_y',
    ])

It looks like this:

    measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y
            10          11          10           1           1           1
            12          12           0           2           1           0
            20          10          12           2           1           2
            30           3           1           3           3           1
            21           3           1           1           3           1

I deliberately gave the columns nearly identical names, differing only in the '_x' and '_y' suffixes, to mark which pairs should be multiplied. Ignoring '_x' and '_y', I want to multiply each pair of columns that share a name and then sum the products to get a total. Keep in mind that my actual dataset is huge and its columns are not in this perfect order, so the naming is how the correct pairs are identified:

    total = measure1_x * measure1_y + measure2_x * measure2_y + measure3_x * measure3_y

So the desired output is:

    measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y  total
            10          11          10           1           1           1     31
            12          12           0           2           1           0     36
            20          10          12           2           1           2     74
            30           3           1           3           3           1    100
            21           3           1           1           3           1     31

My attempt and thought process so far, though I can't work out the syntax to continue:

    # first identify the column names that have '_x' and '_y', then identify if
    # the column names are the same after removing '_x' and '_y'; if the pair has
    # the same name then multiply them, do that for all pairs and sum the results
    # up to get the total number
    for colname in df.columns:
        if "_x".lower() in colname.lower() or "_y".lower() in colname.lower():
            if "_x".lower() in colname.lower():
                colnamex = colname
            if "_y".lower() in colname.lower():
                colnamey = colname
            # if colnamex[:-2] are the same for colnamex and colnamey then multiply and sum

Solution
filter + np.einsum
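Thought I'd try something a little different this time: get the _x and _y columns separately, then take the row-wise sum of their products with np.einsum.

    import numpy as np

    # optional: do this if your columns aren't sorted
    # (note that it also reorders the columns of df alphabetically)
    df = df.sort_index(axis=1)

    i = df.filter(like='_x')
    j = df.filter(like='_y')
    # row-wise sum of products; same result as (i.values * j).sum(axis=1)
    df['Total'] = np.einsum('ij,ij->i', i, j)

For the sample data, whose columns are already in matching order so the sort can be skipped, df is now:

       measure1_x  measure2_x  measure3_x  measure1_y  measure2_y  measure3_y  Total
    0          10          11          10           1           1           1     31
    1          12          12           0           2           1           0     36
    2          20          10          12           2           1           2     74
    3          30           3           1           3           3           1    100
    4          21           3           1           1           3           1     31

A slightly more robust version, which filters out non-numeric columns and performs an assertion beforehand:

    df = df.sort_index(axis=1).select_dtypes(exclude=[object])
    i = df.filter(regex='.*_x')
    j = df.filter(regex='.*_y')
    assert i.shape == j.shape
    df['Total'] = np.einsum('ij,ij->i', i, j)

If the assertion fails, then the assumptions that 1) your columns are numeric and 2) the number of x and y columns is equal, as your question suggests, do not hold for your actual dataset.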
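Since the question stresses that the real columns are not in a tidy order, it may help to pair the columns explicitly by base name instead of relying on sorting. The following is only a minimal sketch under stated assumptions: `paired_total` is a hypothetical helper (not part of pandas or of the answer above), all column names are assumed to be strings, and the suffixes are assumed to be non-empty and to appear at the end of the name.

```python
import pandas as pd


def paired_total(df, x_suffix="_x", y_suffix="_y"):
    """Row-wise sum of products of columns sharing a base name (hypothetical helper)."""
    # Map each base name (suffix stripped) to its full column name.
    # Assumes string column names and non-empty suffixes.
    x_cols = {c[:-len(x_suffix)]: c for c in df.columns if c.endswith(x_suffix)}
    y_cols = {c[:-len(y_suffix)]: c for c in df.columns if c.endswith(y_suffix)}

    # Refuse to guess when a column has no partner.
    unpaired = sorted(set(x_cols) ^ set(y_cols))
    if unpaired:
        raise ValueError(f"columns without a partner: {unpaired}")

    # Fix one order of base names and use it for both sides,
    # so position k in x always matches position k in y.
    bases = sorted(x_cols)
    x = df[[x_cols[b] for b in bases]].to_numpy()
    y = df[[y_cols[b] for b in bases]].to_numpy()
    return pd.Series((x * y).sum(axis=1), index=df.index, name="total")


# Sample data from the question, with the columns deliberately shuffled.
df = pd.DataFrame({
    'measure2_y': [1, 1, 1, 3, 3],
    'measure1_x': [10, 12, 20, 30, 21],
    'measure3_y': [1, 0, 2, 1, 1],
    'measure3_x': [10, 0, 12, 1, 1],
    'measure1_y': [1, 2, 2, 3, 1],
    'measure2_x': [11, 12, 10, 3, 3],
})
df['total'] = paired_total(df)
print(df['total'].tolist())  # [31, 36, 74, 100, 31]
```

Unlike the shape assertion, this raises an error naming the columns that lack a partner, and it leaves the column order of df untouched.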