python – Replace column names in a pandas dataframe that partially match a string

Published: 2020-12-20 12:12:53 Category: Python Source: compiled from the web

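Background

I want to identify the column names in a dataframe that partially match a string, and replace them with the original name plus some new elements appended to it. The new elements are integers defined by a list. There is a similar question, but I fear the suggested solution is not flexible enough for my particular case. There is also another post with several good answers that come close to the problem I'm facing.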

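Some research

I know I can combine two lists of strings, map them pairwise into a dictionary, and rename the columns by passing that dictionary to df.rename. But that seems a bit too complicated and not flexible enough, given that the number of existing columns will vary, as will the number of columns to rename.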
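As a rough sketch of that dictionary approach (not the asker's code), the mapping can be built from only the columns that match, so it does not depend on how many columns the frame has. The helper name build_rename_map, the small frame and its column names are assumptions made for illustration.

```python
import pandas as pd


def build_rename_map(columns, prefix, new_elements):
    """Pair each column starting with `prefix` with the next new element.

    Hypothetical helper for illustration. zip() stops at the shorter input,
    so surplus columns or surplus elements are silently left out.
    """
    matching = [col for col in columns if col.startswith(prefix)]
    return {col: f"{col} = {elem}" for col, elem in zip(matching, new_elements)}


# Small made-up frame; only the column names matter here
df = pd.DataFrame([[100, 1, 2, 3]], columns=["Price", "obs_1", "obs_2", "Volume"])

mapping = build_rename_map(df.columns, "obs_", [5, 10, 15, 20])
renamed = df.rename(columns=mapping)  # returns a new frame, df is untouched
print(list(renamed.columns))
```

Because the dictionary only contains the matching names, columns before or after the obs_ block pass through df.rename unchanged.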

The following snippet produces an input example:

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
df = pd.DataFrame(np.random.randint(90,110,size=(Observations,Columns)),columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

# pd.datetime was removed in newer pandas versions; pd.Timestamp.today() is used instead
datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),periods=Observations).tolist()
df['Dates'] = datelist
df = df.set_index(['Dates'])
print(df)

Input

[Image: the input dataframe]

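I want to identify the column names that start with obs_ and append the elements (integers) from the list newElements = [5,10,15,20] after an = sign. The column named Price should remain unchanged. Any other columns that appear after the obs_ columns should remain unchanged as well.

The following snippet demonstrates the desired output: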

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations,Columns)),columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print(df2)

Output

[Image: the desired output dataframe]

My attempt

# Define the partial string I'm looking for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5,10,15,20]
oldElements = [1,2,3,4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()

# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:   
    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames,newnames)),inplace = True)

print('My attempt: ',df)

[Image: the dataframe produced by my attempt]

Having built the complete list of new column names, I could also just use df.columns = newnames, but I'm hoping one of you has a suggestion for using df.rename in a more pythonic way.
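To make the difference between those two options concrete, here is a minimal sketch on a made-up frame (the column names are assumptions, not the data above). Assigning to df.columns needs exactly one name per column, while df.rename only touches the keys it finds and, by default, ignores keys that are not present.

```python
import pandas as pd

df = pd.DataFrame([[100, 1, 2]], columns=["Price", "obs_1", "obs_2"])

# df.rename: a partial mapping is fine; the unknown key 'obs_9' is ignored by default
renamed = df.rename(columns={"obs_1": "obs_1 = 5", "obs_9": "unused"})
print(list(renamed.columns))

# df.columns = ...: the list must cover every column, otherwise pandas raises
try:
    df.columns = ["Price", "obs_1 = 5"]  # one name short
    length_error = False
except ValueError:
    length_error = True
print(length_error)
```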

Thank you for any suggestions!

Here is the complete code for an easy copy-paste:

# Libraries
import numpy as np
import pandas as pd
import itertools

# A dataframe
Observations = 5
Columns = 5
np.random.seed(123)
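df = pd.DataFrame(np.random.randint(90,110,size=(Observations,Columns)),columns = ['Price','obs_1','obs_2','obs_3','obs_4'])

datelist = pd.date_range(pd.Timestamp.today().strftime('%Y-%m-%d'),periods=Observations).tolist()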
df['Dates'] = datelist
df = df.set_index(['Dates'])
print('Input: ',df)

# Desired output
Observations = 5
Columns = 5
np.random.seed(123)
df2 = pd.DataFrame(np.random.randint(90,110,size=(Observations,Columns)),columns = ['Price','Obs_1 = 5','Obs_2 = 10','Obs_3 = 15','Obs_4 = 20'])

df2['Dates'] = datelist
df2 = df2.set_index(['Dates'])
print('Desired output: ',df2)

# My attempts
# Define the partial string I'm looking for
stringMatch = 'Obs_'

# Put existing column names in a list
oldnames = list(df)

# Put elements that should be added to the column names
# where the three first letters match 'obs_'
newElements = [5,10,15,20]
oldElements = [1,2,3,4]

# Change types of the elements in the list
str_newElements = [str(x) for x in newElements]
str_oldElements = [str(y) for y in oldElements]
str_newNames = str_newElements.copy()


# Since I know the first column should not be renamed,
# I start with 'Price' in a list
newnames = ['Price']

# Then I add the renamed parts to the same list
i = 0
for oldElement in str_oldElements:

    #print(repr(oldElement) + repr(str_newElements[i]))
    newnames.append(stringMatch + oldElement + ' = ' + str_newElements[i])
    i = i + 1

# Rename columns using the dict as input in df.rename
df.rename(columns = dict(zip(oldnames,newnames)),inplace = True)

print('My attempt: ',df)

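Edit: Aftermath

It's amazing to get so many good answers after just one day! That makes it hard to decide which answer to accept. I don't know whether the following adds much value to the post as a whole, but I went ahead and wrapped all the suggestions in functions and tested them with %timeit.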
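The timing code itself is not shown in the post. A minimal sketch of how such a comparison could be set up outside IPython, using the standard-library timeit module in place of the %timeit magic, might look like this. The two functions are illustrative stand-ins, not the answers that were actually timed, and no timings are claimed here.

```python
import timeit

import pandas as pd

NEW_ELEMENTS = [5, 10, 15, 20]
PREFIX = "obs_"  # assumption: lowercase prefix, as in the input column names


def make_df():
    # Fresh frame per call so one run cannot affect the next
    return pd.DataFrame([[100, 1, 2, 3, 4]],
                        columns=["Price", "obs_1", "obs_2", "obs_3", "obs_4"])


def rename_with_dict(df):
    # Build a mapping for the matching columns and pass it to df.rename
    matching = [c for c in df.columns if c.startswith(PREFIX)]
    return df.rename(columns={c: f"{c} = {e}" for c, e in zip(matching, NEW_ELEMENTS)})


def rename_with_list(df):
    # Build the full list of names and assign it to a copy
    elems = iter(NEW_ELEMENTS)
    out = df.copy()
    out.columns = [f"{c} = {next(elems)}" if c.startswith(PREFIX) else c
                   for c in out.columns]
    return out


runs = 200
for func in (rename_with_dict, rename_with_list):
    seconds = timeit.timeit(lambda: func(make_df()), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e6:.1f} us per call")
```

Note that each measurement here includes the cost of building the small frame, so the numbers are only useful for comparing the candidates with each other.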

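The results are as follows:

[Image: %timeit results for the suggested answers]

The suggestion from HH1 was the first to be posted, and it is also among the fastest in execution time. I'll provide the code later if anyone is interested.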

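Edit 2

When I tried it, the suggestion from suvy produced these results:

[Image: output of suvy's suggestion]

The snippet works fine until the last line. After running df = df.rename(columns = dict(zip(names,renames))), the dataframe looks like this:

[Image: the dataframe after the rename]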

Solution

Does this work?

df.columns = [col + ' = ' + str(newElements.pop(0)) if col.startswith(stringMatch) else col for col in df.columns]
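Two things to keep in mind with the line above: newElements.pop(0) empties the list as it goes, so running the line a second time on matching columns fails with an IndexError, and startswith is case-sensitive, so 'Obs_' does not match a column named 'obs_1'. A hedged variant that reads the values through an iterator instead, assuming lowercase obs_ column names as in the input example:

```python
import pandas as pd

newElements = [5, 10, 15, 20]
stringMatch = "obs_"  # assumption: lowercase, matching the input column names

df = pd.DataFrame([[100, 1, 2, 3, 4]],
                  columns=["Price", "obs_1", "obs_2", "obs_3", "obs_4"])

# iter() walks the list without modifying it
elements = iter(newElements)
df.columns = [f"{col} = {next(elements)}" if col.startswith(stringMatch) else col
              for col in df.columns]
print(list(df.columns))
```

If there are more matching columns than values, next() raises StopIteration, so the two lengths should be checked beforehand when they are not known to agree.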

(Editor: 李大同)