How do I drop rows based on a column value, where one row's column value is a subset of another row's?

Published: 2020-12-16 22:27:28 | Category: Python | Source: compiled from the web

Suppose I have a DataFrame df as follows:

index company  url                          address 
 0     A .    www.abc.contact.com         16D Bayberry Rd,New Bedford,MA,02740,USA
 1     A .    www.abc.contact.com .       MA,USA
 2     A .    www.abc.about.com .         USA
 3     B .    www.pqr.com .               New Bedford,USA
 4     B.     www.pqr.com/about .         MA,USA

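I want to drop every row whose address is a subset of another row's address for the same company. For example, of these 5 rows I want to keep these two: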

index company  url                          address 
 0     A .    www.abc.contact.com         16D Bayberry Rd,New Bedford,MA,02740,USA
 3     B .    www.pqr.com .               New Bedford,USA
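
Best answer
It may not be the optimal solution, but it works on this small DataFrame:

EDIT: added a check on the company name, assuming the punctuation has been removed.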

import pandas as pd

# Sample data rebuilt from the question's table (this line was garbled in the scraped page).
# Assumption: row 3 reads 'New Bedford,MA,USA'. The question's table prints 'New Bedford,USA',
# which looks truncated; with that value, row 4 ('MA,USA') would not be a subset and would be kept.
df = pd.DataFrame({
    "company": ['A', 'A', 'A', 'B', 'B'],
    "address": ['16D Bayberry Rd,New Bedford,MA,02740,USA', 'MA,USA', 'USA',
                'New Bedford,MA,USA', 'MA,USA'],
})
# Split each address on commas and turn it into a set so that "issubset" can be used later
addresses = list(df['address'].apply(lambda x: set(x.split(','))).values)
companies = list(df['company'].values)

rows_to_drop = []  # Row indexes to drop are collected here
# Iterate over every address
for i, (address, company) in enumerate(zip(addresses, companies)):
    # Iterate over the remaining addresses
    rem_addr = addresses[:i] + addresses[(i + 1):]
    rem_comp = companies[:i] + companies[(i + 1):]

    for other_addr, other_comp in zip(rem_addr, rem_comp):
        # If the address is a subset of another address of the same company, mark it for dropping
        if address.issubset(other_addr) and company == other_comp:
            rows_to_drop.append(i)
            break

df = df.drop(rows_to_drop)
print(df)

   company                                   address
0        A  16D Bayberry Rd,New Bedford,MA,02740,USA
3        B                        New Bedford,MA,USA
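
Two caveats of the loop above are worth keeping in mind. First, `issubset` is also true for equal sets, so if a company has two identical addresses, both rows are dropped. Second, `rows_to_drop` stores positions, which only match the labels passed to `df.drop` when the DataFrame has the default 0..n-1 index. The following is a minimal sketch of one possible variant, not the answerer's method: the helper name `drop_subset_addresses` and its parameters are hypothetical, the punctuation cleanup is just one way to realise the assumption mentioned in the EDIT, and the sample rows are illustrative (row 3 assumes the address contains `MA`, and row 5 is an added duplicate).

```python
import pandas as pd


def drop_subset_addresses(df, company_col="company", address_col="address"):
    """Drop rows whose address tokens are contained in another row's address
    for the same company. Hypothetical helper, written for illustration."""
    # Normalise company names so that "A ." and "A" compare equal:
    # remove punctuation, then trim surrounding whitespace.
    company = (
        df[company_col]
        .astype(str)
        .str.replace(r"[^\w\s]", "", regex=True)
        .str.strip()
    )
    # One frozenset of trimmed, non-empty tokens per address.
    tokens = df[address_col].apply(
        lambda s: frozenset(p.strip() for p in s.split(",") if p.strip())
    )

    to_drop = []
    # Only compare addresses that belong to the same (normalised) company.
    for _, group in tokens.groupby(company):
        items = list(group.items())  # (index label, token set) pairs
        for pos, (label, addr) in enumerate(items):
            for other_pos, (_, other) in enumerate(items):
                if pos == other_pos:
                    continue
                # Proper subset: drop. Equal sets: keep only the first occurrence.
                if addr < other or (addr == other and other_pos < pos):
                    to_drop.append(label)
                    break
    # Drop by index label, so a non-default index also works.
    return df.drop(index=to_drop)


# Illustrative data; company punctuation mimics the question's table.
df = pd.DataFrame({
    "company": ["A .", "A", "A", "B", "B.", "B"],
    "address": [
        "16D Bayberry Rd, New Bedford, MA, 02740, USA",
        "MA, USA",
        "USA",
        "New Bedford, MA, USA",  # assumed to include "MA"
        "MA, USA",
        "New Bedford, MA, USA",  # exact duplicate of row 3, added for illustration
    ],
})
result = drop_subset_addresses(df)
print(result)
```

With this sample data, rows 0 and 3 remain. Like the original loop, the comparison is quadratic within each company, so it is only meant for small groups.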
