The most efficient way to run multiple list comprehensions in Python

Published: 2020-12-20 12:13:10 | Category: Python | Source: compiled from the web

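Given these three list comprehensions, is there a more efficient way to do this than three separate comprehensions? I believe a for loop would probably be bad form here, but if I am iterating over a large number of rows in rowsaslist, I suspect that what I have below is not very efficient.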

cachedStopWords = stopwords.words('english')

rowsaslist = [x.lower() for x in rowsaslist]
rowsaslist = [''.join(c for c in s if c not in string.punctuation) for s in rowsaslist]
rowsaslist = [' '.join([word for word in p.split() if word not in cachedStopWords]) for p in rowsaslist]

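Would it be more efficient to combine all of these into a single comprehension? From a readability standpoint, I suspect it would be a jumble of code.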
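For reference, the merged form the question alludes to might look like the sketch below. The `stop_words` list and the sample `rows` are hypothetical stand-ins (not from the original post) so that the snippet runs without NLTK data; it only checks that one nested comprehension produces the same result as the three separate passes.

```python
import string

# Hypothetical stand-ins so the sketch runs without NLTK data.
stop_words = ["the", "is", "a", "and"]
rows = ["The cat, and the dog!", "Speed is a feature."]

# Three separate passes, as in the question.
step = [x.lower() for x in rows]
step = [''.join(c for c in s if c not in string.punctuation) for s in step]
three_pass = [' '.join(w for w in p.split() if w not in stop_words) for p in step]

# The same work folded into one comprehension: a single pass over the rows,
# but noticeably harder to read.
one_pass = [
    ' '.join(
        w
        for w in ''.join(c for c in x.lower() if c not in string.punctuation).split()
        if w not in stop_words
    )
    for x in rows
]

print(one_pass == three_pass)
```

Both versions do the same per-character and per-word work; the merged form mainly avoids building two intermediate lists.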

Solution

Instead of iterating over the same list 3 times, you can simply define 2 functions and use them in a single list comprehension:

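import string
from nltk.corpus import stopwords

cachedStopWords = stopwords.words('english')


def remove_punctuation(text):
    # Lowercase the text and drop every ASCII punctuation character.
    return ''.join(c for c in text.lower() if c not in string.punctuation)

def remove_stop_words(text):
    # Keep only the words that are not stop words.
    return ' '.join([word for word in text.split() if word not in cachedStopWords])

# A single pass over the list instead of three.
rowsaslist = [remove_stop_words(remove_punctuation(text)) for text in rowsaslist]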

I have never used stopwords. If it returns a list, you had better convert it to a set first, to speed up the `word not in cachedStopWords` test.
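A minimal sketch of the set suggestion follows. `STOP_WORDS` here is a small hand-written set, a hypothetical stand-in for `set(stopwords.words('english'))`, so that the example runs without NLTK data. Using `str.translate` to strip punctuation is a swapped-in technique, not something the original answer uses, and the name `clean` is likewise an assumption made for illustration.

```python
import string

# Hypothetical stand-in; with NLTK available you would instead write:
#   STOP_WORDS = set(stopwords.words('english'))
STOP_WORDS = frozenset({"the", "is", "a", "an", "and", "of", "in", "to"})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean(text):
    """Lowercase, strip punctuation, and drop stop words for one row."""
    words = text.lower().translate(_PUNCT_TABLE).split()
    # Membership tests against a set are O(1) on average,
    # versus a linear scan for a list.
    return " ".join(w for w in words if w not in STOP_WORDS)

rows = ["The cat, and the dog!", "Speed is a feature."]
cleaned = [clean(row) for row in rows]
print(cleaned)
```

With many rows, the set lookup is likely to matter more than the number of comprehensions, since the stop-word test runs once per word.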

Finally, the NLTK package can help you process text. See @alvas’ answer.

(Editor: 李大同)
