python - Extracting the words that differ between two sentences
Published: 2020-12-20 11:59:25 | Category: Python | Source: compiled from the web
I have a very large data frame with two columns named sentence1 and sentence2.

I am trying to create a new column containing the words that differ between the two sentences, for example:

    sentence1 = c("This is sentence one", "This is sentence two", "This is sentence three")
    sentence2 = c("This is the sentence four", "This is the sentence five", "This is the sentence six")
    df = as.data.frame(cbind(sentence1, sentence2))

My data frame has the following structure:

    ID  sentence1               sentence2
    1   This is sentence one    This is the sentence four
    2   This is sentence two    This is the sentence five
    3   This is sentence three  This is the sentence six

My expected result is:

    ID  sentence1    sentence2    Expected_Result
    1   This is ...  This is ...  one the four
    2   This is ...  This is ...  two the five
    3   This is ...  This is ...  three the six

In R, I tried splitting the sentences and then taking the elements that differ between the resulting lists, for example:

    df$split_Sentence1 <- strsplit(df$sentence1, split = " ")
    df$split_Sentence2 <- strsplit(df$sentence2, split = " ")
    df$Dif <- setdiff(df$split_Sentence1, df$split_Sentence2)

But this approach does not work when setdiff is applied...

In Python, I tried NLTK, first getting the tokens and then trying to extract the difference between the two lists, like this:

    from nltk.tokenize import word_tokenize
    df['tokensS1'] = df.sentence1.apply(lambda x: word_tokenize(x))
    df['tokensS2'] = df.sentence2.apply(lambda x: word_tokenize(x))

At this point I have not found a function that gives me the result I need.

I hope you can help me. Thanks.

Solution
Here is an R solution.

I created an exclusiveWords function that finds the words appearing in only one of the two sets and returns a "sentence" made up of those words. I wrapped it in Vectorize() so that it can process all rows of the data.frame at once.

    df = as.data.frame(cbind(sentence1, sentence2), stringsAsFactors = FALSE)

    exclusiveWords <- function(x, y) {
      # Split each sentence into words on single spaces
      x <- strsplit(x, " ")[[1]]
      y <- strsplit(y, " ")[[1]]
      u <- union(x, y)
      # Words missing from x (only in y), followed by words missing from y (only in x)
      u <- union(setdiff(u, x), setdiff(u, y))
      return(paste0(u, collapse = " "))
    }

    exclusiveWords <- Vectorize(exclusiveWords)

    df$result <- exclusiveWords(df$sentence1, df$sentence2)
    df
    #                sentence1                 sentence2        result
    # 1   This is sentence one This is the sentence four  the four one
    # 2   This is sentence two This is the sentence five  the five two
    # 3 This is sentence three  This is the sentence six the six three

(Editor: 李大同)
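Since the question also asks about Python, the same idea can be sketched there. This is only a minimal sketch, not part of the original answer: the function name exclusive_words is hypothetical, and it assumes words are separated by whitespace and compared case-sensitively, with no punctuation handling. Unlike the R version, it lists the words found only in the first sentence before those found only in the second, which matches the order shown in the question's expected result.

```python
def exclusive_words(s1, s2):
    """Return the words that occur in only one of the two sentences.

    Hypothetical helper: splits on whitespace, compares case-sensitively.
    """
    words1 = s1.split()
    words2 = s2.split()
    set1, set2 = set(words1), set(words2)
    # dict.fromkeys drops repeated words while keeping first-seen order.
    only_in_1 = [w for w in dict.fromkeys(words1) if w not in set2]
    only_in_2 = [w for w in dict.fromkeys(words2) if w not in set1]
    return " ".join(only_in_1 + only_in_2)


sentence1 = ["This is sentence one", "This is sentence two", "This is sentence three"]
sentence2 = ["This is the sentence four", "This is the sentence five", "This is the sentence six"]

# Pair the sentences up row by row, as the two data frame columns would be.
results = [exclusive_words(a, b) for a, b in zip(sentence1, sentence2)]
print(results)  # ['one the four', 'two the five', 'three the six']
```

With a pandas DataFrame that has the two columns from the question, the function could presumably be applied row-wise with something like df.apply(lambda row: exclusive_words(row.sentence1, row.sentence2), axis=1); on a very large frame, the zip-based list comprehension above is likely to be faster than a row-wise apply.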