R text mining: intersection between text fields
Published: 2020-12-14 05:10:24 | Category: Big Data | Source: compiled from the web
I would like to know whether there is a quick way to find the directed intersection between two text strings, for example:
    t1 <- "I have achieved my goals over the past 20 years and look forward for my next chalanges"
    t2 <- " have achieved goals and look my chalanges some other words bla bla"

t1 isContainedIn t2 would return 7, because 7 of the words that appear in t1 also appear in t2.

       keywords               title
    1  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card
    2  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
    3  Samsung UN48H6350 48"  Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
    4  Samsung UN48H6350 48"  Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
    5  Samsung UN48H6350 48"  Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
    6  Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
    7  Samsung UN48H6350 48"  Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
    8  Samsung UN48H6350 48"  Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
    9  Samsung UN48H6350 48"  Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
    10 Samsung UN48H6350 48"  Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi,(R#416)

Solution
I guess another similar approach would be to use a simple match:
    string <- strsplit(c(t1, t2), "\\s+") # similar to @Richard
    length(na.omit(match(string[[2]], string[[1]])))
    ## [1] 7

Or maybe with lapply:

    length(unlist(lapply(string[[2]], intersect, string[[1]])))
    ## [1] 7
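The same idea could probably be applied row by row to a data frame with keywords and title columns like the one in the question. The following is only a minimal sketch under a few assumptions: `count_contained` and the `n_common` column are hypothetical names that do not come from the original answer, words are split on whitespace, the comparison is case-sensitive, and each distinct word is counted once.

```r
# Hypothetical helper (not from the original answer):
# counts the distinct words of `a` that also occur in `b`.
count_contained <- function(a, b) {
  # trimws() drops leading/trailing blanks, so strsplit() does not
  # produce an empty first token for strings such as " have achieved"
  wa <- strsplit(trimws(a), "\\s+")[[1]]
  wb <- strsplit(trimws(b), "\\s+")[[1]]
  # intersect() returns each shared word once
  length(intersect(wa, wb))
}

t1 <- "I have achieved my goals over the past 20 years and look forward for my next chalanges"
t2 <- " have achieved goals and look my chalanges some other words bla bla"
count_contained(t1, t2)
## [1] 7

# Two rows copied from the question's sample data (rows 5 and 8)
df <- data.frame(
  keywords = c("Samsung UN48H6350 48\"", "Samsung UN48H6350 48\""),
  title = c(
    "Samsung UN48H6350 48\" Smart 1080p Clear Motion Rate 240 LED HDTV",
    "Samsung Un48h6350af 75\" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)"
  ),
  stringsAsFactors = FALSE
)

# Apply the helper pairwise to the two columns
df$n_common <- mapply(count_contained, df$keywords, df$title, USE.NAMES = FALSE)
df$n_common
## [1] 3 1
```

Because the comparison is case-sensitive, "UN48H6350" does not match "Un48h6350af" in the second row. If case should be ignored, wrapping both inputs in `tolower()` before splitting is one option.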