scala – Spark DataFrame filter
Published: 2020-12-16 08:50:31 | Category: Security | Source: Web
val df = sc.parallelize(Seq((1,"Emailab"),(2,"Phoneab"),(3,"Faxab"),(4,"Mail"),(5,"Other"),(6,"MSL12"),(7,"MSL"),(8,"HCP"),(9,"HCP12"))).toDF("c1","c2")

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
|  6|  MSL12|
|  7|    MSL|
|  8|    HCP|
|  9|  HCP12|
+---+-------+

I want to filter out the records whose first three characters in column "c2" are "MSL" or "HCP". The output should therefore be:

+---+-------+
| c1|     c2|
+---+-------+
|  1|Emailab|
|  2|Phoneab|
|  3|  Faxab|
|  4|   Mail|
|  5|  Other|
+---+-------+

Can anyone help? I know df.filter($"c2".rlike("MSL")) selects matching records, but how do I exclude them? Version: Spark 1.6.2

Solution:

df.filter(not(substring(col("c2"), 1, 3).isin("MSL", "HCP")))

Note that Spark's substring function is 1-indexed and takes both a start position and a length, so substring(col("c2"), 1, 3) extracts the three-character prefix; not(...isin(...)) then drops the rows whose prefix is "MSL" or "HCP".
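The solution above can be sketched as a self-contained program. This is a minimal sketch, written against the modern SparkSession entry point (an assumption on my part; the original question targets Spark 1.6.2, where you would build the DataFrame from a SQLContext/SparkContext as in the question). It also shows an equivalent anchored-regex variant of the filter:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, not, substring}

object FilterByPrefix {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; on Spark 1.6 you would use
    // sc.parallelize(...).toDF via a SQLContext instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("prefix-filter")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "Emailab"), (2, "Phoneab"), (3, "Faxab"), (4, "Mail"), (5, "Other"),
      (6, "MSL12"), (7, "MSL"), (8, "HCP"), (9, "HCP12")
    ).toDF("c1", "c2")

    // Keep rows whose 3-character prefix is NOT "MSL" or "HCP".
    // substring is 1-indexed: substring(c2, 1, 3) is the prefix.
    val kept = df.filter(not(substring(col("c2"), 1, 3).isin("MSL", "HCP")))
    kept.show()  // rows 1..5 remain

    // Equivalent alternative: negate an anchored regex match.
    val keptRegex = df.filter(!col("c2").rlike("^(MSL|HCP)"))
    keptRegex.show()

    spark.stop()
  }
}
```

The rlike variant is handy when the exclusion rule is a pattern rather than a fixed-length prefix; the `^` anchor is what restricts the match to the start of the string.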