scala – How do I get the names of all columns whose values are all null?
Published: 2020-12-16 09:23:07 | Category: Security | Source: compiled from the web
I can't figure out how to get the names of the columns in which every value is null.
For example:

    // Run in spark-shell, where sc and the toDF implicits are already in scope
    case class A(name: String, id: String, email: String, company: String)

    val e1 = A("n1", null, "n1@c1.com", null)
    val e2 = A("n2", null, "n2@c1.com", null)
    val e3 = A("n3", null, "n3@c1.com", null)
    val e4 = A("n4", null, "n4@c2.com", null)
    val e5 = A("n5", null, "n5@c2.com", null)
    val e6 = A("n6", null, "n6@c2.com", null)
    val e7 = A("n7", null, "n7@c3.com", null)
    val e8 = A("n8", null, "n8@c3.com", null)

    val As = Seq(e1, e2, e3, e4, e5, e6, e7, e8)
    val df = sc.parallelize(As).toDF

This code produces the following DataFrame:

    +----+----+---------+-------+
    |name|  id|    email|company|
    +----+----+---------+-------+
    |  n1|null|n1@c1.com|   null|
    |  n2|null|n2@c1.com|   null|
    |  n3|null|n3@c1.com|   null|
    |  n4|null|n4@c2.com|   null|
    |  n5|null|n5@c2.com|   null|
    |  n6|null|n6@c2.com|   null|
    |  n7|null|n7@c3.com|   null|
    |  n8|null|n8@c3.com|   null|
    +----+----+---------+-------+

I want the names of the columns in which every row is null: id, company.
I don't care about the type of the output. An Array, a String, an RDD, anything is fine.

Solution
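You can run a simple count over every column, then use the indices of the columns whose count is 0 to subset df.columns. This works because count on a column only counts non-null values, so a column that is entirely null yields 0: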
    import org.apache.spark.sql.functions.{count, col}

    // Get the indices of the columns whose non-null count is 0
    val col_inds = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
      .collect()(0)          // the aggregation returns exactly one row
      .toSeq.zipWithIndex    // pair each count with its column index
      .filter(_._1 == 0).map(_._2)

    // Subset the column names using the indices
    col_inds.map(i => df.columns.apply(i))
    // Seq[String] = ArrayBuffer(id,company)
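The same count-based idea can also be written without going through indices, by looking each count up by column name in the aggregated row. The following is only a minimal sketch, not part of the original answer: the object name AllNullColumns and the helper allNullColumns are hypothetical, the session is assumed to run in local mode, and column names are assumed to contain no dots or backticks (otherwise col(c) would need escaping).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

object AllNullColumns {
  // Hypothetical helper: returns the names of columns that hold no non-null value.
  // Assumes plain column names (no dots or backticks).
  def allNullColumns(df: DataFrame): Seq[String] = {
    if (df.columns.isEmpty) return Seq.empty
    // count(col) ignores nulls, so an all-null column aggregates to 0.
    // A global aggregation always produces exactly one row.
    val counts = df.select(df.columns.map(c => count(col(c)).alias(c)): _*).head()
    df.columns.toSeq.filter(c => counts.getAs[Long](c) == 0L)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration
    val spark = SparkSession.builder().master("local[*]").appName("all-null-columns").getOrCreate()
    import spark.implicits._

    // Same shape as the question's data, built from tuples instead of the case class
    val df = Seq[(String, String, String, String)](
      ("n1", null, "n1@c1.com", null),
      ("n2", null, "n2@c1.com", null),
      ("n3", null, "n3@c1.com", null)
    ).toDF("name", "id", "email", "company")

    println(allNullColumns(df).mkString(", ")) // prints "id, company"
    spark.stop()
  }
}
```

Note that a DataFrame with zero rows reports every column as all-null under this approach, since every count is 0. Whether that is the desired behaviour depends on the use case.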