scala – How to get the names of all columns whose values are all null?

Published: 2020-12-16 09:23:07 | Category: Security | Source: compiled from the web

I have no idea how to get the names of the columns whose values are null.

For example:

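// Assumes spark-shell, where sc and the implicits needed for toDF are already in scope;
// in a standalone application, create a SparkSession `spark` and add: import spark.implicits._
case class A(name: String, id: String, email: String, company: String)

val e1 = A("n1", null, "n1@c1.com", null)
val e2 = A("n2", null, "n2@c1.com", null)
val e3 = A("n3", null, "n3@c1.com", null)
val e4 = A("n4", null, "n4@c2.com", null)
val e5 = A("n5", null, "n5@c2.com", null)
val e6 = A("n6", null, "n6@c2.com", null)
val e7 = A("n7", null, "n7@c3.com", null)
val e8 = A("n8", null, "n8@c3.com", null)
val As = Seq(e1, e2, e3, e4, e5, e6, e7, e8)
val df = sc.parallelize(As).toDF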

This code produces the following DataFrame:

+----+----+---------+-------+
|name|  id|    email|company|
+----+----+---------+-------+
|  n1|null|n1@c1.com|   null|
|  n2|null|n2@c1.com|   null|
|  n3|null|n3@c1.com|   null|
|  n4|null|n4@c2.com|   null|
|  n5|null|n5@c2.com|   null|
|  n6|null|n6@c2.com|   null|
|  n7|null|n7@c3.com|   null|
|  n8|null|n8@c3.com|   null|
+----+----+---------+-------+

I want the names of the columns that are null in every row: id, company

I don't care about the output type: an Array, a String, an RDD, etc. are all fine.

Solution

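You can run a simple count on every column. Since count(col(c)) counts only non-null values, a column that is null in every row gets a count of 0. Then use the indices of the columns whose count is 0 to subset df.columns: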

import org.apache.spark.sql.functions.{count,col}
// Get the indices of the columns whose non-null count is 0
val col_inds = df.select(df.columns.map(c => count(col(c)).alias(c)): _*)
                 .collect()(0)
                 .toSeq.zipWithIndex
                 .filter(_._1 == 0).map(_._2)
// Subset column names using the indices
col_inds.map(i => df.columns.apply(i))
//Seq[String] = ArrayBuffer(id,company)
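To make the reasoning behind this answer concrete, here is a rough plain-Scala model of the same count-then-filter idea, without Spark. It is an illustration under stated assumptions, not the answerer's code: the table is modeled as hypothetical in-memory rows of Seq[Any] mirroring the DataFrame above, and the names NullColumnSketch, nonNullCounts and allNullColumns are made up for this sketch. Instead of collecting indices and looking names up afterwards, it pairs each column name with its non-null count directly.

```scala
object NullColumnSketch {
  // Hypothetical in-memory table mirroring the DataFrame above:
  // column names plus rows of values, where null marks a missing value.
  val columns: Seq[String] = Seq("name", "id", "email", "company")
  val rows: Seq[Seq[Any]] = (1 to 8).map { i =>
    val domain = if (i <= 3) "c1" else if (i <= 6) "c2" else "c3"
    Seq(s"n$i", null, s"n$i@$domain.com", null)
  }

  // Like count(col(c)) in Spark, count only the non-null values in each column,
  // and keep each count paired with its column name.
  def nonNullCounts(cols: Seq[String], data: Seq[Seq[Any]]): Seq[(String, Long)] =
    cols.indices.map { i =>
      cols(i) -> data.count(row => row(i) != null).toLong
    }

  // A column with zero non-null values is null in every row.
  def allNullColumns(cols: Seq[String], data: Seq[Seq[Any]]): Seq[String] =
    nonNullCounts(cols, data).collect { case (name, 0L) => name }

  def main(args: Array[String]): Unit =
    println(allNullColumns(columns, rows).mkString(", ")) // id, company
}
```

One consequence of the counting approach, in Spark as well as in this model: when there are no rows at all, every count is 0, so every column is reported as all-null.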

(Editor: 李大同)
