
java – Spark DataFrame returns NULL when a schema is specified

Published: 2020-12-15 01:08:09 | Category: Java | Source: web

I'm trying to convert a JavaRDD<String> (where each String is a JSON string) into a DataFrame and display it. I'm doing something like the following,

// (The original snippet is truncated here; a plausible reconstruction,
//  applying the schema while reading the JSON strings:)
public void call(JavaRDD<String> rdd) {
    DataFrame df = sqlContext.read().schema(buildSchema()).json(rdd);
    df.show();
}

The schema looks like this,

public static StructType buildSchema() {
    StructType schema = new StructType(new StructField[] {
            DataTypes.createStructField("student_id", DataTypes.StringType, false),
            DataTypes.createStructField("school_id", DataTypes.IntegerType, true),
            DataTypes.createStructField("teacher", DataTypes.StringType, true),
            DataTypes.createStructField("rank", DataTypes.StringType, true),
            DataTypes.createStructField("created", DataTypes.TimestampType, true),
            DataTypes.createStructField("created_user", DataTypes.StringType, true),
            DataTypes.createStructField("notes", DataTypes.StringType, true),
            DataTypes.createStructField("additional_data", DataTypes.StringType, true),
            DataTypes.createStructField("datetime", DataTypes.TimestampType, true) });
    return schema;
}

The code above gives me,

+----------+---------+-------+----+-------+------------+-----+---------------+--------+
|student_id|school_id|teacher|rank|created|created_user|notes|additional_data|datetime|
+----------+---------+-------+----+-------+------------+-----+---------------+--------+
|      null|     null|   null|null|   null|        null| null|           null|    null|
+----------+---------+-------+----+-------+------------+-----+---------------+--------+

However, when I don't specify a schema and create the DataFrame like this,

DataFrame df = sqlContext.read().json(filteredRDD);

it gives me the following result,

+----------+---------+-------+----+--------------------+------------+-----+---------------+--------------------+
|student_id|school_id|teacher|rank|             created|created_user|notes|additional_data|            datetime|
+----------+---------+-------+----+--------------------+------------+-----+---------------+--------------------+
|         1|      123|    xxx|   3|2017-06-02 23:49:...|        yyyy| NULL| good academics|2017-06-02 23:49:...|
+----------+---------+-------+----+--------------------+------------+-----+---------------+--------------------+

Sample JSON record:

{"student_id": "1","school_id": "123","teacher": "xxx","rank": "3","created": "2017-06-02 23:49:10.410","created_user":"yyyy","notes": "NULL","additional_date":"good academics","datetime": "2017-06-02 23:49:10.410"}

Any help on what I'm doing wrong?

Best answer
The problem is that in my JSON records school_id is of string type ("123", quoted), while my schema declared it as IntegerType, and Spark evidently cannot cast a String to an Integer. In that case it treats the entire record as malformed and nulls out every column. I modified my schema to declare school_id as StringType, which solved my problem. A good explanation of this behavior is available at: http://blog.antlypls.com/blog/2016/01/30/processing-json-data-with-sparksql/
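Per the answer above, the fix is to make the declared type match the JSON value: every quoted field must be a StringType. A minimal sketch of the corrected schema, assuming the same Spark 1.x `DataTypes`/`StructType` API used in the question (the wrapper class name `SchemaFix` is just for illustration):

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaFix {
    // All quoted JSON values are declared as strings. Spark's JSON reader
    // will not cast the JSON string "123" to IntegerType; with the default
    // PERMISSIVE mode, such a mismatch nulls out the whole record.
    public static StructType buildSchema() {
        return new StructType(new StructField[] {
                DataTypes.createStructField("student_id", DataTypes.StringType, false),
                DataTypes.createStructField("school_id", DataTypes.StringType, true), // was IntegerType
                DataTypes.createStructField("teacher", DataTypes.StringType, true),
                DataTypes.createStructField("rank", DataTypes.StringType, true),
                DataTypes.createStructField("created", DataTypes.TimestampType, true),
                DataTypes.createStructField("created_user", DataTypes.StringType, true),
                DataTypes.createStructField("notes", DataTypes.StringType, true),
                DataTypes.createStructField("additional_data", DataTypes.StringType, true),
                DataTypes.createStructField("datetime", DataTypes.TimestampType, true) });
    }
}
```

As a diagnostic tip, setting the reader's `mode` option to `"FAILFAST"` (e.g. `sqlContext.read().option("mode", "FAILFAST").schema(schema).json(rdd)`) makes Spark throw on the first record that does not match the schema instead of silently producing nulls, which surfaces this kind of type mismatch immediately.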
