Tuples to a DataFrame in Spark Scala
Published: 2020-12-16 10:05:49 Category: Security Source: compiled from the web
I have an array named arraylist that looks like this:

arraylist: Array[(String,Any)] = Array((id,772914),(x4,2),(x5,24),(x6,1),(x7,77491.25),(x8,17911.77778),(x9,225711),(x10,17),(x12,6),(x14,5),(x16,(x18,5.0),(x19,8.0),(x20,7959.0),(x21,676.0),(x22,228.5068871),(x23,195.0),(x24,109.6015511),(x25,965.0),(x26,1017.79043),(x27,2.0),(Target,(x29,13),(x30,735255.5),(x31,332998.432),(x32,38168.75),(x33,107957.5278),(x34,(x35,(x36,(x37,(x38,(x39,(x40,(x41,7),(x42,(x43,(x44,(x45,(x46,(x47,(x48,(x49,14.0),(x50,2.588435821),(x51,617127.5),(x52,414663.9738),(x53,39900.0),(x54,16743.15781),(x55,105000.0),(x56,52842.29076),(x57,25750.46154),(x58,8532.045819),(x64,(x66,(x67,(x68,(x69,(x70,(x71,(x73,(...

I want to convert it into a DataFrame with two columns, "ID" and the value. This is the code I am using:

val df = sc.parallelize(arraylist).toDF("Names","Values")

But I get an error:

java.lang.UnsupportedOperationException: Schema for type Any is not supported

How can I get past this?

Solution
The message says it all :) Any is not supported as a DataFrame column type. The Any type can be caused by a null as the second element of a tuple.
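Change the type of arraylist to Array[(String, Int)] (if you can do that manually; if the type is inferred by Scala, check for nulls and invalid values in the second element), or create the schema manually:

import org.apache.spark.sql.types._
import org.apache.spark.sql._

// Sample values taken from the first entries of the question's data
val arraylist: Array[(String, Any)] = Array(("id", 772914), ("x4", 2), ("x5", 24.0))

// Explicit schema: a string name column and a double value column
val schema = StructType(
  StructField("Names", StringType, false) ::
  StructField("Values", DoubleType, false) :: Nil)

// Convert each tuple to a Row, widening every numeric value to Double
val rdd = sc.parallelize(arraylist).map(x => Row(x._1, x._2.asInstanceOf[Number].doubleValue()))
val df = sqlContext.createDataFrame(rdd, schema)
df.show()

Note: createDataFrame requires an RDD[Row], so I convert the RDD of tuples to an RDD of Row.

(Editor: 李大同)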
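One caveat with the cast shown in the answer: calling doubleValue() on a null second element throws a NullPointerException, and the answer names null as a likely reason the element type became Any in the first place. The following is a minimal sketch, not part of the original answer, of one way to clean the tuples in plain Scala before they reach Spark. The object name TupleCleaning, the helper names, and the choice to treat non-numeric values as missing are all assumptions made for illustration.

```scala
// Hypothetical helper (not from the original answer): normalise the untyped
// second element of each tuple to Option[Double] so that nulls become None.
object TupleCleaning {

  // Map an untyped value to an optional Double.
  def toOptionalDouble(value: Any): Option[Double] = value match {
    case null      => None                  // missing value
    case n: Number => Some(n.doubleValue()) // boxed Int, Long, Double, ...
    case _         => None                  // assumption: non-numeric counts as missing
  }

  // Convert the whole array; the result has a concrete element type
  // instead of Any.
  def clean(pairs: Array[(String, Any)]): Seq[(String, Option[Double])] =
    pairs.toSeq.map { case (name, value) => (name, toOptionalDouble(value)) }

  // Small sample in the shape of the question's data, with one null added.
  val sample: Array[(String, Any)] =
    Array(("id", 772914), ("x4", 2), ("x7", 77491.25), ("x16", null))

  val cleaned: Seq[(String, Option[Double])] = clean(sample)
}
```

Because the cleaned sequence no longer contains Any, it should be convertible with toDF("Names", "Values") once the session's implicits are imported, with None ending up as a null in a nullable double column. If the explicit schema from the answer is kept instead, the Values field would need nullable set to true for rows with missing values.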