scala – Spark case class – decimal type encoder error "Cannot up cast from decimal"

Published: 2020-12-16 18:39:49 | Category: Security | Source: compiled from the web


I am pulling data from MySQL / MariaDB, and while creating the Dataset the data type fails to resolve:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot up cast AMOUNT from decimal(30,6) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "org.apache.spark.sql.types.Decimal", name: "AMOUNT")
- root class: "com.misp.spark.Deal"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;

The case class is defined as follows:

// Class name taken from the root class reported in the error (com.misp.spark.Deal)
case class Deal(
  AMOUNT: Decimal
)

Does anyone know how to fix this without touching the database?

Solution

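The error says that Apache Spark cannot automatically convert the BigDecimal(30,6) coming from the database to the BigDecimal(38,18) that the Dataset requires. (I don't know why it needs the fixed parameters 38,18, and it is even stranger that Spark cannot automatically convert a lower-precision type to a higher-precision one.)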
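One possible reading of the "may truncate" part is a matter of counting digits: a decimal(p,s) leaves p - s digits in front of the decimal point, so decimal(30,6) allows 24 integer digits while decimal(38,18) allows only 20. The sketch below is plain Scala with no Spark dependency; `Dec`, `integerDigits` and `isLossless` are names made up for illustration, and the widening rule is stated as an assumption about how Spark decides, not as a copy of its code.

```scala
// Hypothetical helper type for illustration only; not part of Spark's API.
final case class Dec(precision: Int, scale: Int) {
  // Digits available to the left of the decimal point.
  def integerDigits: Int = precision - scale
}

object UpCastCheck {
  // Assumed rule: a cast is lossless only if the target keeps at least as many
  // integer digits AND at least as many fractional digits as the source.
  def isLossless(from: Dec, to: Dec): Boolean =
    to.integerDigits >= from.integerDigits && to.scale >= from.scale

  def main(args: Array[String]): Unit = {
    val fromDb = Dec(30, 6)   // column type reported in the error message
    val target = Dec(38, 18)  // type named as the cast target in the error message
    println(s"source integer digits: ${fromDb.integerDigits}")
    println(s"target integer digits: ${target.integerDigits}")
    println(s"lossless: ${isLossless(fromDb, target)}")
  }
}
```

Under that assumption the target has more total precision but fewer integer digits, which would explain why the up cast is rejected even though 38 is larger than 30.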

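There is a reported bug for this: https://issues.apache.org/jira/browse/SPARK-20162 (perhaps filed by you). In any case, I found a good workaround for reading the data: cast the columns to BigDecimal(38,18) in the DataFrame, and only then convert the DataFrame to a Dataset.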

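import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

// First read the data into a DataFrame in whatever way suits you
var df: DataFrame = ???
val dfSchema = df.schema

dfSchema.foreach { field =>
  field.dataType match {
    // Re-cast every decimal column that is not already decimal(38,18)
    case t: DecimalType if t != DecimalType(38, 18) =>
      df = df.withColumn(field.name, col(field.name).cast(DecimalType(38, 18)))
    // Leave all other columns untouched (without this case the match throws MatchError)
    case _ =>
  }
}
// Needs `import spark.implicits._` in scope (spark being your SparkSession) for the encoder
df.as[YourCaseClassWithBigDecimal]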

This should solve the problem for reading (but not for writing, I guess).
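The same cast-then-convert idea can also be written without a mutable `var`, by folding over the schema. This is only a sketch under stated assumptions: `DealRow`, `DecimalCastSketch` and `widenDecimals` are hypothetical names, and a one-row in-memory DataFrame stands in for the real JDBC read so the example can run locally.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Hypothetical case class with a single BigDecimal field, for illustration.
final case class DealRow(AMOUNT: BigDecimal)

object DecimalCastSketch {
  private val Target = DecimalType(38, 18)

  // Cast every decimal column that is not already decimal(38,18);
  // all other columns are passed through unchanged.
  def widenDecimals(df: DataFrame): DataFrame =
    df.schema.fields.foldLeft(df) { (acc, field) =>
      field.dataType match {
        case d: DecimalType if d != Target =>
          acc.withColumn(field.name, col(field.name).cast(Target))
        case _ => acc
      }
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("decimal-cast-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the JDBC read: one decimal(30,6) column named AMOUNT.
    val raw = Seq("12.5").toDF("AMOUNT")
      .select(col("AMOUNT").cast(DecimalType(30, 6)).as("AMOUNT"))

    val ds = widenDecimals(raw).as[DealRow]
    ds.printSchema()
    spark.stop()
  }
}
```

One caveat with any explicit cast of this kind: values that need more than 20 integer digits do not fit in decimal(38,18), and depending on the Spark version and settings they may come back as null or raise an error, so it is worth checking the actual range of the column first.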

