scala – Apache Zeppelin cannot deserialize Dataset: "NoSuchMethodError"
I am trying to use Apache Zeppelin (0.7.2, net-install binary, running locally on a Mac) to explore data loaded from an s3 bucket. The data seems to load fine, as the command:
val p = spark.read.textFile("s3a://sparkcookbook/person")

gives the result:

p: org.apache.spark.sql.Dataset[String] = [value: string]

However, when I try to call methods on the object p, I get an error. For example:

p.take(1)

results in:

java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2112)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2327)

My conf/zeppelin-env.sh is the same as the default, except that I have defined the Amazon access key and secret key environment variables there. In the Spark interpreter in the Zeppelin notebook, I added the following artifacts:

org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.9.0
com.fasterxml.jackson.core:jackson-databind:2.9.0
com.fasterxml.jackson.core:jackson-annotations:2.9.0

(I believe only the first two are necessary.) The two commands above work fine in the Spark shell, just not in the Zeppelin notebook (see How to use s3 with Apache spark 2.2 in the Spark shell for how that was set up).

So there appears to be a problem with one of the Jackson libraries. Perhaps I am using the wrong artifacts in the Zeppelin interpreter?

Update: Following the advice in the proposed answer below, I removed the Jackson jars that ship with Zeppelin and replaced them with:

jackson-annotations-2.6.0.jar
jackson-core-2.6.7.jar
jackson-databind-2.6.7.jar

and swapped the artifacts to match, so my artifacts are now:

org.apache.hadoop:hadoop-aws:2.7.3
com.amazonaws:aws-java-sdk:1.7.9
com.fasterxml.jackson.core:jackson-core:2.6.7
com.fasterxml.jackson.core:jackson-databind:2.6.7
com.fasterxml.jackson.core:jackson-annotations:2.6.0

However, running the commands above produces the same error.
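Before replacing more jars, a quick diagnostic (a minimal sketch of my own, not from the original post; the class names below are the standard Jackson entry points) is to print which jar each Jackson class is actually loaded from:

import scala.util.Try

// Print the jar each key Jackson class was loaded from. If jackson-core,
// jackson-databind and jackson-module-scala resolve to different versions,
// that mismatch explains a failure like the one in the trace above.
def whereIs(className: String): Unit = {
  val loc = Try(Class.forName(className)).toOption
    .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
    .map(_.getLocation)
  println(s"$className -> ${loc.getOrElse("not found / no code source")}")
}

whereIs("com.fasterxml.jackson.core.JsonFactory")                 // jackson-core
whereIs("com.fasterxml.jackson.databind.ObjectMapper")            // jackson-databind
whereIs("com.fasterxml.jackson.module.scala.DefaultScalaModule")  // jackson-module-scala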
UPDATE 2: Following the suggestion, I removed the jackson libraries from the artifact list, since they are now already present in the jars/ folder – the only added artifacts are now the aws artifacts above. I then cleaned the classpath by entering the following in the notebook (per the instructions):

%spark.dep
z.reset()

I now get a different error:

val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1)

p: org.apache.spark.sql.Dataset[String] = [value: string]
java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:308)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765)
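The trace shows the failure occurring while DefaultScalaModule is being initialized (invoked from RDDOperationScope$), so the mismatch can be reproduced without Spark at all. A minimal isolation sketch of my own, not from the original post:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Referencing DefaultScalaModule runs the same static initializers that fail
// inside RDDOperationScope$. With a consistent Jackson on the classpath this
// succeeds; with mismatched jars it throws the same NoSuchMethodError as above.
val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
println(mapper.writeValueAsString(Map("ok" -> 1)))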
Update 3: Following the advice in the comments on the suggested answer below, I cleaned the classpath by deleting all the files in the local repository:

rm -rf local-repo/*

and then restarted the Zeppelin server. To check the classpath, I executed the following in the notebook:

val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)

This gave the following output (I include only the jackson libraries here, as otherwise the output is too long to paste):

...
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-annotations-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.1.1.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-databind-2.2.3.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/local-repo/2CT9CPAA9/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-annotations-2.6.0.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-core-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/lib/jackson-databind-2.6.7.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/zeppelin-0.7.2-bin-netinst/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-annotations-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-core-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-databind-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-jaxrs-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-mapper-asl-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-paranamer-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-module-scala_2.11-2.6.5.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/jackson-xc-1.9.13.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/json4s-jackson_2.11-3.2.11.jar
file:/Users/shafiquejamal/allfiles/scala/spark/spark-2.1.0-bin-hadoop2.7/jars/parquet-jackson-1.8.1.jar
...

It looks like multiple versions are being picked up from the repo. Should I exclude the older versions? If so, how do I do that?
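One way to do such exclusions (my suggestion, not from the original thread) is Zeppelin's dynamic dependency loader, which supports exclusion patterns on loaded artifacts. A sketch, assuming the dep interpreter is enabled and runs before the Spark context starts:

%spark.dep
z.reset()
// Load the s3 connectors but keep their transitive Jackson artifacts off the
// classpath, leaving Spark's bundled Jackson 2.6.5 as the only copy in use.
z.load("org.apache.hadoop:hadoop-aws:2.7.3").exclude("com.fasterxml.jackson.core:*")
z.load("com.amazonaws:aws-java-sdk:1.7.9").exclude("com.fasterxml.jackson.core:*")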
Solution

Use these jar versions:
aws-java-sdk-1.7.4.jar
hadoop-aws-2.6.0.jar

as in this script: https://github.com/2dmitrypavlov/sparkDocker/blob/master/zeppelin.sh

echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/aws-java-sdk-1.7.4.jar,/root/jars/hadoop-aws-2.6.0.jar"'>>zeppelin-env.sh

Then restart zeppelin. The code from the link above is pasted below (in case the link goes stale):

#!/bin/bash
# Download jars
cd /root/jars
wget http://central.maven.org/maven2/mysql/mysql-connector-java/5.1.39/mysql-connector-java-5.1.39.jar
cd /usr/share/
wget http://archive.apache.org/dist/zeppelin/zeppelin-0.7.1/zeppelin-0.7.1-bin-all.tgz
tar -zxvf zeppelin-0.7.1-bin-all.tgz
cd zeppelin-0.7.1-bin-all/conf
cp zeppelin-env.sh.template zeppelin-env.sh
echo 'export MASTER=spark://'$MASTERZ':7077'>>zeppelin-env.sh
echo 'export SPARK_SUBMIT_OPTIONS="--jars /root/jars/mysql-connector-java-5.1.39.jar,/root/jars/hadoop-aws-2.6.0.jar"'>>zeppelin-env.sh
echo 'export ZEPPELIN_NOTEBOOK_STORAGE="org.apache.zeppelin.notebook.repo.VFSNotebookRepo,org.apache.zeppelin.notebook.repo.zeppelinhub.ZeppelinHubRepo"'>>zeppelin-env.sh
echo 'export ZEPPELINHUB_API_ADDRESS="https://www.zeppelinhub.com"'>>zeppelin-env.sh
echo 'export ZEPPELIN_PORT=9999'>>zeppelin-env.sh
echo 'export SPARK_HOME=/usr/share/spark'>>zeppelin-env.sh
cd ../bin/
./zeppelin.sh
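After the restart, the fix can be verified from a notebook paragraph. A short sketch of my own (with Spark 2.1.0 the bundled jackson-databind should report version 2.6.5):

// Report which jackson-databind is actually on the classpath.
println(com.fasterxml.jackson.databind.cfg.PackageVersion.VERSION)

// Retry the original s3a read; with consistent Jackson jars this should
// now return the first row instead of throwing.
val p = spark.read.textFile("s3a://sparkcookbook/person")
p.take(1)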