scala – java.sql.SQLException when loading a DataFrame into Spark SQL
I am running into a very strange problem when trying to load a JDBC DataFrame into Spark SQL.
I have tried several Spark clusters on my laptop: YARN, a standalone cluster, and pseudo-distributed mode. The problem is reproducible on both Spark 1.3.0 and 1.3.1, and it occurs whether the code is executed in spark-shell or via spark-submit. I have tried the MySQL and MS SQL JDBC drivers without success.

Consider the following example:

```scala
val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
}
```

So far, so good; the schema is resolved correctly:

```
t1: org.apache.spark.sql.DataFrame = [id: int, name: string]
```

But when I evaluate the DataFrame:

```scala
t1.take(1)
```

the following exception occurs:

```
15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

However, when I open a JDBC connection on the executors directly:

```scala
import java.sql.DriverManager

sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()
```

it works perfectly:

```
res1: Array[Int] = Array(0, 1)
```

When I run the same code in local mode, it also works fine:

```
scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])
```

I am using Spark pre-built with Hadoop 2.4 support.

The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:

```
/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar
```

Is there a way around this? It looks like a serious problem, so it is strange that googling turned up nothing.

Solution
Apparently this problem has been reported recently:
https://issues.apache.org/jira/browse/SPARK-6913

The problem lies in java.sql.DriverManager, which does not see drivers loaded by class loaders other than the bootstrap ClassLoader.

As a temporary workaround, the required driver can be added to the classpath used to launch the executors (see the sketch below).

Update: this pull request fixes the issue: https://github.com/apache/spark/pull/5782

Update 2: the fix has been merged into Spark 1.4
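For reference, a minimal sketch of that temporary workaround in standalone mode, assuming the MySQL connector jar sits at /path/to/mysql-connector-java-5.1.35.jar on every worker node (the path is taken from the reproduction command above and is only an assumption about your layout). The idea is to put the driver jar on the classpath the executor JVMs are started with, via the standard spark.executor.extraClassPath setting in conf/spark-defaults.conf:

```
# conf/spark-defaults.conf on the machine you launch from
# (jar path assumed; the jar must exist at this path on every worker)
spark.executor.extraClassPath  /path/to/mysql-connector-java-5.1.35.jar
```

or equivalently via --conf on the command line:

```
/path/to/spark-shell --master spark://<hostname>:7077 \
  --jars /path/to/mysql-connector-java-5.1.35.jar \
  --driver-class-path /path/to/mysql-connector-java-5.1.35.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.35.jar
```

Because the jar is then part of the executor's launch classpath rather than only fetched via --jars, DriverManager can locate the driver and the t1.take(1) call should succeed. On Spark 1.4 and later this workaround should no longer be needed, since the fix from the pull request above is included.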