scala – Multiple Spark contexts
In short:
EC2 cluster: 1 master and 3 slaves. Spark version: 1.3.1.

I want to use the option spark.driver.allowMultipleContexts to run one local context (master only) and one cluster context (master and slaves). I get the stack trace below (line 29 is where I call the object that initializes the second SparkContext):

    fr.entry.Main.main(Main.scala)
      at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1$$anonfun$apply$10.apply(SparkContext.scala:1812)
      at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1$$anonfun$apply$10.apply(SparkContext.scala:1808)
      at scala.Option.foreach(Option.scala:236)
      at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:1808)
      at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:1795)
      at scala.Option.foreach(Option.scala:236)
      at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:1795)
      at org.apache.spark.SparkContext$.setActiveContext(SparkContext.scala:1847)
      at org.apache.spark.SparkContext.<init>(SparkContext.scala:1754)
      at fr.entry.cluster$.<init>(Main.scala:79)
      at fr.entry.cluster$.<clinit>(Main.scala)
      at fr.entry.Main$delayedInit$body.apply(Main.scala:29)
      at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
      at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
      at scala.App$$anonfun$main$1.apply(App.scala:71)
      at scala.App$$anonfun$main$1.apply(App.scala:71)
      at scala.collection.immutable.List.foreach(List.scala:318)
      at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
      at scala.App$class.main(App.scala:71)
      at fr.entry.Main$.main(Main.scala:14)
      at fr.entry.Main.main(Main.scala)

    15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/2 is now LOADING
    15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/0 is now RUNNING
    15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/1 is now RUNNING
    15/09/28 15:33:30 INFO SparkContext: Starting job: sum at Main.scala:29
    15/09/28 15:33:30 INFO DAGScheduler: Got job 0 (sum at Main.scala:29) with 2 output partitions (allowLocal=false)
    15/09/28 15:33:30 INFO DAGScheduler: Final stage: Stage 0 (sum at Main.scala:29)
    15/09/28 15:33:30 INFO DAGScheduler: Parents of final stage: List()
    15/09/28 15:33:30 INFO DAGScheduler: Missing parents: List()
    15/09/28 15:33:30 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at numericRDDToDoubleRDDFunctions at Main.scala:29), which has no missing parents
    15/09/28 15:33:30 INFO MemoryStore: ensureFreeSpace(2264) called with curMem=0, maxMem=55566516879
    15/09/28 15:33:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.2 KB, free 51.8 GB)
    15/09/28 15:33:30 INFO MemoryStore: ensureFreeSpace(1656) called with curMem=2264, maxMem=55566516879
    15/09/28 15:33:30 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1656.0 B, free 51.8 GB)
    15/09/28 15:33:30 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:40476 (size: 1656.0 B, free: 51.8 GB)
    15/09/28 15:33:30 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
    15/09/28 15:33:30 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
    15/09/28 15:33:30 INFO AppClient$ClientActor: Executor updated: app-20150928153330-0036/2 is now RUNNING
    15/09/28 15:33:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at numericRDDToDoubleRDDFunctions at Main.scala:29)
    15/09/28 15:33:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    15/09/28 15:33:45 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    15/09/28 15:34:00 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

More details:

I want to run a program that does two things. First, I have a local SparkContext (on the master only), with which I build an RDD and run some operations. Second, I have a second SparkContext, initialized with the master and the 3 slaves, which also builds an RDD and runs some operations. A simple example:

    val arr = Array(Array(1,2,3,4,5,6,7,8),Array(1,8))
    println(local.sparkContext.makeRDD(arr).count())
    println(cluster.sparkContext.makeRDD(arr).map(l => l.sum).sum)

My two SparkContexts:

    object local {
      val project = "test"
      val version = "1.0"
      val sc = new SparkConf()
        .setMaster("local[16]")
        .setAppName("Local")
        .set("spark.local.dir", "/mnt")
        .setJars(Seq("target/scala-2.10/" + project + "-assembly-" + version + ".jar",
                     "target/scala-2.10/" + project + "_2.10-" + version + "-tests.jar"))
        .setSparkHome("/root/spark")
        .set("spark.driver.allowMultipleContexts", "true")
        .set("spark.executor.memory", "45g")
      val sparkContext = new SparkContext(sc)
    }

    object cluster {
      val project = "test"
      val version = "1.0"
      val sc = new SparkConf()
        .setMaster(masterURL) // ec2-XX-XXX-XXX-XX.compute-1.amazonaws.com
        .setAppName("Cluster")
        .set("spark.local.dir", "/mnt")
        .setJars(Seq("target/scala-2.10/" + project + "-assembly-" + version + ".jar",
                     "target/scala-2.10/" + project + "_2.10-" + version + "-tests.jar") ++ otherJars)
        .setSparkHome("/root/spark")
        .set("spark.driver.allowMultipleContexts", "true")
        .set("spark.executor.memory", "35g")
      val sparkContext = new SparkContext(sc)
    }

How can I fix this?

Solution
Although the configuration option spark.driver.allowMultipleContexts exists, it is misleading: using multiple Spark contexts is discouraged. The option is only used for Spark internal tests and is not supposed to be used in user programs. You can get unexpected results when running more than one Spark context in a single JVM.
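A minimal sketch of one way around this, assuming the two workloads can run one after the other rather than concurrently: build the local SparkContext, do the local work, call stop() on it, and only then build the cluster context, so that only one context is active in the JVM at any time. The object and variable names below (SequentialContexts, localSc, clusterSc) are illustrative, and masterURL is a placeholder for the spark:// URL of the EC2 master from the question.

    import org.apache.spark.{SparkConf, SparkContext}

    object SequentialContexts extends App {
      // Placeholder: use the spark://host:port URL of your standalone master.
      val masterURL = "spark://ec2-XX-XXX-XXX-XX.compute-1.amazonaws.com:7077"

      val arr = Array(Array(1, 2, 3, 4, 5, 6, 7, 8), Array(1, 8))

      // 1) Local context: build it, use it, then stop it so no other
      //    context is running in this JVM.
      val localSc = new SparkContext(
        new SparkConf().setMaster("local[16]").setAppName("Local"))
      println(localSc.makeRDD(arr).count())
      localSc.stop()

      // 2) Cluster context: safe to create now that the local one is stopped.
      val clusterSc = new SparkContext(
        new SparkConf().setMaster(masterURL).setAppName("Cluster"))
      println(clusterSc.makeRDD(arr).map(_.sum).sum)
      clusterSc.stop()
    }

If the two pieces of work really must coexist, running them as two separate driver programs (two JVMs) keeps the one-context-per-JVM rule intact; often the local-only part does not need its own context at all, since plain Scala code on the driver can handle it.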