scala – Submitting a Spark job on a YARN cluster
I have been struggling with the following problem for more than two days now.
I wrote a basic "HelloWorld" script in Scala:

    object Hello extends App {
      println("WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1")
      val data = 1 to 15
      // Print each value, pausing one second between iterations
      for (a <- data) {
        println("Value of a: " + a)
        Thread sleep 1000
      }
    }

I then compiled it with SBT to get a JAR. Next, I transferred everything to the cluster, which runs HDP 2.2.4.2 (the Hortonworks sandbox on a virtual Linux machine).

I am actually able to run the job on the cluster in yarn-client mode with the following command:

    spark-submit --verbose --master yarn-client --class Hello SCALA/hello.jar

However, when I try to submit the same HelloWorld job in yarn-cluster mode with the following command:

    spark-submit --verbose --master yarn-cluster --class Hello SCALA/hello.jar

the job first runs fine (the output is as expected and the exit code is 0), but then it stops with the following:

    15/06/05 15:52:09 INFO Client: Application report for application_1433491352951_0010 (state: FAILED)
    15/06/05 15:52:09 INFO Client:
         client token: N/A
         diagnostics: Application application_1433491352951_0010 failed 2 times due to AM Container for appattempt_1433491352951_0010_000002 exited with exitCode: 0
    For more detailed output,check application tracking page:http://sandbox.hortonworks.com:8088/proxy/application_1433491352951_0010/Then,click on links to logs of each attempt.
    Diagnostics: Failing this attempt. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1433519471297
         final status: FAILED
         tracking URL: http://sandbox.hortonworks.com:8088/cluster/app/application_1433491352951_0010
         user: root
    Error: application failed with exception
    org.apache.spark.SparkException: Application finished with failed status
        at org.apache.spark.deploy.yarn.ClientBase$class.run(ClientBase.scala:522)
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:35)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:139)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I then decided to check the logs with the following command:

    yarn logs -applicationId application_1433491352951_0010

and got:

    15/06/05 15:56:33 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
    15/06/05 15:56:33 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.182.129:8050
    15/06/05 15:56:35 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    15/06/05 15:56:35 INFO compress.CodecPool: Got brand-new decompressor [.deflate]

    Container: container_e08_1433491352951_0010_01_000001 on sandbox.hortonworks.com_45454
    ========================================================================================
    LogType:stderr
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:2050
    Log Contents:
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/28/spark-assembly-1.2.1.2.2.4.2-2-hadoop2.6.0.2.2.4.2-2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/29/hello.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/06/05 15:51:18 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM,HUP,INT]
    15/06/05 15:51:20 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1433491352951_0010_000001
    15/06/05 15:51:21 INFO spark.SecurityManager: Changing view acls to: yarn,root
    15/06/05 15:51:21 INFO spark.SecurityManager: Changing modify acls to: yarn,root
    15/06/05 15:51:21 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn,root); users with modify permissions: Set(yarn,root)
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Waiting for spark context initialization
    15/06/05 15:51:21 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 0
    15/06/05 15:51:31 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
    15/06/05 15:51:36 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,exitCode: 0
    15/06/05 15:51:41 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
    15/06/05 15:51:41 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
    15/06/05 15:51:41 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1433491352951_0010

    LogType:stdout
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:300
    Log Contents:
    WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1
    Value of a: 1
    Value of a: 2
    Value of a: 3
    Value of a: 4
    Value of a: 5
    Value of a: 6
    Value of a: 7
    Value of a: 8
    Value of a: 9
    Value of a: 10
    Value of a: 11
    Value of a: 12
    Value of a: 13
    Value of a: 14
    Value of a: 15

    Container: container_e08_1433491352951_0010_02_000001 on sandbox.hortonworks.com_45454
    ========================================================================================
    LogType:stderr
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:2050
    Log Contents:
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/28/spark-assembly-1.2.1.2.2.4.2-2-hadoop2.6.0.2.2.4.2-2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/root/filecache/29/hello.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/06/05 15:51:45 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM,INT]
    15/06/05 15:51:47 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1433491352951_0010_000002
    15/06/05 15:51:48 INFO spark.SecurityManager: Changing view acls to: yarn,root
    15/06/05 15:51:48 INFO spark.SecurityManager: Changing modify acls to: yarn,root
    15/06/05 15:51:48 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn,root)
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Starting the user JAR in a separate Thread
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Waiting for spark context initialization
    15/06/05 15:51:48 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 0
    15/06/05 15:51:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
    15/06/05 15:52:03 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED,exitCode: 0
    15/06/05 15:52:08 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
    15/06/05 15:52:08 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
    15/06/05 15:52:08 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1433491352951_0010

    LogType:stdout
    Log Upload Time:Fri Jun 05 15:52:10 +0000 2015
    LogLength:300
    Log Contents:
    WELCOME TO A FIRST TEST WITH SCALA COMPILED WITH SBT counting fr. 1:15 with sleep 1
    Value of a: 1
    Value of a: 2
    Value of a: 3
    Value of a: 4
    Value of a: 5
    Value of a: 6
    Value of a: 7
    Value of a: 8
    Value of a: 9
    Value of a: 10
    Value of a: 11
    Value of a: 12
    Value of a: 13
    Value of a: 14
    Value of a: 15

I have spent several hours searching online, and it looks like this may be a problem with how the environment is configured, but I have not found anything that helps :(.
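I would be really grateful if you could help me solve this. Thanks in advance for taking the time to read my post.

Thank you for your reply, Zouzias. I took the HelloWorld project you suggested, recompiled it, and tried again. Now I have run into a different problem. When I submit the job with the following command:

    spark-submit --verbose --master yarn-cluster SCALA/hello.jar

the following message repeats indefinitely:

    15/06/08 16:42:35 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,sleepTime=1000 MILLISECONDS)

I really don't understand this, because it looks as if the server is not responding, even though the program is supposed to be running on the Hadoop cluster from inside the sandbox.

Solution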
In my case I had used:

    val config = new SparkConf()
    config.setMaster("local[*]")

and submitted the job with:

    spark-submit --master yarn-cluster ..

Once I removed config.setMaster from the code, the problem was solved.
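The fix described above can be sketched as follows. This is a minimal sketch, not the answerer's actual code: it assumes the Spark 1.x `SparkConf` and `SparkContext` APIs are on the classpath, and the object name `HelloSpark` and the application name are hypothetical. It leaves out `setMaster` so that the `--master` value given to spark-submit is the one that applies, and it creates a `SparkContext`, which is what the ApplicationMaster log in the question reports waiting for.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; it would be passed to spark-submit via --class HelloSpark.
object HelloSpark {
  def main(args: Array[String]): Unit = {
    // No setMaster call here: the master is taken from `spark-submit --master ...`.
    val conf = new SparkConf().setAppName("HelloSpark")
    val sc = new SparkContext(conf)
    try {
      // A trivial job: distribute the numbers 1 to 15 and count them.
      val n = sc.parallelize(1 to 15).count()
      println("Counted " + n + " values")
    } finally {
      // Stop the context so the application can finish cleanly.
      sc.stop()
    }
  }
}
```

Properties set directly on a `SparkConf` take precedence over flags passed to spark-submit, which is why a hard-coded `local[*]` master wins over `--master yarn-cluster`. The sketch uses an explicit `main` method instead of `extends App`, following the advice in the Spark documentation.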