Running Spark Scala code
Published: 2020-12-16 18:01:19 | Category: Security | Source: compiled from the web
Hi, I'm new to Spark and Scala. I'm running the Scala code below at the spark-shell prompt. The program is fine, and it shows "defined module MLlib", but it doesn't print anything to the screen. What am I doing wrong? Is there another way to run this program in the Spark Scala shell and get the output?
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    object MLlib {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName(s"Book example: Scala")
        val sc = new SparkContext(conf)

        // Load 2 types of emails from text files: spam and ham (non-spam).
        // Each line has text from one email.
        val spam = sc.textFile("/home/training/Spam.txt")
        val ham = sc.textFile("/home/training/Ham.txt")

        // Create a HashingTF instance to map email text to vectors of 100 features.
        val tf = new HashingTF(numFeatures = 100)
        // Each email is split into words, and each word is mapped to one feature.
        val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
        val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

        // Create LabeledPoint datasets for positive (spam) and negative (ham) examples.
        val positiveExamples = spamFeatures.map(features => LabeledPoint(1, features))
        val negativeExamples = hamFeatures.map(features => LabeledPoint(0, features))
        val trainingData = positiveExamples ++ negativeExamples
        trainingData.cache() // Cache data since Logistic Regression is an iterative algorithm.

        // Create a Logistic Regression learner which uses the SGD optimizer.
        val lrLearner = new LogisticRegressionWithSGD()
        // Run the actual learning algorithm on the training data.
        val model = lrLearner.run(trainingData)

        // Test on a positive example (spam) and a negative one (ham).
        // First apply the same HashingTF feature transformation used on the training data.
        val posTestExample = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
        val negTestExample = tf.transform("Hi Dad, I started studying Spark the other ...".split(" "))

        // Now use the learned model to predict spam/ham for new emails.
        println(s"Prediction for positive test example: ${model.predict(posTestExample)}")
        println(s"Prediction for negative test example: ${model.predict(negTestExample)}")

        sc.stop()
      }
    }

Solution
A few things:
You defined the object in the Spark shell, so the main method is not invoked immediately. After defining the object, you have to call it explicitly:

    MLlib.main(Array())

In fact, if you keep working in the shell/REPL, you can do away with the object entirely; you can define the function directly. For example:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.mllib.regression.LabeledPoint

    def MLlib {
      // the rest of your code
    }

However, you should not initialize a SparkContext in the shell. From the documentation:

    In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work.
Therefore, you have to remove that bit from your code, or compile it into a jar and run it with spark-submit.
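Putting those points together, here is a minimal sketch of the shell-friendly version (runMLlib is just an illustrative name; it assumes the file paths from the question and the sc variable that spark-shell creates for you):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.regression.LabeledPoint

// No SparkConf or SparkContext here: spark-shell already provides `sc`.
def runMLlib(): Unit = {
  val spam = sc.textFile("/home/training/Spam.txt")
  val ham = sc.textFile("/home/training/Ham.txt")

  // Same feature extraction as in the question: 100 hashed features per email.
  val tf = new HashingTF(numFeatures = 100)
  val spamFeatures = spam.map(email => tf.transform(email.split(" ")))
  val hamFeatures = ham.map(email => tf.transform(email.split(" ")))

  val trainingData =
    spamFeatures.map(f => LabeledPoint(1, f)) ++ hamFeatures.map(f => LabeledPoint(0, f))
  trainingData.cache()

  val model = new LogisticRegressionWithSGD().run(trainingData)

  val posTest = tf.transform("O M G GET cheap stuff by sending money to ...".split(" "))
  println(s"Prediction for positive test example: ${model.predict(posTest)}")
  // Do not call sc.stop() here; the shell owns the context.
}

// Unlike defining an object, calling the function runs it immediately
// and the println output appears in the shell:
runMLlib()
```

If you later move this back into a standalone application, restore the SparkConf/SparkContext setup and the sc.stop() call, package it as a jar, and launch it with spark-submit.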