scala – How to pre-package external libraries when using Spark on a Mesos cluster
According to the Spark on Mesos docs, one needs to set spark.executor.uri to point to a Spark distribution:

    val conf = new SparkConf()
      .setMaster("mesos://HOST:5050")
      .setAppName("My app")
      .set("spark.executor.uri", "<path to spark-1.4.1.tar.gz uploaded above>")

The docs also note that one can build a custom version of the Spark distribution.

My question now is whether it is possible/desirable to pre-package external libraries, e.g.

- spark-streaming-kafka

which will be used in most of the job jars I will submit via spark-submit, in order to

- reduce the time sbt assembly needs to package the fat jars

If so, how can this be achieved? In general, are there hints on how to speed up fat-jar generation during the job submission process?

Background: I want to run some code generation for Spark jobs, submit them right away, and display the results asynchronously in a browser front end. The front-end part should not be too complicated, but I am wondering how the back-end part can be realized.

Solution
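As an aside, before the Maven route below: if the goal is simply to keep libraries such as spark-streaming-kafka out of the fat jar, spark-submit (since Spark 1.3) can resolve them from Maven coordinates at submit time via --packages. A sketch; the host, main class, and jar name are placeholders:

```shell
# Resolve spark-streaming-kafka from Maven Central at submit time
# instead of baking it into the assembly jar.
# com.example.MyApp and my-app.jar are placeholders.
spark-submit \
  --master mesos://HOST:5050 \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.4.1 \
  --class com.example.MyApp \
  my-app.jar
```

The resolved jars are cached locally and distributed to the executors, so repeated submissions do not re-download them.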
Create a sample Maven project with all your dependencies and then use the maven-shade-plugin. It will create one shaded jar in the target folder.
Here is a sample pom:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>com</groupId>
      <artifactId>test</artifactId>
      <version>0.0.1</version>

      <properties>
        <java.version>1.7</java.version>
        <hadoop.version>2.4.1</hadoop.version>
        <spark.version>1.4.0</spark.version>
        <version.spark-csv_2.10>1.1.0</version.spark-csv_2.10>
        <version.spark-avro_2.10>1.0.0</version.spark-avro_2.10>
      </properties>

      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
              <source>${java.version}</source>
              <target>${java.version}</target>
            </configuration>
          </plugin>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.3</version>
            <executions>
              <execution>
                <phase>package</phase>
                <goals>
                  <goal>shade</goal>
                </goals>
              </execution>
            </executions>
            <configuration>
              <!-- <minimizeJar>true</minimizeJar> -->
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>org/bdbizviz/**</exclude>
                  </excludes>
                </filter>
              </filters>
              <finalName>spark-${project.version}</finalName>
            </configuration>
          </plugin>
        </plugins>
      </build>

      <dependencies>
        <dependency> <!-- Hadoop dependency -->
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
          <exclusions>
            <exclusion>
              <artifactId>servlet-api</artifactId>
              <groupId>javax.servlet</groupId>
            </exclusion>
            <exclusion>
              <artifactId>guava</artifactId>
              <groupId>com.google.guava</groupId>
            </exclusion>
          </exclusions>
        </dependency>
        <dependency>
          <groupId>joda-time</groupId>
          <artifactId>joda-time</artifactId>
          <version>2.4</version>
        </dependency>
        <dependency> <!-- Spark Core -->
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>${spark.version}</version>
        </dependency>
        <dependency> <!-- Spark SQL -->
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.10</artifactId>
          <version>${spark.version}</version>
        </dependency>
        <dependency> <!-- Spark CSV -->
          <groupId>com.databricks</groupId>
          <artifactId>spark-csv_2.10</artifactId>
          <version>${version.spark-csv_2.10}</version>
        </dependency>
        <dependency> <!-- Spark Avro -->
          <groupId>com.databricks</groupId>
          <artifactId>spark-avro_2.10</artifactId>
          <version>${version.spark-avro_2.10}</version>
        </dependency>
        <dependency> <!-- Spark Hive -->
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-hive_2.10</artifactId>
          <version>${spark.version}</version>
        </dependency>
        <dependency> <!-- Spark Hive thriftserver -->
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-hive-thriftserver_2.10</artifactId>
          <version>${spark.version}</version>
        </dependency>
      </dependencies>
    </project>
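Assuming the pom above, the build-and-submit flow might look like the following sketch; the Mesos host and main class are placeholders, and the jar name follows from the <finalName> setting:

```shell
# Build the shaded jar; with <finalName>spark-${project.version}</finalName>
# it ends up at target/spark-0.0.1.jar
mvn package

# Submit the shaded jar to the Mesos master
# (HOST and com.example.MyApp are placeholders)
spark-submit \
  --master mesos://HOST:5050 \
  --class com.example.MyApp \
  target/spark-0.0.1.jar
```

Because the shaded jar already bundles the shared libraries, the per-job jars built on top of it stay small, which addresses the fat-jar build-time concern from the question.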