
Scala – Spark Streaming with Kafka – SparkSession API

I would appreciate help with running a Spark Streaming program using Spark 2.0.2.

It fails at runtime with the error "java.lang.ClassNotFoundException: Failed to find data source: kafka". The modified POM file is shown below.

The SparkSession is created successfully, but the error is thrown when load() is called on the Kafka source.

Creating the Spark session:

val spark = SparkSession
            .builder()
            .master(master)
            .appName("Apache Log Analyzer Streaming from Kafka")
            .config("hive.metastore.warehouse.dir", hiveWarehouse)
            .config("fs.defaultFS", hdfs_FS)
            .enableHiveSupport()
            .getOrCreate()

Creating the Kafka stream:

val logLinesDStream = spark
      .readStream
      .format("kafka")
      // note: 2181 is ZooKeeper's default port; Kafka brokers usually listen on 9092
      .option("kafka.bootstrap.servers", "localhost:2181")
      .option("subscribe", topics)
      .load()

Error message:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark-packages.org

In pom.xml:

<scala.version>2.10.4</scala.version>
<scala.compat.version>2.10</scala.compat.version>
<spark.version>2.0.2</spark.version>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>

    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.5</version>
    </dependency>
</dependencies>

Solution

I ran into the same problem. I upgraded my Spark version from 2.0.0 to 2.2.0 and added the spark-sql-kafka dependency, and it worked perfectly for me. The "kafka" data source used by readStream.format("kafka") is implemented in the spark-sql-kafka-0-10 artifact; spark-streaming-kafka-0-10 only provides the older DStream API, which is why the source could not be found. (Alternatively, the same package can be supplied at launch time via spark-submit's --packages option.) Note that the artifacts below are built for Scala 2.11, so scala.version and scala.compat.version must be updated to match. The dependencies are listed below, followed by a minimal read sketch.

<spark.version>2.2.0</spark.version>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>test</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.10.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.0</version>
</dependency>
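
With spark-sql-kafka-0-10 on the classpath, the original readStream call resolves. Below is a minimal end-to-end sketch of reading from Kafka with Structured Streaming and echoing messages to the console; the broker address localhost:9092 and the topic name "logs" are assumptions for illustration, not values from the question.

import org.apache.spark.sql.SparkSession

object KafkaReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("Kafka Structured Streaming sketch")
      .getOrCreate()

    import spark.implicits._

    // The "kafka" source below is provided by spark-sql-kafka-0-10.
    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address (not ZooKeeper's 2181)
      .option("subscribe", "logs")                         // assumed topic name
      .load()

    // Kafka records arrive as binary key/value columns; cast the value to text.
    val lines = df.selectExpr("CAST(value AS STRING)").as[String]

    // Print each micro-batch to the console for inspection.
    val query = lines.writeStream
      .outputMode("append")
      .format("console")
      .start()

    query.awaitTermination()
  }
}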
