scala – How to add a line number to each line?

Published: 2020-12-16 09:07:55 | Category: Security | Source: compiled from the web
Suppose this is my data:

‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
‘Map’ is responsible to read data from input location.
it will generate a key value pair.
that is,an intermediate output in local machine.
’Reducer’ is responsible to process the intermediate.
output received from the mapper and generate the final output.

I want to prefix each line with its number, as in the output below:

1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
2,‘Map’ is responsible to read data from input location.
3,it will generate a key value pair.
4,that is,an intermediate output in local machine.
5,’Reducer’ is responsible to process the intermediate.
6,output received from the mapper and generate the final output.

and then save the numbered lines to a file.

I tried:

import org.apache.spark.{SparkConf, SparkContext}

object DS_E5 {
  def main(args: Array[String]): Unit = {

    var i=0
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)
    val sample1 = sc.textFile("data.txt")
    for(sample<-sample1){
      i=i+1
      val ss=sample.map(l=>(i,sample))
      println(ss)
    }
  }
}

But its output looks like this:

Vector((1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.))
...

How can I change my code so that it produces the output I want?

Solution

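zipWithIndex is what you need. It maps an RDD[T] to an RDD[(T, Long)] by adding the index as the second element of each pair. The indices start at 0, so add 1 to get the 1-based numbering shown in the question.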
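
To see the shape of what zipWithIndex returns without starting Spark, here is a minimal sketch using the same-named method on ordinary Scala collections, where the index is an Int rather than a Long. The object name NumberLinesLocally and the helper numberLines are made up for illustration; the two sample lines are taken from the question's data.

```scala
object NumberLinesLocally {
  // Pair each line with its 0-based index, then prefix the line
  // with the 1-based number and a comma.
  def numberLines(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.map { case (line, i) => s"${i + 1},$line" }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "it will generate a key value pair.",
      "that is,an intermediate output in local machine."
    )
    numberLines(lines).foreach(println)
  }
}
```

Applied to the RDD from the question, the same idea reads: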

sample1
    .zipWithIndex()
    // indices start at 0; add 1 for 1-based line numbers
    .map { case (line,i) => (i + 1).toString + "," + line }

Or use string interpolation (see the comment by @DanielC.Sobral):

sample1
    .zipWithIndex()
    .map { case (line,i) => s"${i + 1},$line" }

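The question also asks for the numbered lines to be saved to a file, which the snippets above do not cover. One way to put the pieces together is sketched below, assuming the same local setup as in the question. The object name NumberLines, the helper format, and the output path "numbered-output" are hypothetical choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NumberLines {
  // zipWithIndex is 0-based, so add 1 to get 1-based line numbers.
  def format(line: String, index: Long): String = s"${index + 1},$line"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      val numbered = sc.textFile("data.txt")
        .zipWithIndex()                          // RDD[(String, Long)]
        .map { case (line, i) => format(line, i) }

      // "numbered-output" is a hypothetical path. Spark writes a
      // directory of part files there, not a single file.
      numbered.saveAsTextFile("numbered-output")
    } finally {
      sc.stop()
    }
  }
}
```

Note that saveAsTextFile produces a directory containing one part file per partition and fails if the target path already exists. If a single output file is needed, calling coalesce(1) before saving is a common option for small data sets.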
(Editor: 李大同)
