scala – How do I add a line number to each line?

Published: 2020-12-16 09:07:55 | Category: Security | Source: compiled from the web

Suppose this is my data:

‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
‘Map’ is responsible to read data from input location.
it will generate a key value pair.
that is,an intermediate output in local machine.
’Reducer’ is responsible to process the intermediate.
output received from the mapper and generate the final output.

I want to prefix each line with a number, as in the following output:

1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.
2,‘Map’ is responsible to read data from input location.
3,it will generate a key value pair.
4,that is,an intermediate output in local machine.
5,’Reducer’ is responsible to process the intermediate.
6,output received from the mapper and generate the final output.

and then save the numbered lines to a file.

I tried:

object DS_E5 {
  def main(args: Array[String]): Unit = {

    var i=0
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)
    val sample1 = sc.textFile("data.txt")
    for(sample<-sample1){
      i=i+1
      val ss=sample.map(l=>(i,sample))
      println(ss)
    }
  }
}

But its output looks like this:

Vector((1,‘Maps‘ and ‘Reduces‘ are two phases of solving a query in HDFS.))
...

How can I change my code so that it produces the output I want?

Solution

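zipWithIndex is what you need. It maps an RDD[T] to an RDD[(T, Long)] by putting each element's index in the second position of the pair. The index starts at 0, so add 1 if the numbering should start at 1.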
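To make the shape of the pairs concrete, here is a minimal sketch that uses zipWithIndex on a plain Scala Seq rather than an RDD. This is a stand-in for illustration only: on ordinary collections the index is an Int instead of a Long, and the names ZipWithIndexDemo and numberLines are hypothetical, not part of the original answer.

```scala
object ZipWithIndexDemo {
  // Prefix each line with a 1-based number.
  // zipWithIndex yields (element, index) pairs and counts from 0, hence the + 1.
  def numberLines(lines: Seq[String]): Seq[String] =
    lines.zipWithIndex.map { case (line, i) => s"${i + 1},$line" }

  def main(args: Array[String]): Unit = {
    val lines = Seq("first line", "second line", "third line")

    // Show the raw (element, index) pairs before formatting.
    println(lines.zipWithIndex)

    // Print one numbered line per row.
    numberLines(lines).foreach(println)
  }
}
```

The element comes first and the index second, which is why the pattern match is written as case (line, i). With the RDD from the question, the call reads: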

// zipWithIndex is zero-based; add 1 so the numbering starts at 1
sample1
  .zipWithIndex()
  .map { case (line, i) => (i + 1).toString + "," + line }

Or use string interpolation (see the comment by @DanielC.Sobral):

// zipWithIndex is zero-based; add 1 so the numbering starts at 1
sample1
  .zipWithIndex()
  .map { case (line, i) => s"${i + 1},$line" }
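The snippets above only build the numbered lines, while the question also asks to save them. There are two likely reasons the original attempt went wrong. First, sample is a String, so sample.map(...) iterates over its characters and builds a collection of tuples, which is where the Vector(...) output comes from. Second, incrementing a var from inside a loop over an RDD is not reliable outside local mode, because each executor works on its own copy of the closure. A possible end-to-end version is sketched below. It assumes Spark is on the classpath; the object name NumberLines, the helper numberLine and the output path numbered_output are hypothetical choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NumberLines {
  // Format one line with a 1-based number (the index from zipWithIndex is 0-based).
  def numberLine(line: String, index: Long): String = s"${index + 1},$line"

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("prep").setMaster("local")
    val sc = new SparkContext(conf)

    // Pair every line with its position in the file, then format it.
    val numbered = sc.textFile("data.txt")
      .zipWithIndex()
      .map { case (line, i) => numberLine(line, i) }

    // "numbered_output" is a hypothetical path. Spark creates a directory of
    // part files there and fails if the directory already exists.
    numbered.saveAsTextFile("numbered_output")

    sc.stop()
  }
}
```

Because saveAsTextFile writes one part file per partition, the result is a directory and not a single file. For a small input such as this one, calling coalesce(1) before saving is one way to end up with a single part file.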

