zip()
zip(...) is a simple and interesting operation which can be considered as general purpose operation. Basically it returns key-value pairs (ZippedPartitionsRDD2) with the first element in each RDD, second element in each RDD, etc. It is an element-wise , narrow transformation operation. Following example demonstrates the above definitions. val rdd1 = spark . sparkContext . parallelize ( Seq ( 1 , 2 , 3 , 4 , 5 ) ) val rdd2 = spark . sparkContext . parallelize ( Seq ( 6 , 7 , 8 , 9 , 10 ) ) rdd1 . zip ( rdd2) . foreach ( println ( _ ) ) Result (1,6) (2,7) (4,9) (5,10) (3,8) DAG The above result shows that, first element of rdd1 is zipped with first element of rdd2 and so on.Also the DAG confirms that zip(...) is a narrow operation. If you look at the code, you could find both rdd1 and rdd2 has same number of record counts but interesting point here is, following error will be thrown if your rdds has different record counts. Caused by: org.apache.sp