In short, there is a very basic concept in RDD called RDD operation: Transformation v.s. Action. See more details here: https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/
The part `.toRdd.mapPartitions` in the code below
lazy val rdd: RDD[T] = {
val objectType = exprEnc.deserializer.dataType
rddQueryExecution.toRdd.mapPartitions { rows =>
rows.map(_.get(0, objectType).asInstanceOf[T])
}
}
is just adding a transformation from DataFrame's Tungsten storage format of RDD[InternalRow] into RDD[JavaObject], which will not be executed if you just want to do a `getNumPartitions` operation.
This added transformation will only be executed when you have follow-up actions on the converted RDD.
There is a very basic concept in RDD called RDD operation: Transformation v.s. Action. See more details here: https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/
The part `.toRdd.mapPartitions` in the code below
lazy val rdd: RDD[T] = {
val objectType = exprEnc.deserializer.dataType
rddQueryExecution.toRdd.mapPartitions { rows =>
rows.map(_.get(0, objectType).asInstanceOf[T])
}
}
is just adding a transformation from DataFrame's Tungsten storage format of RDD[InternalRow] into RDD[JavaObject], which will not be executed if you just want to do a `getNumPartitions` operation.
This added transformation will only be executed when you have follow-up actions on the converted RDD.