WholeStageCodegenExec
It constructs the RDD in doExecute, which instantiates a BufferedRowIterator from the source generated by doCodeGen and initializes it with the input iterator.
Note that the constructed instance is a subclass of BufferedRowIterator.
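The wiring described above can be sketched without Spark. This is a minimal toy model, not Spark's code: ToyBufferedRowIterator, GeneratedIterator and ToyWholeStage are hypothetical names, rows are plain Strings, and a hand-written subclass stands in for the class that would be compiled from the doCodeGen source.

```scala
import scala.collection.mutable

// Simplified stand-in for BufferedRowIterator (rows are Strings here).
abstract class ToyBufferedRowIterator {
  protected val currentRows = mutable.Queue.empty[String]
  protected var input: Iterator[String] = Iterator.empty

  // Called once per partition with that partition's input iterator.
  def init(index: Int, inputs: Array[Iterator[String]]): Unit = {
    input = inputs(0)
  }

  // Supplied by the "generated" subclass; fills currentRows via append.
  protected def processNext(): Unit

  protected def append(row: String): Unit = currentRows.enqueue(row)

  // hasNext drives the pipeline: it calls processNext when the buffer is empty.
  def hasNext: Boolean = {
    if (currentRows.isEmpty) processNext()
    currentRows.nonEmpty
  }

  def next(): String = currentRows.dequeue()
}

// Hand-written stand-in for the generated class (assumed fused filter + projection).
class GeneratedIterator extends ToyBufferedRowIterator {
  override protected def processNext(): Unit = {
    // Stop as soon as one row has been buffered, or the input is exhausted.
    while (currentRows.isEmpty && input.hasNext) {
      val row = input.next()
      if (row.nonEmpty) append(row.toUpperCase)
    }
  }
}

object ToyWholeStage {
  // Mirrors the shape of doExecute: one iterator instance per partition.
  def execute(partitions: Seq[Seq[String]]): Seq[List[String]] =
    partitions.zipWithIndex.map { case (part, index) =>
      val buffer = new GeneratedIterator
      buffer.init(index, Array(part.iterator))
      val rows = new Iterator[String] {
        def hasNext: Boolean = buffer.hasNext
        def next(): String = buffer.next()
      }
      rows.toList
    }
}
```

Calling ToyWholeStage.execute(Seq(Seq("a", "", "b"), Seq("c"))) walks each partition through its own iterator instance, dropping the empty row and upper-casing the rest.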
inputRDDs: it is used to retrieve the RDDs at the start of the WholeStageCodegen stage.
If there are multiple input RDDs, e.g., in SortMergeJoinExec, its children are replaced with InputAdapter, but the iterators are retrieved from the children directly and SortMergeJoinExec processes each row by calling next, instead of using doProduce/doConsume.
For example, in InputAdapter, which is only used when there is one input RDD:
override def inputRDDs(): Seq[RDD[InternalRow]] = {
  child.execute() :: Nil
}
But in Project, it delegates to the child's inputRDDs:
override def inputRDDs(): Seq[RDD[InternalRow]] = {
  child.asInstanceOf[CodegenSupport].inputRDDs()
}
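The two inputRDDs implementations can be put side by side in a small sketch to show where the delegation stops. Everything here is hypothetical and only mimics the shape of the Spark classes: ToyRDD (a Vector[Int]) stands in for RDD[InternalRow], and ToyScan, ToyInputAdapter and ToyProject are illustrative names.

```scala
object InputRddsSketch {
  // Stand-in for RDD[InternalRow].
  type ToyRDD = Vector[Int]

  trait ToyPlan { def execute(): ToyRDD }

  trait ToyCodegen extends ToyPlan { def inputRDDs(): Seq[ToyRDD] }

  // A source outside the codegen stage, e.g. a scan.
  case class ToyScan(data: ToyRDD) extends ToyPlan {
    def execute(): ToyRDD = data
  }

  // Stage boundary: executes the child and exposes its output as the single input.
  case class ToyInputAdapter(child: ToyPlan) extends ToyCodegen {
    def execute(): ToyRDD = child.execute()
    def inputRDDs(): Seq[ToyRDD] = child.execute() :: Nil
  }

  // Inside the stage: asks its codegen child where the input comes from.
  case class ToyProject(child: ToyPlan) extends ToyCodegen {
    def execute(): ToyRDD = child.execute() // simplified: no projection applied
    def inputRDDs(): Seq[ToyRDD] = child.asInstanceOf[ToyCodegen].inputRDDs()
  }
}
```

With ToyProject(ToyProject(ToyInputAdapter(ToyScan(Vector(1, 2, 3))))), calling inputRDDs() on the outermost node recurses down to the adapter and returns that one input.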
doConsume: it appends the row to currentRows and is invoked by the upstream (child) operator. Note the difference between this one and other operators: most other operators pass the row on to their parent, whereas this one actually returns the UnsafeRow by buffering it.
processNext: its body is the code generated by child.asInstanceOf[CodegenSupport].produce(ctx, this), which starts iterating over the input iterator. Note that it is invoked by BufferedRowIterator.hasNext.
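How produce and consume assemble the body of processNext can be illustrated with a toy string-based generator. This is a sketch under assumptions, not Spark's implementation: the ToyCodegen trait drops the CodegenContext and the column variables, and ToyInput, ToyFilter and ToyWholeStage are hypothetical operators. The emitted text is Java-like because Spark generates Java source.

```scala
object ProduceConsumeSketch {
  trait ToyCodegen {
    var parent: ToyCodegen = null
    // produce: remember the parent, then generate the code that yields rows for it.
    final def produce(parent: ToyCodegen): String = {
      this.parent = parent
      doProduce()
    }
    // consume: hand the current row variable to the parent's doConsume.
    final def consume(row: String): String = parent.doConsume(row)
    def doProduce(): String
    def doConsume(row: String): String
  }

  // Leaf of the stage: emits the loop over the input iterator.
  class ToyInput extends ToyCodegen {
    def doProduce(): String = Seq(
      "while (input.hasNext()) {",
      "  InternalRow row = (InternalRow) input.next();",
      "  " + consume("row"),
      "  if (shouldStop()) return;",
      "}"
    ).mkString("\n")
    def doConsume(row: String): String = throw new UnsupportedOperationException
  }

  // Middle operator: wraps whatever its parent emits in a condition.
  class ToyFilter(child: ToyCodegen) extends ToyCodegen {
    def doProduce(): String = child.produce(this)
    def doConsume(row: String): String = s"if (!$row.isNullAt(0)) { ${consume(row)} }"
  }

  // Root: its doConsume ends the chain by buffering the row instead of calling consume.
  class ToyWholeStage(child: ToyCodegen) extends ToyCodegen {
    def doProduce(): String = throw new UnsupportedOperationException
    def doConsume(row: String): String = s"append($row);"
    def processNextBody(): String = child.produce(this)
  }
}
```

Building new ToyWholeStage(new ToyFilter(new ToyInput)) and calling processNextBody() yields a single loop in which the filter condition and the append call are fused, which is the effect the produce/consume protocol is after.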