External Data Sources
The external data sources framework brings external data into Spark.
Planning uses two strategies: FileSourceStrategy and DataSourceStrategy.
DataSourceScanExec is the leaf SparkPlan that scans the external system. It comes in two modes: batched (BatchedDataSourceScanExec) and row-based (RowDataSourceScanExec). The batched mode performs vectorized scans, which Parquet supports (ORC support is yet to be implemented).
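To make the difference between the two modes concrete, here is a minimal, dependency-free Scala sketch. It does not use Spark's classes: ColumnBatch, ScanModes and both sum functions are hypothetical names invented for illustration. It only assumes that a row scan hands the consumer one row at a time, while a batched scan hands it whole column arrays.

```scala
// Toy model only: ColumnBatch and ScanModes are hypothetical, not Spark classes.
// A batch stores each column contiguously instead of storing row objects.
final case class ColumnBatch(ids: Array[Int], scores: Array[Double]) {
  def numRows: Int = ids.length
}

object ScanModes {
  // Row mode: the consumer pulls one (id, score) row at a time.
  def sumScoresByRow(rows: Iterator[(Int, Double)]): Double =
    rows.map(_._2).sum

  // Batched mode: the consumer pulls a batch and works on a whole column array.
  def sumScoresByBatch(batches: Iterator[ColumnBatch]): Double =
    batches.map(_.scores.sum).sum
}
```

Both functions compute the same result. The batched form touches one tight array per column, which is the access pattern a vectorized reader is built around.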
For DataSourceStrategy, the RDD is passed to the scan node at construction time and is built by relation.buildScan.
For FileSourceStrategy, the RDD is constructed as a FileScanRDD.
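The split between the two strategies can be sketched with a small, dependency-free Scala model. This is not Spark's planner code: Relation, ScanRelation, FileRelation, ScanExec and PlannerSketch are hypothetical stand-ins, and a plain Seq[Int] stands in for an RDD. The sketch assumes only what is stated above: in one path the relation builds the data itself, and in the other the planner builds the scan from the files.

```scala
// Toy stand-ins only; none of these are Spark's real classes.
sealed trait Relation

// A relation that knows how to produce its own data, like one offering buildScan.
final case class ScanRelation(buildScan: () => Seq[Int]) extends Relation

// A file-backed relation: it lists files, and the planner assembles the scan.
final case class FileRelation(files: Seq[String], readFile: String => Seq[Int])
  extends Relation

// Stand-in for a scan node that wraps an already constructed "RDD".
final case class ScanExec(rdd: Seq[Int], plannedBy: String)

object PlannerSketch {
  def plan(relation: Relation): ScanExec = relation match {
    // DataSourceStrategy-like path: the relation builds the data and the
    // result is embedded in the scan node when it is constructed.
    case ScanRelation(buildScan) =>
      ScanExec(buildScan(), "DataSourceStrategy")
    // FileSourceStrategy-like path: the planner reads the files itself,
    // which is the role FileScanRDD plays in Spark.
    case FileRelation(files, readFile) =>
      ScanExec(files.flatMap(readFile), "FileSourceStrategy")
  }
}
```

In this model, PlannerSketch.plan(ScanRelation(() => Seq(1, 2, 3))) yields a scan node tagged "DataSourceStrategy", while a FileRelation yields one tagged "FileSourceStrategy" whose data is the concatenation of the per-file reads.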