UnsafeShuffleWriter

It uses ShuffleExternalSorter to sort records by partition and write them out.

Requirements: see SortShuffleManager.canUseSerializedShuffle for the check that decides whether this writer can be used.

  1. No aggregator is defined, because it is hard to aggregate serialized data.
  2. The serializer supports relocation of serialized objects, because sorting may move them.
  3. Size limit: at most 16,777,216 (2^24) output partitions.
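
The three requirements above can be read as a single boolean check. The following is a minimal sketch, not Spark's actual code: the class name, method signature, and parameter names are hypothetical, and the partition limit is assumed to be 2^24 (16,777,216).

```java
// Hypothetical sketch of the eligibility check; not Spark's implementation.
final class SerializedShuffleCheckSketch {
    // Assumption: the partition limit is 2^24 = 16,777,216.
    static final int MAX_PARTITIONS = 1 << 24;

    // Returns true only when all three requirements hold.
    static boolean canUseSerializedShuffle(
            boolean hasAggregator,
            boolean serializerSupportsRelocation,
            int numPartitions) {
        if (hasAggregator) {
            return false; // requirement 1: no aggregation on serialized data
        }
        if (!serializerSupportsRelocation) {
            return false; // requirement 2: sorting may move serialized objects
        }
        return numPartitions <= MAX_PARTITIONS; // requirement 3: size limit
    }

    public static void main(String[] args) {
        System.out.println(canUseSerializedShuffle(false, true, 200));
        System.out.println(canUseSerializedShuffle(true, true, 200));
    }
}
```

If any one of the conditions fails, a different shuffle writer has to be chosen.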

Note that ShuffleExternalSorter closes each spill file once it has been written; all the spills are merged together later.

During the merge, all spill files are open at the same time. The merge writes the final file partition by partition: for each partition, it copies from every spill the number of bytes recorded for that partition in the spill info. The number of concurrently open files is therefore the number of spills.
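
The merge loop can be sketched as follows. This is a simplified illustration that uses in-memory byte arrays in place of spill files; the `Spill` class and the `merge` method are hypothetical names, not Spark's API. Each spill is assumed to store its partitions back to back, with the per-partition byte lengths recorded alongside.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of a partition-by-partition spill merge.
final class SpillMergeSketch {
    // Stand-in for a spill file plus its spill info (bytes per partition).
    static final class Spill {
        final byte[] data;
        final long[] partitionLengths;

        Spill(byte[] data, long[] partitionLengths) {
            this.data = data;
            this.partitionLengths = partitionLengths;
        }
    }

    // Copies partition 0 from every spill, then partition 1, and so on.
    // Returns the total length of each partition in the merged output.
    static long[] merge(Spill[] spills, int numPartitions, OutputStream out)
            throws IOException {
        // All spills are opened up front, so the number of open inputs
        // equals the number of spills.
        InputStream[] inputs = new InputStream[spills.length];
        for (int i = 0; i < spills.length; i++) {
            inputs[i] = new ByteArrayInputStream(spills[i].data);
        }
        long[] mergedLengths = new long[numPartitions];
        byte[] buffer = new byte[8192];
        try {
            for (int p = 0; p < numPartitions; p++) {
                for (int i = 0; i < spills.length; i++) {
                    long remaining = spills[i].partitionLengths[p];
                    while (remaining > 0) {
                        int want = (int) Math.min(buffer.length, remaining);
                        int n = inputs[i].read(buffer, 0, want);
                        if (n < 0) {
                            throw new IOException("spill " + i + " ended early");
                        }
                        out.write(buffer, 0, n);
                        remaining -= n;
                    }
                    mergedLengths[p] += spills[i].partitionLengths[p];
                }
            }
        } finally {
            for (InputStream in : inputs) {
                in.close();
            }
        }
        return mergedLengths;
    }

    public static void main(String[] args) throws IOException {
        Spill a = new Spill("aaBBB".getBytes(StandardCharsets.US_ASCII), new long[] {2, 3});
        Spill b = new Spill("aBB".getBytes(StandardCharsets.US_ASCII), new long[] {1, 2});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] lengths = merge(new Spill[] {a, b}, 2, out);
        // prints: aaaBBBBB [3, 5]
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII)
                + " " + Arrays.toString(lengths));
    }
}
```

Because every spill is already ordered by partition, each input is read strictly sequentially and no comparison between records is needed.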

mergeSpillsWithFileStream and mergeSpillsWithTransferTo each perform this merge themselves and write directly to the output file.
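
For the transferTo flavor, the per-partition copy can be expressed with Java NIO's FileChannel.transferTo, which moves bytes between channels without going through a user-space buffer. The sketch below shows only that copy step; the class and the `copyRegion` helper are hypothetical and not taken from Spark.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;

// Hypothetical sketch of copying one partition's byte range with transferTo.
final class TransferToSketch {
    // Copies `count` bytes starting at `position` in `in` to the current
    // position of `out`. transferTo may move fewer bytes than requested,
    // so the call is repeated until the whole range has been copied.
    static void copyRegion(FileChannel in, long position, long count, FileChannel out)
            throws IOException {
        long done = 0;
        while (done < count) {
            long n = in.transferTo(position + done, count - done, out);
            if (n <= 0) {
                // Assumption for this sketch: no progress means the input
                // is shorter than the recorded partition length.
                throw new IOException("input ended before the range was copied");
            }
            done += n;
        }
    }
}
```

A merge built on this helper would call it once per (partition, spill) pair, advancing each spill's position by the partition length recorded in its spill info.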
