Data Locality

Spark achieves data locality through RDD.preferredLocations. When submitting missing tasks, the scheduler retrieves the location of each partition of a given RDD from either getCacheLocs or rdd.preferredLocations.
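The lookup described above can be sketched in plain Python. This is only an illustration, not Spark's code: `task_locations`, `cache_locs`, and `preferred_locs` are hypothetical names, and the sketch assumes that cached locations are consulted first and that the RDD's own preferences are the fallback.

```python
from typing import Dict, List, Tuple

# Hypothetical key for one partition: (rdd_id, partition_index).
PartitionKey = Tuple[int, int]


def task_locations(
    key: PartitionKey,
    cache_locs: Dict[PartitionKey, List[str]],
    preferred_locs: Dict[PartitionKey, List[str]],
) -> List[str]:
    """Return candidate hosts for the task that computes one partition.

    Assumption: hosts that already cache the partition win; otherwise
    the RDD's preferred locations are used. An empty list means the
    task has no locality preference.
    """
    cached = cache_locs.get(key, [])
    if cached:
        return cached
    return preferred_locs.get(key, [])
```

For example, with `cache_locs = {(1, 0): ["host-a"]}` and `preferred_locs = {(1, 0): ["host-b"], (1, 1): ["host-c"]}`, partition `(1, 0)` resolves to `host-a` because it is cached there, while partition `(1, 1)` falls back to `host-c`.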

Take ShuffledRDD as an example: its getPreferredLocations is implemented by MapOutputTrackerMaster.getPreferredLocationsForShuffle, which returns the hosts that hold the most map outputs for that partition.
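The idea of preferring the hosts that hold most of a partition's map output can be sketched as follows. This is a simplified stand-in, not the MapOutputTrackerMaster implementation: the function name, the `(host, size)` input shape, and the `fraction_threshold` parameter with its default of 0.2 are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hosts_with_largest_outputs(
    map_outputs: List[Tuple[str, int]],
    fraction_threshold: float = 0.2,  # illustrative cutoff, an assumption
) -> List[str]:
    """Pick hosts holding a large share of one reduce partition's input.

    map_outputs has one (host, bytes_for_this_partition) pair per map task.
    A host qualifies when its share of the total bytes reaches the threshold.
    """
    bytes_per_host: Dict[str, int] = defaultdict(int)
    total = 0
    for host, size in map_outputs:
        bytes_per_host[host] += size
        total += size
    if total == 0:
        return []  # nothing to fetch, so no preference
    return sorted(
        host
        for host, size in bytes_per_host.items()
        if size / total >= fraction_threshold
    )
```

Given outputs of 70, 25, and 5 bytes on hosts `a`, `b`, and `c`, the default threshold selects `a` and `b`; raising the threshold to 0.5 leaves only `a`. Returning several hosts rather than a single maximum gives the scheduler more than one place to run the reduce task with good locality.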
