Data Locality
The computation achieves data locality through RDD.preferredLocations. When DAGScheduler submits a stage's missing tasks, the location of each partition of a specific RDD is retrieved either by getCacheLocs (if the partition is cached) or by rdd.preferredLocations.
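The lookup order can be modeled as below. This is a simplified sketch, not Spark code: `preferred_locs`, the `cache_locs` dictionary, and the `rdd_preferred` callback are hypothetical stand-ins for getCacheLocs and rdd.preferredLocations. It only shows that a cached partition's locations take priority over the RDD's own placement hint.

```python
from typing import Callable, Dict, List


def preferred_locs(
    partition: int,
    cache_locs: Dict[int, List[str]],
    rdd_preferred: Callable[[int], List[str]],
) -> List[str]:
    """Return candidate hosts for one partition (hypothetical, simplified model)."""
    # 1. If the partition is cached, the hosts holding its blocks win.
    cached = cache_locs.get(partition, [])
    if cached:
        return cached
    # 2. Otherwise fall back to the RDD's own placement hint
    #    (stand-in for rdd.preferredLocations).
    return rdd_preferred(partition)


cache = {0: ["host-a"]}
hint = lambda p: [f"hdfs-host-{p}"]
print(preferred_locs(0, cache, hint))  # ['host-a']
print(preferred_locs(1, cache, hint))  # ['hdfs-host-1']
```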
Take a ShuffledRDD as an example: its getPreferredLocations is implemented by MapOutputTrackerMaster.getPreferredLocationsForShuffle, which returns the hosts that hold the largest share of the map output for that partition.
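The idea of "hosts holding the most output for a reduce partition" can be sketched as follows. This is a hypothetical model, not MapOutputTrackerMaster's code: map outputs are given as `(host, {reduce_id: bytes})` pairs, and the `fraction_threshold` of 0.2 is an assumption meant to resemble the cutoff Spark applies, so a host qualifies only if it holds at least that fraction of the partition's total bytes. Results are sorted only to make the output deterministic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def preferred_locations_for_shuffle(
    map_outputs: List[Tuple[str, Dict[int, int]]],
    reduce_id: int,
    fraction_threshold: float = 0.2,  # assumed cutoff, not read from Spark
) -> List[str]:
    """Hosts holding at least fraction_threshold of reduce_id's map output."""
    bytes_per_host: Dict[str, int] = defaultdict(int)
    total = 0
    # Sum, per host, the bytes every map task wrote for this reduce partition.
    for host, sizes in map_outputs:
        size = sizes.get(reduce_id, 0)
        bytes_per_host[host] += size
        total += size
    if total == 0:
        return []  # no output for this partition, so no preference
    return sorted(
        host for host, b in bytes_per_host.items()
        if b / total >= fraction_threshold
    )


outputs = [
    ("host-a", {0: 700, 1: 10}),
    ("host-b", {0: 200, 1: 500}),
    ("host-c", {0: 100, 1: 490}),
]
print(preferred_locations_for_shuffle(outputs, 0))  # ['host-a', 'host-b']
print(preferred_locations_for_shuffle(outputs, 1))  # ['host-b', 'host-c']
```

Using a fractional cutoff rather than returning a single "best" host lets the scheduler consider several hosts when the output for a partition is spread fairly evenly.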