c387e40fb1
This adds tracking to determine the "origin" of an RDD. Origin is defined by the boundary between the user's code and the spark code, during an RDD's instantiation. It is meant to help users understand where a Spark RDD is coming from in their code. This patch also logs origin data when stages are submitted to the scheduler. Finally, it adds a new log message to fix an inconsitency in the way that dependent stages (those missing parents) and independent stages (those without) are logged during submission. |
||
---|---|---|
.. | ||
lib | ||
src |