spark-instrumented-optimizer/python/pyspark
Matei Zaharia ca67909cd4 Merge pull request #311 from tmyklebu/master
SPARK-991: Report information gleaned from a Python stacktrace in the UI

Scala:

- Added setCallSite/clearCallSite to SparkContext and JavaSparkContext.
  These functions mutate a LocalProperty called "externalCallSite."
- Add a wrapper, getCallSite, that checks for an externalCallSite and, if
  none is found, calls the usual Utils.formatSparkCallSite.
- Change everything that calls Utils.formatSparkCallSite (except
  getCallSite itself) to call getCallSite instead.
- Add setCallSite/clearCallSite wrappers to JavaSparkContext; a
  Python-side sketch of how these get driven through the gateway
  follows this list.
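
A minimal, illustrative sketch of calling the new JavaSparkContext
wrappers from Python over the Py4J gateway (assuming PySpark's
SparkContext exposes its JavaSparkContext handle as sc._jsc, as
context.py names it):

    from pyspark import SparkContext

    sc = SparkContext("local", "callsite-demo")

    # Publish an explicit call site; on the Scala side this sets the
    # "externalCallSite" local property, which getCallSite prefers over
    # Utils.formatSparkCallSite.
    sc._jsc.setCallSite("count at myscript.py:7")
    try:
        sc.parallelize(range(100)).count()  # runs a job while the property is set
    finally:
        # Clear it so later jobs fall back to the usual Scala call site.
        sc._jsc.clearCallSite()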

Python:

- Add a gruesome hack to rdd.py that inspects the traceback and guesses
  what you want to see in the UI.
- Add a RAII wrapper around said gruesome hack that calls
  setCallSite/clearCallSite as appropriate (a sketch of the idea
  follows this list).
- Wire said RAII wrapper up around three calls into the Scala code.
  I'm not sure that I hit all the spots with the RAII wrapper. I'm also
  not sure that my gruesome hack does exactly what we want.
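
Roughly, the hack plus the RAII wrapper amount to the following sketch
(the names here are illustrative, not necessarily the ones used in
rdd.py):

    import traceback
    from contextlib import contextmanager

    def _guess_call_site():
        """Best-effort guess at the user code that entered PySpark.

        Walks the current stack from the innermost frame outwards and
        formats the first frame not inside pyspark itself as
        "<function> at <file>:<line>".
        """
        for frame in reversed(traceback.extract_stack()):
            filename, lineno, func = frame[0], frame[1], frame[2]
            if "pyspark" not in filename:
                return "%s at %s:%d" % (func, filename, lineno)
        return "<unknown>"

    @contextmanager
    def _java_call_site(ctx):
        """RAII-style guard: publish the guessed call site to the JVM
        for the duration of a call into Scala, then clear it again."""
        ctx._jsc.setCallSite(_guess_call_site())
        try:
            yield
        finally:
            ctx._jsc.clearCallSite()

Wiring it up then looks like

    with _java_call_site(self.ctx):  # self.ctx being the RDD's SparkContext
        ...  # call into the Java/Scala side

around each of the entry points mentioned above.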

One could also approach this change by refactoring
runJob/submitJob/runApproximateJob to take a call site, then threading
that parameter through everything that needs to know it.

One might object to the pointless-looking wrappers in JavaSparkContext.
Unfortunately, I can't directly access the SparkContext from
Python---or, if I can, I don't know how---so I need to wrap everything
that matters in JavaSparkContext.

Conflicts:
	core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
2014-01-02 15:54:54 -05:00
mllib Remove commented code in __init__.py. 2013-12-25 14:12:42 -05:00
__init__.py Fix some Python docs and make sure to unset SPARK_TESTING in Python 2013-12-29 20:15:07 -05:00
accumulators.py Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00
broadcast.py Fix some Python docs and make sure to unset SPARK_TESTING in Python 2013-12-29 20:15:07 -05:00
cloudpickle.py Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
conf.py Fix Python code after change of getOrElse 2014-01-01 23:21:34 -05:00
context.py Fix Python code after change of getOrElse 2014-01-01 23:21:34 -05:00
daemon.py Add Apache license headers and LICENSE and NOTICE files 2013-07-16 17:21:33 -07:00
files.py Initial work to rename package to org.apache.spark 2013-09-01 14:13:13 -07:00
java_gateway.py Merge remote-tracking branch 'origin/master' into conf2 2013-12-29 15:08:08 -05:00
join.py Change numSplits to numPartitions in PySpark. 2013-02-24 13:25:09 -08:00
rdd.py Make Python function/line appear in the UI. 2013-12-28 23:34:16 -05:00
rddsampler.py RDD sample() and takeSample() prototypes for PySpark 2013-08-28 16:46:13 -07:00
serializers.py The rest of the Python side of those bindings. 2013-12-19 01:29:51 -05:00
shell.py Typo: avaiable -> available 2013-12-24 17:25:04 -08:00
statcounter.py Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark 2013-08-21 17:05:58 -07:00
storagelevel.py Export StorageLevel and refactor 2013-09-07 14:41:31 -07:00
tests.py Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289. 2013-12-24 14:01:13 -08:00
worker.py FramedSerializer: _dumps => dumps, _loads => loads. 2013-11-10 17:53:25 -08:00