spark-instrumented-optimizer/python/pyspark
Davies Liu 091d32c52e [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode
A customized pickler must be registered before unpickling, but on the executor there is no way to register the picklers before the tasks run.

So we need to register the picklers in the tasks themselves: duplicate javaToPython() and pythonToJava() in MLlib, and call SerDe.initialize() before pickling or unpickling.
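
A minimal sketch of the pattern (hypothetical names, not the actual MLlib source): initialize() guards the registration with a flag, so every task can safely call it on the executor before the first pickling or unpickling, and the registration still runs only once per JVM:

    object SerDeSketch {
      private var initialized = false

      // Safe to call from every task: registers at most once per JVM.
      def initialize(): Unit = synchronized {
        if (!initialized) {
          registerPicklers()  // hypothetical helper: register the custom
                              // picklers/unpicklers for MLlib types
          initialized = true
        }
      }

      private def registerPicklers(): Unit = {
        // e.g. picklers for Vector, LabeledPoint, Rating, ...
      }
    }

Each conversion path then calls initialize() inside the task closure (e.g. within a mapPartitions), so registration happens on the executor that runs the task rather than only on the driver.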

Author: Davies Liu <davies.liu@gmail.com>

Closes #2830 from davies/fix_pickle and squashes the following commits:

0c85fb9 [Davies Liu] revert the privacy change
6b94e15 [Davies Liu] use JavaConverters instead of JavaConversions
0f02050 [Davies Liu] hotfix: Customized pickler does not work in cluster
2014-10-16 14:56:50 -07:00
mllib [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode 2014-10-16 14:56:50 -07:00
streaming [SPARK-2377] Python API for Streaming 2014-10-12 02:46:56 -07:00
__init__.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
accumulators.py [SPARK-3478] [PySpark] Profile the Python tasks 2014-09-30 18:24:57 -07:00
broadcast.py [SPARK-3430] [PySpark] [Doc] generate PySpark API docs using Sphinx 2014-09-16 12:51:58 -07:00
cloudpickle.py [SPARK-3679] [PySpark] pickle the exact globals of functions 2014-09-24 13:00:05 -07:00
conf.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
context.py [SPARK-3971] [MLLib] [PySpark] hotfix: Customized pickler should work in cluster mode 2014-10-16 14:56:50 -07:00
daemon.py [SPARK-3030] [PySpark] Reuse Python worker 2014-09-13 16:22:04 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-3167] Handle special driver configs in Windows 2014-08-26 22:52:16 -07:00
join.py [SPARK-546] Add full outer join to RDD and DStream. 2014-09-24 20:39:09 -07:00
rdd.py [Spark] RDD take() method: overestimate too much 2014-10-13 13:11:55 -07:00
rddsampler.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-2377] Python API for Streaming 2014-10-12 02:46:56 -07:00
shell.py [SPARK-3273][SPARK-3301] We should read the version information from the same place 2014-09-06 15:08:43 -07:00
shuffle.py [SPARK-3786] [PySpark] speedup tests 2014-10-06 14:07:53 -07:00
sql.py [SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and building warnings 2014-10-11 11:51:59 -07:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-3417] Use new-style classes in PySpark 2014-09-08 15:45:36 -07:00
tests.py [SPARK-3867][PySpark] ./python/run-tests failed when it run with Python 2.6 and unittest2 is not installed 2014-10-11 11:26:17 -07:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-3478] [PySpark] Profile the Python tasks 2014-09-30 18:24:57 -07:00