spark-instrumented-optimizer/python/pyspark
Davies Liu c246b95dd2 [SPARK-4841] fix zip with textFile()
UTF8Deserializer cannot be used in BatchedSerializer, so always use PickleSerializer() when changing batchSize in zip().

Also, if the two RDDs already have the same batch size, they no longer need to be re-serialized.
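
The decision described above can be sketched as a small pure-Python model. This is an illustration only, not the code from this commit: `Ser`, `batch_size_of`, and `plan_zip` are hypothetical names, and both the fallback batch size of 100 and the treatment of a non-positive batch size as "auto/unlimited" are assumptions made for the sketch.

```python
from typing import NamedTuple, Optional, Tuple


class Ser(NamedTuple):
    """Hypothetical stand-in for an RDD's serializer (not a PySpark class)."""
    inner: str                 # "utf8" (as produced by textFile()) or "pickle"
    batch_size: Optional[int]  # None means the serializer is not batched


def batch_size_of(ser: Ser) -> int:
    # An unbatched serializer behaves like a batch size of 1.
    return 1 if ser.batch_size is None else ser.batch_size


def plan_zip(left: Ser, right: Ser, default_batch: int = 100) -> Tuple[Ser, Ser]:
    """Return the serializers both sides should use before being zipped."""
    lb, rb = batch_size_of(left), batch_size_of(right)
    if lb == rb:
        # Same batch size already: nothing needs to be re-serialized.
        return left, right
    # Otherwise align both sides on the smaller batch size.
    target = min(lb, rb)
    if target <= 0:
        # Assumption: non-positive means auto/unlimited, so pick a fixed size.
        target = default_batch
    # Always re-batch with pickle, never with the UTF-8 deserializer.
    common = Ser("pickle", target)
    return common, common


text_file = Ser("utf8", None)    # models an RDD created by textFile()
parallel = Ser("pickle", 10)     # models a batched, pickled RDD
print(plan_zip(text_file, parallel))
```

In this model, zipping a text-file RDD with a batched RDD moves both sides to a pickle-based serializer with a shared batch size, while two RDDs that already agree on batch size are returned unchanged.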

Author: Davies Liu <davies@databricks.com>

Closes #3706 from davies/fix_4841 and squashes the following commits:

20ce3a3 [Davies Liu] fix bug in _reserialize()
e3ebf7c [Davies Liu] add comment
379d2c8 [Davies Liu] fix zip with textFile()
2014-12-15 22:58:26 -08:00
mllib [SPARK-4494][mllib] IDFModel.transform() add support for single vector 2014-12-15 13:44:15 -08:00
streaming [DOC][PySpark][Streaming] Fix docstring for sphinx 2014-11-19 14:23:18 -08:00
__init__.py [SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py 2014-11-13 10:24:54 -08:00
accumulators.py [SPARK-3478] [PySpark] Profile the Python tasks 2014-09-30 18:24:57 -07:00
broadcast.py [SPARK-4548] [SPARK-4517] improve performance of python broadcast 2014-11-24 17:17:03 -08:00
cloudpickle.py [SPARK-3679] [PySpark] pickle the exact globals of functions 2014-09-24 13:00:05 -07:00
conf.py [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
context.py [SPARK-4548] [SPARK-4517] improve performance of python broadcast 2014-11-24 17:17:03 -08:00
daemon.py [SPARK-4088] [PySpark] Python worker should exit after socket is closed by JVM 2014-10-25 01:20:39 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
heapq3.py [SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey() 2014-08-26 16:57:40 -07:00
java_gateway.py [SPARK-4415] [PySpark] JVM should exit after Python exit 2014-11-14 20:14:33 -08:00
join.py [SPARK-546] Add full outer join to RDD and DStream. 2014-09-24 20:39:09 -07:00
rdd.py [SPARK-4841] fix zip with textFile() 2014-12-15 22:58:26 -08:00
rddsampler.py [SPARK-4477] [PySpark] remove numpy from RDDSampler 2014-11-20 16:40:25 -08:00
resultiterable.py [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
serializers.py [SPARK-4841] fix zip with textFile() 2014-12-15 22:58:26 -08:00
shell.py [SPARK-3273][SPARK-3301] We should read the version information from the same place 2014-09-06 15:08:43 -07:00
shuffle.py [SPARK-4384] [PySpark] improve sort spilling 2014-11-19 15:45:37 -08:00
sql.py [SPARK-4578] fix asDict() with nested Row() 2014-11-24 16:41:23 -08:00
statcounter.py StatCounter on NumPy arrays [PYSPARK][SPARK-2012] 2014-08-01 22:33:25 -07:00
storagelevel.py [SPARK-3417] Use new-style classes in PySpark 2014-09-08 15:45:36 -07:00
tests.py [SPARK-4841] fix zip with textFile() 2014-12-15 22:58:26 -08:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
worker.py [SPARK-4548] [SPARK-4517] improve performance of python broadcast 2014-11-24 17:17:03 -08:00