spark-instrumented-optimizer/python
Davies Liu 72f36ee571 [SPARK-3886] [PySpark] use AutoBatchedSerializer by default
Use AutoBatchedSerializer by default, which chooses a proper batch size based on the size of the serialized objects, keeping each serialized batch within [64k, 640k].
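
A minimal sketch of that adaptive heuristic, assuming plain pickle with length-prefixed framing rather than PySpark's actual serializer classes; dump_stream_auto_batched and best_size are names introduced here for illustration:

    import io
    import itertools
    import pickle
    import struct

    def dump_stream_auto_batched(iterator, stream, best_size=1 << 16):
        # Start with one object per batch, then adapt: double the batch
        # when a serialized batch comes in under best_size (~64 KB), and
        # halve it when a batch exceeds 10 * best_size (~640 KB).
        batch = 1
        iterator = iter(iterator)
        while True:
            values = list(itertools.islice(iterator, batch))
            if not values:
                break
            data = pickle.dumps(values)
            stream.write(struct.pack("!i", len(data)))  # length prefix
            stream.write(data)
            if len(data) < best_size:
                batch *= 2
            elif len(data) > best_size * 10 and batch > 1:
                batch //= 2

    buf = io.BytesIO()
    dump_stream_auto_batched(iter(range(100000)), buf)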

In the JVM, the serializer also tracks the objects within a batch in order to detect duplicated objects, so a larger batch may cause OOM in the JVM.
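
Because of that memory pressure, the batch size can still be pinned explicitly instead of relying on the automatic default. A short usage sketch, assuming SparkContext's batchSize parameter as documented for this change (0 chooses the batch size automatically based on object sizes, 1 disables batching, a positive value fixes it):

    from pyspark import SparkContext

    # batchSize=0: pick the batch size automatically based on object sizes
    # (the new default behavior); batchSize=1 disables batching entirely,
    # and a positive value such as 1024 pins a fixed batch size.
    sc = SparkContext("local", "batch-size-demo", batchSize=0)
    rdd = sc.parallelize(range(10000))
    print(rdd.count())  # a simple action to exercise serialization
    sc.stop()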

Author: Davies Liu <davies.liu@gmail.com>

Closes #2740 from davies/batchsize and squashes the following commits:

52cdb88 [Davies Liu] update docs
185f2b9 [Davies Liu] use AutoBatchedSerializer by default
2014-10-10 14:14:05 -07:00
docs [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
lib [SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1 2014-07-29 19:02:06 -07:00
pyspark [SPARK-3886] [PySpark] use AutoBatchedSerializer by default 2014-10-10 14:14:05 -07:00
test_support [SPARK-3634] [PySpark] User's module should take precedence over system modules 2014-09-24 12:10:09 -07:00
.gitignore SPARK-1004. PySpark on YARN 2014-04-29 23:24:34 -07:00
run-tests [SPARK-3868][PySpark] Hard to recognize which module is tested from unit-tests.log 2014-10-09 13:46:26 -07:00