spark-instrumented-optimizer/python
Davies Liu 72f36ee571 [SPARK-3886] [PySpark] use AutoBatchedSerializer by default
Use AutoBatchedSerializer by default, which chooses a proper batch size based on the size of the serialized objects, keeping each serialized batch within [64k, 640k].
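
A minimal sketch of that adaptive heuristic, assuming plain pickle with length-prefixed framing rather than PySpark's actual serializer classes; dump_stream_auto_batched and best_size are names introduced here for illustration:

    import io
    import itertools
    import pickle
    import struct

    def dump_stream_auto_batched(iterator, stream, best_size=1 << 16):
        # Start with one object per batch, then adapt: double the batch
        # when a serialized batch comes in under best_size (~64 KB), and
        # halve it when a batch exceeds 10 * best_size (~640 KB).
        batch = 1
        iterator = iter(iterator)
        while True:
            values = list(itertools.islice(iterator, batch))
            if not values:
                break
            data = pickle.dumps(values)
            stream.write(struct.pack("!i", len(data)))  # length prefix
            stream.write(data)
            if len(data) < best_size:
                batch *= 2
            elif len(data) > best_size * 10 and batch > 1:
                batch //= 2

    buf = io.BytesIO()
    dump_stream_auto_batched(iter(range(100000)), buf)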

In the JVM, the serializer also tracks the objects within a batch in order to detect duplicated objects, so a larger batch may cause OOM in the JVM.
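
Because of that memory pressure, the batch size can still be pinned explicitly instead of relying on the automatic default. A short usage sketch, assuming SparkContext's batchSize parameter as documented for this change (0 chooses the batch size automatically based on object sizes, 1 disables batching, a positive value fixes it):

    from pyspark import SparkContext

    # batchSize=0: pick the batch size automatically based on object sizes
    # (the new default behavior); batchSize=1 disables batching entirely,
    # and a positive value such as 1024 pins a fixed batch size.
    sc = SparkContext("local", "batch-size-demo", batchSize=0)
    rdd = sc.parallelize(range(10000))
    print(rdd.count())  # a simple action to exercise serialization
    sc.stop()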

Author: Davies Liu <davies.liu@gmail.com>

Closes #2740 from davies/batchsize and squashes the following commits:

52cdb88 [Davies Liu] update docs
185f2b9 [Davies Liu] use AutoBatchedSerializer by default
2014-10-10 14:14:05 -07:00
docs [SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs 2014-10-07 18:09:27 -07:00
lib [SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1 2014-07-29 19:02:06 -07:00
pyspark [SPARK-3886] [PySpark] use AutoBatchedSerializer by default 2014-10-10 14:14:05 -07:00
test_support [SPARK-3634] [PySpark] User's module should take precedence over system modules 2014-09-24 12:10:09 -07:00
.gitignore SPARK-1004. PySpark on YARN 2014-04-29 23:24:34 -07:00
run-tests [SPARK-3868][PySpark] Hard to recognize which module is tested from unit-tests.log 2014-10-09 13:46:26 -07:00