spark-instrumented-optimizer/python
Davies Liu ca23c3b014 [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark
During external sort, the batch size grows up to a maximum of 10000 and then shrinks down to zero, causing an infinite loop.
Assuming that items usually have similar sizes, there is no need to adjust the batch size after the first spill.

cc JoshRosen rxin angelini

Author: Davies Liu <davies@databricks.com>

Closes #6714 from davies/batch_size and squashes the following commits:

b170dfb [Davies Liu] update test
b9be832 [Davies Liu] Merge branch 'batch_size' of github.com:davies/spark into batch_size
6ade745 [Davies Liu] update test
5c21777 [Davies Liu] Update shuffle.py
e746aec [Davies Liu] fix batch size during sort
2015-06-18 13:49:32 -07:00
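
The batching rule described in the commit message above can be illustrated with a minimal sketch. This is not the code from `pyspark/shuffle.py`: the function name `simulate_batches`, the item-count `spill_threshold` (standing in for a real memory check), and the 1.5x growth factor are all assumptions made for illustration. Only the cap of 10000 and the "stop adjusting after the first spill" rule come from the commit message. In a loop of this shape a batch size of zero can never terminate, because `islice(iterator, 0)` returns an empty chunk and `len(chunk) < batch` is then never true.

```python
import itertools


def simulate_batches(items, spill_threshold, initial_batch=100, max_batch=10000):
    """Simulate the read loop of an external sort (hypothetical sketch).

    Returns (batch_history, spills): the batch size used for each read,
    and how many times the buffer was "spilled".
    """
    iterator = iter(items)
    batch = initial_batch
    buffered = 0          # item count, a stand-in for measured memory use
    spills = 0
    batch_history = []

    while True:
        chunk = list(itertools.islice(iterator, batch))
        batch_history.append(batch)
        buffered += len(chunk)

        # A short read means the input is exhausted. This check can only
        # succeed when batch > 0, which is why a zero batch loops forever.
        if len(chunk) < batch:
            break

        if buffered > spill_threshold:
            # Pretend to sort the buffer and write it to disk.
            spills += 1
            buffered = 0
        elif spills == 0:
            # Grow the batch only before the first spill; afterwards the
            # batch size is left alone, so it can never shrink to zero.
            batch = min(int(batch * 1.5), max_batch)

    return batch_history, spills


history, spills = simulate_batches(range(5000), spill_threshold=1000)
print(history[:6], spills)
```

With these inputs the batch size grows until the first spill and then stays constant for the rest of the run, so every read requests a positive number of items.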
docs [SPARK-7619] [PYTHON] fix docstring signature 2015-05-14 18:16:48 -07:00
lib [SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1 2014-07-29 19:02:06 -07:00
pyspark [SPARK-8202] [PYSPARK] fix infinite loop during external sort in PySpark 2015-06-18 13:49:32 -07:00
test_support [SPARK-8060] Improve DataFrame Python test coverage and documentation. 2015-06-03 00:23:42 -07:00
.gitignore [SPARK-3946] gitignore in /python includes wrong directory 2014-10-14 14:09:39 -07:00
run-tests [MINOR] Enable PySpark SQL readerwriter and window tests 2015-06-02 12:02:07 -07:00