[SPARK-10184] [CORE] Optimization for bounds determination in RangePartitioner

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-10184

Change `cumWeight > target` to `cumWeight >= target` in the `RangePartitioner.determineBounds` method so that a bound is emitted as soon as the cumulative weight reaches the target, which makes the output partitions more balanced.

Author: ihainan <ihainan72@gmail.com>

Closes #8397 from ihainan/opt_for_rangepartitioner.
ihainan 2015-08-30 08:26:14 +01:00 committed by Sean Owen
parent ca69fc8efd
commit 1bfd934782

@@ -291,7 +291,7 @@ private[spark] object RangePartitioner {
     while ((i < numCandidates) && (j < partitions - 1)) {
       val (key, weight) = ordered(i)
       cumWeight += weight
-      if (cumWeight > target) {
+      if (cumWeight >= target) {
         // Skip duplicate values.
         if (previousBound.isEmpty || ordering.gt(key, previousBound.get)) {
           bounds += key
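
To illustrate why the comparison matters, here is a minimal sketch that is not the Spark source. It is a simplified stand-in for the loop in the hunk above, assuming `Int` keys and `Double` weights. The names `BoundsSketch`, `inclusive`, and `partitionSizes` are hypothetical and exist only for this example. The lines that advance `target`, `j`, and `previousBound` fall outside the hunk shown and are assumed here, as is the rule that a key lands in the first partition whose bound is not smaller than the key.

```scala
import scala.collection.mutable.ArrayBuffer

object BoundsSketch {
  // Simplified stand-in for the bounds loop: Int keys, Double weights.
  // inclusive = false mimics the old `>` test, inclusive = true the new `>=` test.
  def determineBounds(
      candidates: Seq[(Int, Double)],
      partitions: Int,
      inclusive: Boolean): Seq[Int] = {
    val ordered = candidates.sortBy(_._1)
    // Ideal weight per partition (assumed: total weight / number of partitions).
    val step = ordered.map(_._2).sum / partitions
    var cumWeight = 0.0
    var target = step
    val bounds = ArrayBuffer.empty[Int]
    var previousBound = Option.empty[Int]
    var i = 0
    var j = 0
    while (i < ordered.size && j < partitions - 1) {
      val (key, weight) = ordered(i)
      cumWeight += weight
      val reached = if (inclusive) cumWeight >= target else cumWeight > target
      // Skip duplicate values, as in the diff.
      if (reached && (previousBound.isEmpty || key > previousBound.get)) {
        bounds += key
        target += step
        j += 1
        previousBound = Some(key)
      }
      i += 1
    }
    bounds.toSeq
  }

  // Counts keys per partition, assuming a key goes to the first partition
  // whose upper bound is >= the key (keys above all bounds go to the last one).
  def partitionSizes(keys: Seq[Int], bounds: Seq[Int]): Seq[Int] = {
    val counts = Array.fill(bounds.size + 1)(0)
    keys.foreach { k => counts(bounds.count(_ < k)) += 1 }
    counts.toSeq
  }
}
```

With keys 1 to 6, each of weight 1.0, and 3 partitions, the old `>` test yields bounds 3 and 5 (partition sizes 3, 2, 1), while the `>=` test yields bounds 2 and 4 (partition sizes 2, 2, 2).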