[SPARK-10184] [CORE] Optimization for bounds determination in RangePartitioner
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-10184

Change `cumWeight > target` to `cumWeight >= target` in the `RangePartitioner.determineBounds` method to make the output partitions more balanced.

Author: ihainan <ihainan72@gmail.com>

Closes #8397 from ihainan/opt_for_rangepartitioner.
parent ca69fc8efd
commit 1bfd934782
@@ -291,7 +291,7 @@ private[spark] object RangePartitioner {
     while ((i < numCandidates) && (j < partitions - 1)) {
       val (key, weight) = ordered(i)
       cumWeight += weight
-      if (cumWeight > target) {
+      if (cumWeight >= target) {
         // Skip duplicate values.
         if (previousBound.isEmpty || ordering.gt(key, previousBound.get)) {
           bounds += key
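
The effect of switching from `>` to `>=` can be illustrated with a small standalone sketch. This is not the Spark implementation: `BoundsSketch`, `sketchBounds`, `partitionSizes`, and the `inclusive` flag are hypothetical names, keys are plain `Int`s, and the parts of the loop not visible in the hunk above (advancing `target`, `j`, and `previousBound`) are assumed. It also assumes that a key equal to a bound falls into the lower range.

```scala
import scala.collection.mutable.ArrayBuffer

object BoundsSketch {
  // Hypothetical, simplified stand-in for the loop shown in the diff.
  // `inclusive = true` uses `>=` (new behavior), `false` uses `>` (old behavior).
  def sketchBounds(
      candidates: Seq[(Int, Double)],
      partitions: Int,
      inclusive: Boolean): Seq[Int] = {
    val ordered = candidates.sortBy(_._1)
    // Ideal cumulative weight per partition.
    val step = ordered.map(_._2).sum / partitions
    var cumWeight = 0.0
    var target = step
    val bounds = ArrayBuffer.empty[Int]
    var previousBound = Option.empty[Int]
    var i = 0
    var j = 0
    while (i < ordered.size && j < partitions - 1) {
      val (key, weight) = ordered(i)
      cumWeight += weight
      val reached = if (inclusive) cumWeight >= target else cumWeight > target
      if (reached) {
        // Skip duplicate values.
        if (previousBound.isEmpty || key > previousBound.get) {
          bounds += key
          // Assumed bookkeeping: these updates are not part of the visible hunk.
          target += step
          j += 1
          previousBound = Some(key)
        }
      }
      i += 1
    }
    bounds.toList
  }

  // Count keys per range; a key's range index is the number of bounds it exceeds,
  // so a key equal to a bound lands in the lower range (an assumption of this sketch).
  def partitionSizes(keys: Seq[Int], bounds: Seq[Int]): Seq[Int] = {
    val sizes = Array.fill(bounds.size + 1)(0)
    keys.foreach { k => sizes(bounds.count(k > _)) += 1 }
    sizes.toList
  }

  def main(args: Array[String]): Unit = {
    val keys = 1 to 6
    val candidates = keys.map(k => (k, 1.0)) // six keys with equal weight
    val oldBounds = sketchBounds(candidates, 3, inclusive = false)
    val newBounds = sketchBounds(candidates, 3, inclusive = true)
    println(s"old (>):  bounds=$oldBounds sizes=${partitionSizes(keys, oldBounds)}")
    println(s"new (>=): bounds=$newBounds sizes=${partitionSizes(keys, newBounds)}")
  }
}
```

With six equally weighted keys and three partitions, the strict comparison yields bounds `List(3, 5)` and range sizes `List(3, 2, 1)`, while the inclusive comparison yields bounds `List(2, 4)` and sizes `List(2, 2, 2)`. The strict form overshoots whenever the cumulative weight lands exactly on the target, which pushes one extra key into each earlier range.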