spark-instrumented-optimizer/core
Sergey Serebryakov 77d046ec47 [SPARK-21782][CORE] Repartition creates skews when numPartitions is a power of 2
## Problem
When an RDD (particularly with a low item-per-partition ratio) is repartitioned to numPartitions = power of 2, the resulting partitions are very uneven-sized, due to using fixed seed to initialize PRNG, and using the PRNG only once. See details in https://issues.apache.org/jira/browse/SPARK-21782

## What changes were proposed in this pull request?
Instead of directly using `0, 1, 2,...` seeds to initialize `Random`, hash them with `scala.util.hashing.byteswap32()`.

## How was this patch tested?
`build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.rdd.RDDSuite test`

Author: Sergey Serebryakov <sserebryakov@tesla.com>

Closes #18990 from megaserg/repartition-skew.
2017-08-21 08:21:25 +01:00
..
src [SPARK-21782][CORE] Repartition creates skews when numPartitions is a power of 2 2017-08-21 08:21:25 +01:00
pom.xml [SPARK-21276][CORE] Update lz4-java to the latest (v1.4.0) 2017-08-09 17:31:52 +02:00