21db4336b0
When trying to coalesce a UnionRDD of two large FileScanRDDs (each with a few million partitions) into around 8k partitions, the driver can stall for over an hour. The profiler shows that over 90% of the time is spent in TimSort, which is invoked by `pickBin`. This patch replaces sorting with a more efficient `min` for the purpose of finding the least occupied PartitionGroup.

Closes #23986 from fitermay/SPARK-27070.

Authored-by: fitermay <fiterman@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>

||
---|---|---|
CoalescedRDDBenchmark-results.txt | ||
KryoBenchmark-results.txt | ||
KryoSerializerBenchmark-results.txt | ||
XORShiftRandomBenchmark-results.txt |
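
The commit message above describes swapping a full sort for a single `min` pass when only the least occupied group is needed. The sketch below illustrates that idea in Python. It is not Spark's Scala code: the `PartitionGroup` class, its fields, and both helper functions are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionGroup:
    """Hypothetical stand-in for a coalescer bin; not Spark's class."""
    name: str
    partitions: List[int] = field(default_factory=list)

    @property
    def num_partitions(self) -> int:
        return len(self.partitions)


def least_occupied_by_sort(groups: List[PartitionGroup]) -> PartitionGroup:
    # O(n log n): orders every group just to read the first element.
    return sorted(groups, key=lambda g: g.num_partitions)[0]


def least_occupied_by_min(groups: List[PartitionGroup]) -> PartitionGroup:
    # O(n): one pass that keeps the smallest group seen so far.
    return min(groups, key=lambda g: g.num_partitions)


groups = [
    PartitionGroup("a", [1, 2, 3]),
    PartitionGroup("b", [4]),
    PartitionGroup("c", [5, 6]),
]

# Both approaches pick the same group; only the cost differs.
chosen = least_occupied_by_min(groups)
```

When this lookup runs once per input partition, the per-call difference between sorting and a linear scan is multiplied by the number of partitions, which is consistent with the stall the commit message reports for inputs with millions of partitions.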