spark-instrumented-optimizer

History

Stephen Haberman 680f42e6cd Change defaultPartitioner to use upstream split size. Previously it used SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, because if we have a hugely partitioned RDD and a tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
resources	Changed locations for unit test logs.	2013-01-07 16:06:07 -08:00
scala/spark	Change defaultPartitioner to use upstream split size.	2013-02-10 02:27:03 -06:00
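
The partitioner choice described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `Rdd` and `Partitioner` are hypothetical stand-ins so the sketch runs without Spark, and reusing an upstream RDD's existing partitioner is an assumption that the commit message does not state.

```scala
// Hypothetical stand-ins for Spark's types, so the sketch runs without Spark.
final case class Partitioner(numPartitions: Int)
final case class Rdd(name: String, numSplits: Int, partitioner: Option[Partitioner] = None)

object DefaultPartitionerSketch {
  // Pick a partitioner for an operation that combines `rdd` with `others`.
  def defaultPartitioner(rdd: Rdd, others: Rdd*): Partitioner = {
    // Largest split count first, so a tiny-partitioned RDD cannot dictate the result.
    val bySize = (rdd +: others).sortBy(_.numSplits)(Ordering[Int].reverse)
    // Assumption: prefer a partitioner that some upstream RDD already has.
    // Otherwise fall back to the largest upstream split count instead of
    // a context-wide default parallelism.
    bySize.flatMap(_.partitioner).headOption
      .getOrElse(Partitioner(bySize.head.numSplits))
  }

  def main(args: Array[String]): Unit = {
    val big = Rdd("big", 1000)
    val small = Rdd("small", 2)
    println(defaultPartitioner(small, big).numPartitions) // prints 1000
  }
}
```

With this ordering, combining a 1000-split RDD with a 2-split RDD yields 1000 partitions regardless of which RDD the operation is called on.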
