spark-instrumented-optimizer

Wenchen Fan 4a9c9d8f9a [SPARK-25159][SQL] json schema inference should only trigger one job	2018-08-21 22:21:08 -07:00

## What changes were proposed in this pull request?

This fixes a perf regression caused by https://github.com/apache/spark/pull/21376 .

We should not use `RDD#toLocalIterator`, which triggers one Spark job per RDD partition. This is very bad for RDDs with a lot of small partitions.

To fix it, this PR introduces a way to access SQLConf in the scheduler event loop thread, so that we don't need to use `RDD#toLocalIterator` anymore in `JsonInferSchema`.

## How was this patch tested?

a new test

Closes #22152 from cloud-fan/conf.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>

benchmarks	[SPARK-24549][SQL] Support Decimal type push down to the parquet data sources	2018-07-16 15:44:51 +08:00
src	[SPARK-25159][SQL] json schema inference should only trigger one job	2018-08-21 22:21:08 -07:00
pom.xml	[SPARK-25019][BUILD] Fix orc dependency to use the same exclusion rules	2018-08-06 12:00:39 -07:00