spark-instrumented-optimizer

wangguangxin.cn 73183b3c8c [SPARK-11412][SQL] Support merge schema for ORC	2019-06-29 17:08:31 -07:00

## What changes were proposed in this pull request?

Currently, ORC's `inferSchema` is implemented by randomly choosing one ORC file and reading its schema. Following the behavior of Parquet, this PR implements schema merging by reading all ORC files in parallel through a Spark job.

Users can enable schema merging with `spark.read.option("mergeSchema", "true").orc("xxx")` or by setting `spark.sql.orc.mergeSchema` to `true`; the former takes precedence.

## How was this patch tested?

Tested by the unit tests in OrcUtilsSuite.scala.

Closes #24043 from WangGuangxin/SPARK-11412.

Lead-authored-by: wangguangxin.cn <wangguangxin.cn@gmail.com>
Co-authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
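
A small sketch can illustrate how inferring the schema from a single file differs from merging the schemas of all files. It is a simplified model under stated assumptions, not Spark's implementation: each schema is a plain dict mapping field name to type name, and `merge_schemas`, `file_a`, and `file_b` are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable

# Hypothetical simplification: a schema is a mapping of field name -> type name.
Schema = Dict[str, str]


def merge_schemas(schemas: Iterable[Schema]) -> Schema:
    """Union the fields of all schemas, keeping first-seen field order.

    Raises ValueError if the same field appears with two different types.
    Spark's real merge works on StructType and has richer rules; this
    function only models the basic idea.
    """
    merged: Schema = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name in merged and merged[name] != dtype:
                raise ValueError(
                    f"conflicting types for {name!r}: {merged[name]} vs {dtype}"
                )
            merged.setdefault(name, dtype)
    return merged


# Two files written at different times, with partly different columns.
file_a = {"id": "bigint", "name": "string"}
file_b = {"id": "bigint", "score": "double"}

# Picking a single file sees only that file's columns ...
single_file_schema = file_a
# ... while merging sees every column present in any file.
merged = merge_schemas([file_a, file_b])
```

With a single file chosen, the `score` column would be missing from the inferred schema; with merging, all three columns are present. The cost is that every file's footer has to be read, which is why the commit runs the merge as a parallel job.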
benchmarks	[SPARK-26584][SQL] Remove `spark.sql.orc.copyBatchToSpark` internal conf	2019-01-10 08:42:23 -08:00
compatibility/src/test/scala/org/apache/spark/sql/hive/execution	Revert [SPARK-19355][SPARK-25352]	2018-09-20 20:18:31 +08:00
src	[SPARK-11412][SQL] Support merge schema for ORC	2019-06-29 17:08:31 -07:00
pom.xml	[SPARK-27831][SQL][TEST] Move Hive test jars to maven dependency	2019-06-02 20:23:08 -07:00