ODIn/spark-instrumented-optimizer

Michael Armbrust cd1d4110cf [SPARK-6908] [SQL] Use isolated Hive client

This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by #5851, instead of directly interacting with the client. By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version` without the need to recompile. This also greatly reduces the surface area for our interaction with the hive libraries, hopefully making it easier to support other versions in the future.

Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
- a colon-separated list of jar files or directories for hive and hadoop.
- `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive.
- `maven` - download the correct version of hive on demand from maven.

By default, `builtin` is used for Hive 13.

This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores. However, the full removal of the Shim is deferred until a later PR.

Remaining TODOs:
- Remove the Hive Shims and inline code for Hive 13.
- Several HiveCompatibility tests are not yet passing.
  - `nullformatCTAS` - As detailed below, we now are handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer. However, we currently only handle the common cases and not things like CTAS where the null format is specified.
  - `combine1` now leaks state about compression somehow, breaking all subsequent tests. As such we currently add it to the blacklist.
  - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore. We are correctly propagating the information
  - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests. It seems our `RESET` mechanism may not be as robust as it used to be?

Other required changes:
- `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline. Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`. The full parsing here is not yet complete as detailed above in the remaining TODOs. Since the operator is Hive specific, it is moved to the hive package.
- `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.

Author: Michael Armbrust <michael@databricks.com>

Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits:

258d000 [Michael Armbrust] really really correct path handling
e56fd4a [Michael Armbrust] getAbsolutePath
5a259f5 [Michael Armbrust] fix typos
81bb366 [Michael Armbrust] comments from vanzin
5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
4b5cd41 [Michael Armbrust] yin's comments
f5de7de [Michael Armbrust] cleanup
11e9c72 [Michael Armbrust] better coverage in versions suite
7e8f010 [Michael Armbrust] better error messages and jar handling
e7b3941 [Michael Armbrust] more permisive checking for function registration
da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
5fe5894 [Michael Armbrust] fix serialization suite
81711c4 [Michael Armbrust] Initial support for running without maven
1d8ae44 [Michael Armbrust] fix final tests?
1c50813 [Michael Armbrust] more comments
a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
a6f5df1 [Michael Armbrust] style
ab07f7e [Michael Armbrust] WIP
4d8bf02 [Michael Armbrust] Remove hive 12 compilation
8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client

2015-05-07 19:36:24 -07:00
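
The two settings named in the commit message above could be passed to `spark-submit` through its `--conf key=value` flag. The following is a minimal sketch rather than code from the repository: the helper name `hive_metastore_conf` is hypothetical, and the version string `0.12.0` is an assumed example value for a Hive 12 metastore.

```python
# Sketch only: hive_metastore_conf is a hypothetical helper, and "0.12.0"
# is an assumed example version string, not a value taken from the commit.

def hive_metastore_conf(version, jars="builtin"):
    """Build spark-submit --conf arguments for the two metastore settings.

    `jars` may be "builtin", "maven", or a colon-separated list of jar
    files or directories, as described in the commit message.
    """
    return [
        "--conf", "spark.sql.hive.metastore.version=" + version,
        "--conf", "spark.sql.hive.metastore.jars=" + jars,
    ]


if __name__ == "__main__":
    # Print the flags that would be appended to a spark-submit command line.
    print(" ".join(hive_metastore_conf("0.12.0", "maven")))
```

Passing `jars="maven"` here corresponds to the option that downloads the matching Hive jars on demand; the default mirrors the `builtin` option.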
audit-release	[SQL] [Minor] Fix for SqlApp.scala	2015-04-13 18:23:35 -07:00
create-release	[SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact	2015-04-27 11:27:56 -07:00
tests	[WIP][HOTFIX][SPARK-4123]: Fix bug in PR dependency (all deps. removed issue)	2015-04-13 22:31:44 -07:00
.gitignore	[SPARK-6219] Reuse pep8.py	2015-04-18 16:46:28 -07:00
change-version-to-2.10.sh	[SPARK-7087] [BUILD] Fix path issue change version script	2015-04-23 17:23:15 -04:00
change-version-to-2.11.sh	[SPARK-7087] [BUILD] Fix path issue change version script	2015-04-23 17:23:15 -04:00
check-license	[SPARK-6773][Tests]Fix RAT checks still passed issue when download rat jar failed	2015-04-10 20:02:35 +01:00
github_jira_sync.py	SPARK-2596 HOTFIX: Deal with non-existent JIRAs.	2014-07-19 20:06:28 -07:00
lint-python	[SPARK-6219] Reuse pep8.py	2015-04-18 16:46:28 -07:00
lint-scala	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
merge_spark_pr.py	[SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix	2015-04-21 18:08:29 -07:00
mima	[minor] [build] Set java options when generating mima ignores.	2015-04-21 16:35:37 -07:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
run-tests	[SPARK-6908] [SQL] Use isolated Hive client	2015-05-07 19:36:24 -07:00
run-tests-codes.sh	[SPARK-5654] Integrate SparkR	2015-04-08 22:45:40 -07:00
run-tests-jenkins	HOTFIX: Disable buggy dependency checker	2015-04-30 22:39:58 -07:00
scalastyle	[SPARK-6765] Enable scalastyle on test code.	2015-04-13 09:29:04 -07:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.