spark-instrumented-optimizer/dev
Michael Armbrust cd1d4110cf [SPARK-6908] [SQL] Use isolated Hive client
This PR switches Spark SQL's Hive support to use the isolated hive client interface introduced by #5851, instead of directly interacting with the client.  By using this isolated client we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version` without the need recompile.  This also greatly reduces the surface area for our interaction with the hive libraries, hopefully making it easier to support other versions in the future.

Jars for the desired hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
 - a colon-separated list of jar files or directories for hive and hadoop.
 - `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This
            option is only valid when using the execution version of Hive.
 - `maven` - download the correct version of hive on demand from maven.

By default, `builtin` is used for Hive 13.

This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores.  However, the full removal of the Shim is deferred until a later PR.

Remaining TODOs:
 - Remove the Hive Shims and inline code for Hive 13.
 - Several HiveCompatibility tests are not yet passing.
  - `nullformatCTAS` - As detailed below, we now are handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer.  However, we currently only handle the common cases and not things like CTAS where the null format is specified.
  - `combine1` now leaks state about compression somehow, breaking all subsequent tests.  As such we currently add it to the blacklist
  - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore.  We are correctly propagating the information
  - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests.  It seems our `RESET` mechanism may not be as robust as it used to be?

Other required changes:
 -  `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline.  Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`.  The full parsing here is not yet complete as detailed above in the remaining TODOs.  Since the operator is Hive specific, it is moved to the hive package.
 - `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.

Author: Michael Armbrust <michael@databricks.com>

Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits:

258d000 [Michael Armbrust] really really correct path handling
e56fd4a [Michael Armbrust] getAbsolutePath
5a259f5 [Michael Armbrust] fix typos
81bb366 [Michael Armbrust] comments from vanzin
5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
4b5cd41 [Michael Armbrust] yin's comments
f5de7de [Michael Armbrust] cleanup
11e9c72 [Michael Armbrust] better coverage in versions suite
7e8f010 [Michael Armbrust] better error messages and jar handling
e7b3941 [Michael Armbrust] more permisive checking for function registration
da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
5fe5894 [Michael Armbrust] fix serialization suite
81711c4 [Michael Armbrust] Initial support for running without maven
1d8ae44 [Michael Armbrust] fix final tests?
1c50813 [Michael Armbrust] more comments
a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
a6f5df1 [Michael Armbrust] style
ab07f7e [Michael Armbrust] WIP
4d8bf02 [Michael Armbrust] Remove hive 12 compilation
8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
2015-05-07 19:36:24 -07:00
..
audit-release [SQL] [Minor] Fix for SqlApp.scala 2015-04-13 18:23:35 -07:00
create-release [SPARK-4925] Publish Spark SQL hive-thriftserver maven artifact 2015-04-27 11:27:56 -07:00
tests [WIP][HOTFIX][SPARK-4123]: Fix bug in PR dependency (all deps. removed issue) 2015-04-13 22:31:44 -07:00
.gitignore [SPARK-6219] Reuse pep8.py 2015-04-18 16:46:28 -07:00
change-version-to-2.10.sh [SPARK-7087] [BUILD] Fix path issue change version script 2015-04-23 17:23:15 -04:00
change-version-to-2.11.sh [SPARK-7087] [BUILD] Fix path issue change version script 2015-04-23 17:23:15 -04:00
check-license [SPARK-6773][Tests]Fix RAT checks still passed issue when download rat jar failed 2015-04-10 20:02:35 +01:00
github_jira_sync.py SPARK-2596 HOTFIX: Deal with non-existent JIRAs. 2014-07-19 20:06:28 -07:00
lint-python [SPARK-6219] Reuse pep8.py 2015-04-18 16:46:28 -07:00
lint-scala [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
merge_spark_pr.py [SPARK-1684] [PROJECT INFRA] Merge script should standardize SPARK-XXX prefix 2015-04-21 18:08:29 -07:00
mima [minor] [build] Set java options when generating mima ignores. 2015-04-21 16:35:37 -07:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
run-tests [SPARK-6908] [SQL] Use isolated Hive client 2015-05-07 19:36:24 -07:00
run-tests-codes.sh [SPARK-5654] Integrate SparkR 2015-04-08 22:45:40 -07:00
run-tests-jenkins HOTFIX: Disable buggy dependency checker 2015-04-30 22:39:58 -07:00
scalastyle [SPARK-6765] Enable scalastyle on test code. 2015-04-13 09:29:04 -07:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.