The `HiveThriftServer2Test` relies on proper logging behavior to assert whether the Thrift server daemon process is started successfully. However, some other jar files listed in the classpath may potentially contain an unexpected Log4J configuration file which overrides the logging behavior.
This PR writes a temporary `log4j.properties` and prepend it to driver classpath before starting the testing Thrift server process to ensure proper logging behavior.
cc andrewor14 yhuai
Author: Cheng Lian <lian@databricks.com>
Closes#6493 from liancheng/override-log4j and squashes the following commits:
c489e0e [Cheng Lian] Fixes minor Scala styling issue
b46ef0d [Cheng Lian] Uses a temporary log4j.properties in HiveThriftServer2Test to ensure expected logging behavior
(cherry picked from commit 4782e13040)
Signed-off-by: Andrew Or <andrew@databricks.com>
When starting `HiveThriftServer2` via `startWithContext`, property `spark.sql.hive.version` isn't set. This causes Simba ODBC driver 1.0.8.1006 behaves differently and fails simple queries.
Hive2 JDBC driver works fine in this case. Also, when starting the server with `start-thriftserver.sh`, both Hive2 JDBC driver and Simba ODBC driver works fine.
Please refer to [SPARK-7950] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7950
Author: Cheng Lian <lian@databricks.com>
Closes#6500 from liancheng/odbc-bugfix and squashes the following commits:
051e3a3 [Cheng Lian] Fixes import order
3a97376 [Cheng Lian] Sets spark.sql.hive.version in HiveThriftServer2.startWithContext()
(cherry picked from commit e7b6177557)
Signed-off-by: Yin Huai <yhuai@databricks.com>
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes#6476 from rxin/whitespace-catalyst and squashes the following commits:
650409d [Reynold Xin] Fixed tests.
51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module.
(cherry picked from commit 8da560d7de)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes#6477 from rxin/whitespace-sql-core and squashes the following commits:
ce6e369 [Reynold Xin] Fixed tests.
6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core.
(cherry picked from commit ff44c711ab)
Signed-off-by: Reynold Xin <rxin@databricks.com>
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes#6478 from rxin/whitespace-hive and squashes the following commits:
e01b0e0 [Reynold Xin] Fixed tests.
a3bba22 [Reynold Xin] [SPARK-7927] whitespace fixes for Hive and ThriftServer.
(cherry picked from commit ee6a0e12fb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7853
This fixes the problem introduced by my change in https://github.com/apache/spark/pull/6435, which causes that Hive Context fails to create in spark shell because of the class loader issue.
Author: Yin Huai <yhuai@databricks.com>
Closes#6459 from yhuai/SPARK-7853 and squashes the following commits:
37ad33e [Yin Huai] Do not use hiveQlTable at all.
47cdb6d [Yin Huai] Move hiveconf.set to the end of setConf.
005649b [Yin Huai] Update comment.
35d86f3 [Yin Huai] Access TTable directly to make sure Hive will not internally use any metastore utility functions.
3737766 [Yin Huai] Recursively find all jars.
(cherry picked from commit 572b62cafe)
Signed-off-by: Yin Huai <yhuai@databricks.com>
This PR has three changes:
1. Renaming the table of `ThriftServer` to `SQL`;
2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and
3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`.
https://issues.apache.org/jira/browse/SPARK-7907
Author: Yin Huai <yhuai@databricks.com>
Closes#6448 from yhuai/JDBCServer and squashes the following commits:
eadcc3d [Yin Huai] Update test.
9168005 [Yin Huai] Use SQL as the tab name.
221831e [Yin Huai] Rename ThriftServer to JDBCServer.
(cherry picked from commit 3c1f1baaf0)
Signed-off-by: Yin Huai <yhuai@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7897
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6438 from viirya/jdbc_unsigned_bigint and squashes the following commits:
ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint.
(cherry picked from commit a1e092eae5)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark SQL should use context classloader to load SerDe classes.
yhuai helped updating the test case, and I fixed a bug in the original `CliSuite`: while testing the CLI tool with `runCliWithin`, we don't append `\n` to the last query, thus the last query is never executed.
Original PR description is pasted below.
----
```
bin/spark-sql --jars ./sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
```
Throws exception like
```
15/05/26 00:16:33 ERROR SparkSQLDriver: Failed in [CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe']
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727)
at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57)
```
Author: Cheng Hao <hao.cheng@intel.com>
Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#6435 from liancheng/classLoader and squashes the following commits:
d4c4845 [Cheng Lian] Fixes CliSuite
75e80e2 [Yin Huai] Update the fix.
fd26533 [Cheng Hao] scalastyle
dd78775 [Cheng Hao] workaround for classloader of IsolatedClientLoader
(cherry picked from commit db3fd054f2)
Signed-off-by: Yin Huai <yhuai@databricks.com>
As stated in SPARK-7684, currently `TestHive.reset` has some execution order specific bug, which makes running specific test suites locally pretty frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies on `TestHive.reset` heavily) using various `withXxx` utility methods in `SQLTestUtils` to ask each test case to cleanup their own mess so that we can avoid calling `TestHive.reset`.
Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#6353 from liancheng/workaround-spark-7684 and squashes the following commits:
26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll.
a423d48 [Cheng Lian] Fixes Scala style issue
dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround SPARK-7684
92a116d [Cheng Lian] Fixes minor styling issues
(cherry picked from commit b97ddff000)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#6318 from adrian-wang/dynpart and squashes the following commits:
ad73b61 [Daoyuan Wang] not use sqlTestUtils for try catch because dont have sqlcontext here
6c33b51 [Daoyuan Wang] fix according to liancheng
f0f8074 [Daoyuan Wang] some specific types as dynamic partition
(cherry picked from commit 8161562eab)
Signed-off-by: Yin Huai <yhuai@databricks.com>
This should also close#6243.
Author: Reynold Xin <rxin@databricks.com>
Closes#6431 from rxin/JavaTypeInference-guava and squashes the following commits:
e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
(cherry picked from commit 6fec1a9409)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Please refer to [SPARK-7847] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7847
Author: Cheng Lian <lian@databricks.com>
Closes#6389 from liancheng/spark-7847 and squashes the following commits:
935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value
f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions
d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
(cherry picked from commit 15459db4f6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Two minor changes.
cc brkyvz
Author: Reynold Xin <rxin@databricks.com>
Closes#6428 from rxin/math-func-cleanup and squashes the following commits:
5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
(cherry picked from commit 3e7d7d6b3d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7697
The reported problem case is mysql. But for h2 db, there is no unsigned int. So it is not able to add corresponding test.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6229 from viirya/unsignedint_as_long and squashes the following commits:
dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
(cherry picked from commit 4f98d7a7f1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.
Author: Cheolsoo Park <cheolsoop@netflix.com>
Closes#6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
(cherry picked from commit 6dd645870d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
So that potential partial/corrupted data files left by failed tasks/jobs won't affect normal data scan.
Author: Cheng Lian <lian@databricks.com>
Closes#6411 from liancheng/spark-7868 and squashes the following commits:
273ea36 [Cheng Lian] Ignores _temporary directories
(cherry picked from commit b463e6d618)
Signed-off-by: Yin Huai <yhuai@databricks.com>
In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows. However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.
This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTestRelation` that prevented this bug from being caught by our old tests:
> In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.
Closes#5986.
Author: Josh Rosen <joshrosen@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Author: Cheng Lian <liancheng@users.noreply.github.com>
Closes#6400 from JoshRosen/SPARK-7858 and squashes the following commits:
e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732b2
9968fba [Cheng Lian] Tests the data type conversion code paths
(cherry picked from commit 0c33c7b4a6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
The Catalyst DSL is no longer used as a public facing API. This pull request removes the UDF and writeToFile feature from it since they are not used in unit tests.
Author: Reynold Xin <rxin@databricks.com>
Closes#6350 from rxin/unused-logical-dsl and squashes the following commits:
90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
(cherry picked from commit c9adcad81a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
When committing/aborting a write task issued in `InsertIntoHadoopFsRelation`, if an exception is thrown from `OutputWriter.close()`, the committing/aborting process will be interrupted, and leaves messy stuff behind (e.g., the `_temporary` directory created by `FileOutputCommitter`).
This PR makes these two process more robust by catching potential exceptions and falling back to normal task committment/abort.
Author: Cheng Lian <lian@databricks.com>
Closes#6378 from liancheng/spark-7838 and squashes the following commits:
f18253a [Cheng Lian] Makes task committing/aborting in InsertIntoHadoopFsRelation more robust
(cherry picked from commit 8af1bf10b7)
Signed-off-by: Cheng Lian <lian@databricks.com>
The "Database does not exist" error reported in SPARK-7684 was caused by `HiveContext.newTemporaryConfiguration()`, which always creates a new temporary metastore directory and returns a metastore configuration pointing that directory. This makes `TestHive.reset()` always replaces old temporary metastore with an empty new one.
Author: Cheng Lian <lian@databricks.com>
Closes#6359 from liancheng/spark-7684 and squashes the following commits:
95d2eb8 [Cheng Lian] Addresses @marmbrust's comment
042769d [Cheng Lian] Don't create new temp directory in HiveContext.newTemporaryConfiguration()
(cherry picked from commit bfeedc69a2)
Signed-off-by: Cheng Lian <lian@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7805
Because `sql/hive`'s tests depend on the test jar of `sql/core`, we do not need to store `SQLTestUtils` and `ParquetTest` in `src/main`. We should only add stuff that will be needed by `sql/console` or Python tests (for Python, we need it in `src/main`, right? davies).
Author: Yin Huai <yhuai@databricks.com>
Closes#6334 from yhuai/SPARK-7805 and squashes the following commits:
af6d0c9 [Yin Huai] mima
b86746a [Yin Huai] Move SQLTestUtils.scala and ParquetTest.scala to src/test.
(cherry picked from commit ed21476bc0)
Signed-off-by: Yin Huai <yhuai@databricks.com>
This one continues the work of https://github.com/apache/spark/pull/6216.
Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6366 from yhuai/insert and squashes the following commits:
3d717fb [Yin Huai] Use insertInto to handle the casue when table exists and Append is used for saveAsTable.
56d2540 [Yin Huai] Add PreWriteCheck to HiveContext's analyzer.
c636e35 [Yin Huai] Remove unnecessary empty lines.
cf83837 [Yin Huai] Move insertInto to write. Also, remove the partition columns from InsertIntoHadoopFsRelation.
0841a54 [Reynold Xin] Removed experimental tag for deprecated methods.
33ed8ef [Reynold Xin] [SPARK-7654][SQL] Move insertInto into reader/writer interface.
(cherry picked from commit 2b7e63585d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#6363 from marmbrus/windowErrors and squashes the following commits:
516b02d [Michael Armbrust] [SPARK-7834] [SQL] Better window error messages
(cherry picked from commit 3c1305107a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7270
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#5864 from viirya/dyn_partition_insert and squashes the following commits:
b5627df [Liang-Chi Hsieh] For comments.
3b21e4b [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into dyn_partition_insert
8a4352d [Liang-Chi Hsieh] Consider dynamic partition when inserting into hive table.
(cherry picked from commit 126d7235de)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Santiago M. Mola <santi@mola.io>
Closes#6327 from smola/feature/catalyst-dsl-set-ops and squashes the following commits:
11db778 [Santiago M. Mola] [SPARK-7724] [SQL] Support Intersect/Except in Catalyst DSL.
(cherry picked from commit e4aef91fe7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
https://issues.apache.org/jira/browse/SPARK-7758
When initializing `executionHive`, we only masks
`javax.jdo.option.ConnectionURL` to override metastore location. However,
other properties that relates to the actual Hive metastore data source are not
masked. For example, when using Spark SQL with a PostgreSQL backed Hive
metastore, `executionHive` actually tries to use settings read from
`hive-site.xml`, which talks about PostgreSQL, to connect to the temporary
Derby metastore, thus causes error.
To fix this, we need to mask all metastore data source properties.
Specifically, according to the code of [Hive `ObjectStore.getDataSourceProps()`
method] [1], all properties whose name mentions "jdo" and "datanucleus" must be
included.
[1]: https://github.com/apache/hive/blob/release-0.13.1/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L288
Have tested using postgre sql as metastore, it worked fine.
Author: WangTaoTheTonic <wangtao111@huawei.com>
Closes#6314 from WangTaoTheTonic/SPARK-7758 and squashes the following commits:
ca7ae7c [WangTaoTheTonic] add comments
86caf2c [WangTaoTheTonic] delete unused import
e4f0feb [WangTaoTheTonic] block more data source related property
92a81fa [WangTaoTheTonic] fix style check
e3e683d [WangTaoTheTonic] override more configs to avoid failuer connecting to postgre sql
(cherry picked from commit 31d5d463e7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#6165 from marmbrus/wrongColumn and squashes the following commits:
4fad158 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into wrongColumn
aad7eab [Michael Armbrust] rxins comments
f1e8df1 [Michael Armbrust] [SPARK-6743][SQL] Fix empty projections of cached data
(cherry picked from commit 3b68cb0430)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This Selenium test case has been flaky for a while and led to frequent Jenkins build failure. Let's disable it temporarily until we figure out a proper solution.
Author: Cheng Lian <lian@databricks.com>
Closes#6345 from liancheng/ignore-selenium-test and squashes the following commits:
09996fe [Cheng Lian] Ignores Thrift server UISeleniumSuite
(cherry picked from commit 4e5220c317)
Signed-off-by: Cheng Lian <lian@databricks.com>