Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4586 from adrian-wang/addjar and squashes the following commits:
efdd602 [Daoyuan Wang] move jar to another place
6c707e8 [Daoyuan Wang] restrict hive version for test
32c4fb8 [Daoyuan Wang] fix style and add a test
9957d87 [Daoyuan Wang] use sessionstate classloader in makeRDDforTable
0810e71 [Daoyuan Wang] remove variable substitution
1898309 [Daoyuan Wang] fix classnotfound
95a40da [Daoyuan Wang] support env argus in add jar, and set add jar ret to 0
In `leftsemijoin.q`, there is a data loading command for table `sales` already, but in `TestHive`, it also created the table `sales`, which causes duplicated records inserted into the `sales`.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4506 from chenghao-intel/df_table and squashes the following commits:
0be05f7 [Cheng Hao] Remove the table `sales` creating from TestHive
[SHOW PRINCIPALS role_name]
Lists all roles and users who belong to this role.
Only the admin role has privilege for this.
[SHOW COMPACTIONS]
It returns a list of all tables and partitions currently being compacted or scheduled for compaction when Hive transactions are being used.
[SHOW TRANSACTIONS]
It is for use by administrators when Hive transactions are being used. It returns a list of all currently open and aborted transactions in the system.
Author: DoingDone9 <799203320@qq.com>
Author: Zhongshuai Pei <799203320@qq.com>
Author: Xu Tingjun <xutingjun@huawei.com>
Closes#4902 from DoingDone9/SHOW_PRINCIPALS and squashes the following commits:
4add42f [Zhongshuai Pei] for test
311f806 [Zhongshuai Pei] for test
0c7550a [DoingDone9] Update HiveQl.scala
c8aeb1c [Xu Tingjun] aa
802261c [DoingDone9] Merge pull request #7 from apache/master
d00303b [DoingDone9] Merge pull request #6 from apache/master
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
This PR follow up PR #3907 & #3891 & #4356.
According to marmbrus liancheng 's comments, I try to use fs.globStatus to retrieve all FileStatus objects under path(s), and then do the filtering locally.
[1]. get pathPattern by path, and put it into pathPatternSet. (hdfs://cluster/user/demo/2016/08/12 -> hdfs://cluster/user/demo/*/*/*)
[2]. retrieve all FileStatus objects ,and cache them by undating existPathSet.
[3]. do the filtering locally
[4]. if we have new pathPattern,do 1,2 step again. (external table maybe have more than one partition pathPattern)
chenghao-intel jeanlyn
Author: lazymam500 <lazyman500@gmail.com>
Author: lazyman <lazyman500@gmail.com>
Closes#5059 from lazyman500/SPARK-5068 and squashes the following commits:
5bfcbfd [lazyman] move spark.sql.hive.verifyPartitionPath to SQLConf,fix scala style
e1d6386 [lazymam500] fix scala style
f23133f [lazymam500] bug fix
47e0023 [lazymam500] fix scala style,add config flag,break the chaining
04c443c [lazyman] SPARK-5068: fix bug when partition path doesn't exists #2
41f60ce [lazymam500] Merge pull request #1 from apache/master
Author: haiyang <huhaiyang@huawei.com>
Closes#4929 from haiyangsea/cte and squashes the following commits:
220b67d [haiyang] add golden files for cte test
d3c7681 [haiyang] Merge branch 'master' into cte-repair
0ba2070 [haiyang] modify code style
9ce6b58 [haiyang] fix conflict
ff74741 [haiyang] add comment for With plan
0d56af4 [haiyang] code indention
776a440 [haiyang] add comments for resolve relation strategy
2fccd7e [haiyang] add comments for resolve relation strategy
241bbe2 [haiyang] fix cte problem of view
e9e1237 [haiyang] fix test case problem
614182f [haiyang] add test cases for CTE feature
32e415b [haiyang] add comment
1cc8c15 [haiyang] support with
03f1097 [haiyang] support with
e960099 [haiyang] support with
9aaa874 [haiyang] support with
0566978 [haiyang] support with
a99ecd2 [haiyang] support with
c3fa4c2 [haiyang] support with
3b6077f [haiyang] support with
5f8abe3 [haiyang] support with
4572b05 [haiyang] support with
f801f54 [haiyang] support with
```SQL
select key, v from src lateral view stack(3, 1+1, 2+2, 3) d as v;
```
Will cause exception
```
java.lang.ClassNotFoundException: stack
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.spark.sql.hive.HiveFunctionWrapper.createFunction(Shim13.scala:148)
at org.apache.spark.sql.hive.HiveGenericUdtf.function$lzycompute(hiveUdfs.scala:274)
at org.apache.spark.sql.hive.HiveGenericUdtf.function(hiveUdfs.scala:274)
at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspector$lzycompute(hiveUdfs.scala:280)
at org.apache.spark.sql.hive.HiveGenericUdtf.outputInspector(hiveUdfs.scala:280)
at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes$lzycompute(hiveUdfs.scala:285)
at org.apache.spark.sql.hive.HiveGenericUdtf.outputDataTypes(hiveUdfs.scala:285)
at org.apache.spark.sql.hive.HiveGenericUdtf.makeOutput(hiveUdfs.scala:291)
at org.apache.spark.sql.catalyst.expressions.Generator.output(generators.scala:60)
at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$2.apply(basicOperators.scala:60)
at org.apache.spark.sql.catalyst.plans.logical.Generate$$anonfun$2.apply(basicOperators.scala:60)
at scala.Option.map(Option.scala:145)
at org.apache.spark.sql.catalyst.plans.logical.Generate.generatorOutput(basicOperators.scala:60)
at org.apache.spark.sql.catalyst.plans.logical.Generate.output(basicOperators.scala:70)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:117)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveChildren$1.apply(LogicalPlan.scala:117)
```
Author: Cheng Hao <hao.cheng@intel.com>
Closes#5444 from chenghao-intel/hive_udtf and squashes the following commits:
065a98c [Cheng Hao] fix bug of Hive UDTF in Lateral View (ClassNotFound)
Otherwise we end up rewriting predicates to be trivially equal (i.e. `a#1 = a#2` -> `a#3 = a#3`), at which point the query is no longer valid.
Author: Michael Armbrust <michael@databricks.com>
Closes#5458 from marmbrus/selfJoinParquet and squashes the following commits:
22df77c [Michael Armbrust] [SPARK-6851][SQL] Create new instance for each converted parquet relation
'(' and ')' are special characters used in Parquet schema for type annotation. When we run an aggregation query, we will obtain attribute name such as "MAX(a)".
If we directly store the generated DataFrame as Parquet file, it causes failure when reading and parsing the stored schema string.
Several methods can be adopted to solve this. This pr uses a simplest one to just replace attribute names before generating Parquet schema based on these attributes.
Another possible method might be modifying all aggregation expression names from "func(column)" to "func[column]".
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#5263 from viirya/parquet_aggregation_name and squashes the following commits:
2d70542 [Liang-Chi Hsieh] Address comment.
463dff4 [Liang-Chi Hsieh] Instead of replacing special chars, showing error message to user to suggest using Alias.
1de001d [Liang-Chi Hsieh] Replace special characters '(' and ')' of Parquet schema.
When union non-decimal types with decimals, we use the following rules:
- FIRST `intTypeToFixed`, then fixed union decimals with precision/scale p1/s2 and p2/s2 will be promoted to
DecimalType(max(p1, p2), max(s1, s2))
- FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE (this is the same as Hive,
but note that unlimited decimals are considered bigger than doubles in WidenTypes)
Author: guowei2 <guowei2@asiainfo.com>
Closes#4004 from guowei2/SPARK-5203 and squashes the following commits:
ff50f5f [guowei2] fix code style
11df1bf [guowei2] fix decimal union with double, double->Decimal(15,15)
0f345f9 [guowei2] fix structType merge with decimal
101ed4d [guowei2] fix build error after rebase
0b196e4 [guowei2] code style
fe2c2ca [guowei2] handle union decimal precision in 'DecimalPrecision'
421d840 [guowei2] fix union types for decimal precision
ef2c661 [guowei2] fix union with different decimal type
Just fix a typo.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#5352 from viirya/fix_a_typo and squashes the following commits:
303b2d2 [Liang-Chi Hsieh] Fix typo.
This builds on my earlier pull requests and turns on the explicit type checking in scalastyle.
Author: Reynold Xin <rxin@databricks.com>
Closes#5342 from rxin/SPARK-6428 and squashes the following commits:
7b531ab [Reynold Xin] import ordering
2d9a8a5 [Reynold Xin] jl
e668b1c [Reynold Xin] override
9b9e119 [Reynold Xin] Parenthesis.
82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
https://issues.apache.org/jira/browse/SPARK-6575
Author: Yin Huai <yhuai@databricks.com>
This patch had conflicts when merged, resolved by
Committer: Cheng Lian <lian@databricks.com>
Closes#5339 from yhuai/parquetRelationCache and squashes the following commits:
b0e1a42 [Yin Huai] Address comments.
83d9846 [Yin Huai] Remove unnecessary change.
c0dc7a4 [Yin Huai] Cache converted parquet relations.
NotImplementedError in scala 2.10 is a fatal exception, which is not very nice to throw when not actually fatal.
Author: Michael Armbrust <michael@databricks.com>
Closes#5315 from marmbrus/throwUnsupported and squashes the following commits:
c29e03b [Michael Armbrust] [SQL] Throw UnsupportedOperationException instead of NotImplementedError
052e05b [Michael Armbrust] [SQL] Throw UnsupportedOperationException instead of NotImplementedError
In order to do inbound checking and type conversion, we should use Literal.create() instead of constructor.
Author: Davies Liu <davies@databricks.com>
Closes#5320 from davies/literal and squashes the following commits:
1667604 [Davies Liu] fix style and add comment
5f8c0fd [Davies Liu] use Literal.create instread of constructor
1. Test JARs are built & published
1. log4j.resources is explicitly excluded. Without this, downstream test run logging depends on the order the JARs are listed/loaded
1. sql/hive pulls in spark-sql &...spark-catalyst for its test runs
1. The copied in test classes were rm'd, and a test edited to remove its now duplicate assert method
1. Spark streaming is now build with the same plugin/phase as the rest, but its shade plugin declaration is kept in (so different from the rest of the test plugins). Due to (#2), this means the test JAR no longer includes its log4j file.
Outstanding issues:
* should the JARs be shaded? `spark-streaming-test.jar` does, but given these are test jars for developers only, especially in the same spark source tree, it's hard to justify.
* `maven-jar-plugin` v 2.6 was explicitly selected; without this the apache-1.4 parent template JAR version (2.4) chosen.
* Are there any other resources to exclude?
Author: Steve Loughran <stevel@hortonworks.com>
Closes#5119 from steveloughran/stevel/patches/SPARK-6433-test-jars and squashes the following commits:
81ceb01 [Steve Loughran] SPARK-6433 add a clearer comment explaining what the plugin is doing & why
a6dca33 [Steve Loughran] SPARK-6433 : pull configuration section form archive plugin
c2b5f89 [Steve Loughran] SPARK-6433 omit "jar" goal from jar plugin
fdac51b [Steve Loughran] SPARK-6433 -002; indentation & delegate plugin version to parent
650f442 [Steve Loughran] SPARK-6433 patch 001: test JARs are built; sql/hive pulls in spark-sql & spark-catalyst for its test runs
Before it was possible for a query to flip back and forth from a resolved state, allowing resolution to propagate up before coercion had stabilized. The issue was that `ResolvedReferences` would run after `FunctionArgumentConversion`, but before `PropagateTypes` had run. This PR ensures we correctly `PropagateTypes` after any coercion has applied.
Author: Michael Armbrust <michael@databricks.com>
Closes#5278 from marmbrus/unionNull and squashes the following commits:
dc3581a [Michael Armbrust] [SPARK-5371][SQL] Propogate types after function conversion / before futher resolution
Consider a metastore Parquet table that
1. doesn't have schema evolution issue
2. has lots of data files and/or partitions
In this case, driver schema merging can be both slow and unnecessary. Would be good to have a configuration to let the use disable schema merging when converting such a metastore Parquet table.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5231)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#5231 from liancheng/spark-6575 and squashes the following commits:
cd96159 [Cheng Lian] Adds configuration to disable schema merging while converting metastore Parquet tables
Also removes temporary workarounds made in #5183 and #5251.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5289)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#5289 from liancheng/spark-6555 and squashes the following commits:
d0095ac [Cheng Lian] Removes unused imports
cfafeeb [Cheng Lian] Removes outdated comment
75a2746 [Cheng Lian] Overrides equals() and hashCode() for MetastoreRelation
JIRA: https://issues.apache.org/jira/browse/SPARK-6618
Author: Yin Huai <yhuai@databricks.com>
Closes#5281 from yhuai/lookupRelationLock and squashes the following commits:
591b4be [Yin Huai] A test?
b3a9625 [Yin Huai] Just protect client.
This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables.
This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side.
TODO
- [ ] Add tests
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5139)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#5139 from liancheng/spark-6369 and squashes the following commits:
72eb628 [Cheng Lian] Fixes typo in javadoc
9a4b82b [Cheng Lian] Adds javadoc and addresses @aarondav's comments
dfdf3ef [Cheng Lian] Uses commit coordinator to help committing Hive and Parquet tables
Now that we have `DataFrame`s it is possible to have multiple copies in a single query plan. As such, it needs to inherit from `MultiInstanceRelation` or self joins will break. I also add better debugging errors when our self join handling fails in case there are future bugs.
Author: Michael Armbrust <michael@databricks.com>
Closes#5251 from marmbrus/multiMetaStore and squashes the following commits:
4272f6d [Michael Armbrust] [SPARK-6595][SQL] MetastoreRelation should be MuliInstanceRelation
Author: Reynold Xin <rxin@databricks.com>
Closes#5226 from rxin/empty-df and squashes the following commits:
1306d88 [Reynold Xin] Proper fix.
e135bb9 [Reynold Xin] [SPARK-6564][SQL] SQLContext.emptyDataFrame should contain 0 rows, not 1 row.
If the tests in "sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala" are running before CachedTableSuite.scala, the test("Drop cached table") will failed. Because the table test is created in SQLQuerySuite.scala ,and this table not droped. So when running "drop cached table", table test already exists.
There is error info:
01:18:35.738 ERROR hive.ql.exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: AlreadyExistsException(message:Table test already exists)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:616)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)test”
And the test about "create table test" in "sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala,is:
test("SPARK-4825 save join to table") {
val testData = sparkContext.parallelize(1 to 10).map(i => TestData(i, i.toString)).toDF()
sql("CREATE TABLE test1 (key INT, value STRING)")
testData.insertInto("test1")
sql("CREATE TABLE test2 (key INT, value STRING)")
testData.insertInto("test2")
testData.insertInto("test2")
sql("CREATE TABLE test AS SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key")
checkAnswer(
table("test"),
sql("SELECT COUNT(a.value) FROM test1 a JOIN test2 b ON a.key = b.key").collect().toSeq)
}
Author: KaiXinXiaoLei <huleilei1@huawei.com>
Closes#5150 from KaiXinXiaoLei/testFailed and squashes the following commits:
7534b02 [KaiXinXiaoLei] The UT test of spark is failed.
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4930 from adrian-wang/testvs and squashes the following commits:
2ce590f [Daoyuan Wang] add explicit function types
b1d68bf [Daoyuan Wang] only substitute for parseSql
9c4a950 [Daoyuan Wang] add a comment explaining
18fb481 [Daoyuan Wang] enable variable substitute on test framework
Author: DoingDone9 <799203320@qq.com>
Closes#4973 from DoingDone9/sort_token and squashes the following commits:
855fa10 [DoingDone9] Update HiveQl.scala
c7080b3 [DoingDone9] Sort these tokens in alphabetic order to avoid further duplicate in HiveQl
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
In hive,the schema of partition may be difference from the table schema.When we use spark-sql to query the data of partition which schema is difference from the table schema,we will get the exceptions as the description of the [jira](https://issues.apache.org/jira/browse/SPARK-5498) .For example:
* We take a look of the schema for the partition and the table
```sql
DESCRIBE partition_test PARTITION (dt='1');
id int None
name string None
dt string None
# Partition Information
# col_name data_type comment
dt string None
```
```
DESCRIBE partition_test;
OK
id bigint None
name string None
dt string None
# Partition Information
# col_name data_type comment
dt string None
```
* run the sql
```sql
SELECT * FROM partition_test where dt='1';
```
we will get the cast exception `java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.MutableLong cannot be cast to org.apache.spark.sql.catalyst.expressions.MutableInt`
Author: jeanlyn <jeanlyn92@gmail.com>
Closes#4289 from jeanlyn/schema and squashes the following commits:
9c8da74 [jeanlyn] fix style
b41d6b9 [jeanlyn] fix compile errors
07d84b6 [jeanlyn] Merge branch 'master' into schema
535b0b6 [jeanlyn] reduce conflicts
d6c93c5 [jeanlyn] fix bug
1e8b30c [jeanlyn] fix code style
0549759 [jeanlyn] fix code style
c879aa1 [jeanlyn] clean the code
2a91a87 [jeanlyn] add more test case and clean the code
12d800d [jeanlyn] fix code style
63d170a [jeanlyn] fix compile problem
7470901 [jeanlyn] reduce conflicts
afc7da5 [jeanlyn] make getConvertedOI compatible between 0.12.0 and 0.13.1
b1527d5 [jeanlyn] fix type mismatch
10744ca [jeanlyn] Insert a space after the start of the comment
3b27af3 [jeanlyn] SPARK-5498:fix bug when query the data when partition schema does not match table schema
The `ParquetConversions` analysis rule generates a hash map, which maps from the original `MetastoreRelation` instances to the newly created `ParquetRelation2` instances. However, `MetastoreRelation.equals` doesn't compare output attributes. Thus, if a single metastore Parquet table appears multiple times in a query, only a single entry ends up in the hash map, and the conversion is not correctly performed.
Proper fix for this issue should be overriding `equals` and `hashCode` for MetastoreRelation. Unfortunately, this breaks more tests than expected. It's possible that these tests are ill-formed from the very beginning. As 1.3.1 release is approaching, we'd like to make the change more surgical to avoid potential regressions. The proposed fix here is to make both the metastore relations and their output attributes as keys in the hash map used in ParquetConversions.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5183)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#5183 from liancheng/spark-6450 and squashes the following commits:
3536780 [Cheng Lian] Fixes metastore Parquet table conversion
spark avoid old inteface of hive, then some udaf can not work like "org.apache.hadoop.hive.ql.udf.generic.GenericUDAFAverage"
Author: DoingDone9 <799203320@qq.com>
Closes#5131 from DoingDone9/udaf and squashes the following commits:
9de08d0 [DoingDone9] Update HiveUdfSuite.scala
49c62dc [DoingDone9] Update hiveUdfs.scala
98b134f [DoingDone9] Merge pull request #5 from apache/master
161cae3 [DoingDone9] Merge pull request #4 from apache/master
c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
cb1852d [DoingDone9] Merge pull request #2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again. However, with eager analysis in `DataFrame`s this can cause errors for queries such as:
```scala
val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```
As a result, in this PR we defer the elimination of subqueries until the optimization phase.
Author: Michael Armbrust <michael@databricks.com>
Closes#5160 from marmbrus/subqueriesInDfs and squashes the following commits:
a9bb262 [Michael Armbrust] Update Optimizer.scala
27d25bf [Michael Armbrust] fix hive tests
9137e03 [Michael Armbrust] add type
81cd597 [Michael Armbrust] Avoid eliminating subqueries until optimization
Author: Michael Armbrust <michael@databricks.com>
Closes#5155 from marmbrus/errorMessages and squashes the following commits:
b898188 [Michael Armbrust] Fix formatting of error messages.
Author: Reynold Xin <rxin@databricks.com>
Closes#5108 from rxin/hive-public-type and squashes the following commits:
a320328 [Reynold Xin] [SPARK-6428][SQL] Added explicit type for all public methods for Hive module.
This PR creates a trait `DataTypeParser` used to parse data types. This trait aims to be single place to provide the functionality of parsing data types' string representation. It is currently mixed in with `DDLParser` and `SqlParser`. It is also used to parse the data type for `DataFrame.cast` and to convert Hive metastore's data type string back to a `DataType`.
JIRA: https://issues.apache.org/jira/browse/SPARK-6250
Author: Yin Huai <yhuai@databricks.com>
Closes#5078 from yhuai/ddlKeywords and squashes the following commits:
0e66097 [Yin Huai] Special handle struct<>.
fea6012 [Yin Huai] Style.
c9733fb [Yin Huai] Create a trait to parse data types.
SELECT sum('a'), avg('a'), variance('a'), std('a') FROM src;
Should give output as
0.0 NULL NULL NULL
This fixes hive udaf_number_format.q
Author: Venkata Ramana G <ramana.gollamudihuawei.com>
Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
Closes#4466 from gvramana/sum_fix and squashes the following commits:
42e14d1 [Venkata Ramana Gollamudi] Added comments
39415c0 [Venkata Ramana Gollamudi] Handled the partitioned Sum expression scenario
df66515 [Venkata Ramana Gollamudi] code style fix
4be2606 [Venkata Ramana Gollamudi] Add udaf_number_format to whitelist and golden answer
330fd64 [Venkata Ramana Gollamudi] fix sum function for all null data
Use `Utils.createTempDir()` to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
Author: Sean Owen <sowen@cloudera.com>
Closes#5029 from srowen/SPARK-6338 and squashes the following commits:
27b740a [Sean Owen] Fix hive-thriftserver tests that don't expect an existing dir
4a212fa [Sean Owen] Standardize a bit more temp dir management
9004081 [Sean Owen] Revert some added recursive-delete calls
57609e4 [Sean Owen] Use Utils.createTempDir() to replace other temp file mechanisms used in some tests, to further ensure they are cleaned up, and simplify
We need to handle ambiguous `exprId`s that are produced by new aliases as well as those caused by leaf nodes (`MultiInstanceRelation`).
Attempting to fix this revealed a bug in `equals` for `Alias` as these objects were comparing equal even when the expression ids did not match. Additionally, `LocalRelation` did not correctly provide statistics, and some tests in `catalyst` and `hive` were not using the helper functions for comparing plans.
Based on #4991 by chenghao-intel
Author: Michael Armbrust <michael@databricks.com>
Closes#5062 from marmbrus/selfJoins and squashes the following commits:
8e9b84b [Michael Armbrust] check qualifier too
8038a36 [Michael Armbrust] handle aggs too
0b9c687 [Michael Armbrust] fix more tests
c3c574b [Michael Armbrust] revert change.
725f1ab [Michael Armbrust] add statistics
a925d08 [Michael Armbrust] check for conflicting attributes in join resolution
b022ef7 [Michael Armbrust] Handle project aliases.
d8caa40 [Michael Armbrust] test case: SPARK-6247
f9c67c2 [Michael Armbrust] Check for duplicate attributes in join resolution.
898af73 [Michael Armbrust] Fix Alias equality.
Now spark version is only support
```create table table_in_database_creation.test1 as select * from src limit 1;``` in HiveContext.
This patch is used to support
```create table `table_in_database_creation.test2` as select * from src limit 1;``` in HiveContext.
Author: watermen <qiyadong2010@gmail.com>
Author: q00251598 <qiyadong@huawei.com>
Closes#4427 from watermen/SPARK-5651 and squashes the following commits:
c5c8ed1 [watermen] add the generated golden files
1f0e42e [q00251598] add input64 in blacklist and add test suit
`ResolveUdtfsAlias` in `hiveUdfs` only considers the `HiveGenericUdtf` with multiple alias. When only single alias is used with `HiveGenericUdtf`, the alias is not working.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4692 from viirya/udft_alias and squashes the following commits:
8a3bae4 [Liang-Chi Hsieh] No need to test selected column from DataFrame since DataFrame API is updated.
160a379 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into udft_alias
e6531cc [Liang-Chi Hsieh] Selected column from DataFrame should not re-analyze logical plan.
a45cc2a [Liang-Chi Hsieh] Resolve UdtfsAlias when only single Alias is used.
---- comment;
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4500 from adrian-wang/semicolon and squashes the following commits:
70b8abb [Daoyuan Wang] use mkstring instead of reduce
2d49738 [Daoyuan Wang] remove outdated golden file
317346e [Daoyuan Wang] only skip comment with semicolon at end of line, to avoid golden file outdated
d3ae01e [Daoyuan Wang] fix error
a11602d [Daoyuan Wang] fix comment with semicolon at end
Still, we keep only a single HiveContext within ThriftServer, and we also create a object called `SQLSession` for isolating the different user states.
Developers can obtain/release a new user session via `openSession` and `closeSession`, and `SQLContext` and `HiveContext` will also provide a default session if no `openSession` called, for backward-compatibility.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4885 from chenghao-intel/multisessions_singlecontext and squashes the following commits:
1c47b2a [Cheng Hao] rename the tss => tlSession
815b27a [Cheng Hao] code style issue
57e3fa0 [Cheng Hao] openSession is not compatible between Hive0.12 & 0.13.1
4665b0d [Cheng Hao] thriftservice with single context
Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.
Author: Sean Owen <sowen@cloudera.com>
Closes#4950 from srowen/SPARK-6225 and squashes the following commits:
3080972 [Sean Owen] Ordered imports: Java, Scala, 3rd party, Spark
c67985b [Sean Owen] Resolve javac, scalac warnings of various types -- deprecations, Scala lang, unchecked cast, etc.
Option 1 of 2: Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
Author: Sean Owen <sowen@cloudera.com>
Closes#4912 from srowen/SPARK-6182.1 and squashes the following commits:
eff60de [Sean Owen] Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
- Various Fixes to docs
- Make data source traits actually interfaces
Based on #4862 but with fixed conflicts.
Author: Reynold Xin <rxin@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#4868 from marmbrus/pr/4862 and squashes the following commits:
fe091ea [Michael Armbrust] Merge remote-tracking branch 'origin/master' into pr/4862
0208497 [Reynold Xin] Test fixes.
34e0a28 [Reynold Xin] [SPARK-5310][SQL] Various fixes to Spark SQL docs.
This PR contains the following changes:
1. Add a new method, `DataType.equalsIgnoreCompatibleNullability`, which is the middle ground between DataType's equality check and `DataType.equalsIgnoreNullability`. For two data types `from` and `to`, it does `equalsIgnoreNullability` as well as if the nullability of `from` is compatible with that of `to`. For example, the nullability of `ArrayType(IntegerType, containsNull = false)` is compatible with that of `ArrayType(IntegerType, containsNull = true)` (for an array without null values, we can always say it may contain null values). However, the nullability of `ArrayType(IntegerType, containsNull = true)` is incompatible with that of `ArrayType(IntegerType, containsNull = false)` (for an array that may have null values, we cannot say it does not have null values).
2. For the `resolved` field of `InsertIntoTable`, use `equalsIgnoreCompatibleNullability` to replace the equality check of the data types.
3. For our data source write path, when appending data, we always use the schema of existing table to write the data. This is important for parquet, since nullability direct impacts the way to encode/decode values. If we do not do this, we may see corrupted values when reading values from a set of parquet files generated with different nullability settings.
4. When generating a new parquet table, we always set nullable/containsNull/valueContainsNull to true. So, we will not face situations that we cannot append data because containsNull/valueContainsNull in an Array/Map column of the existing table has already been set to `false`. This change makes the whole data pipeline more robust.
5. Update the equality check of JSON relation. Since JSON does not really cares nullability, `equalsIgnoreNullability` seems a better choice to compare schemata from to JSON tables.
JIRA: https://issues.apache.org/jira/browse/SPARK-5950
Thanks viirya for the initial work in #4729.
cc marmbrus liancheng
Author: Yin Huai <yhuai@databricks.com>
Closes#4826 from yhuai/insertNullabilityCheck and squashes the following commits:
3b61a04 [Yin Huai] Revert change on equals.
80e487e [Yin Huai] asNullable in UDT.
587d88b [Yin Huai] Make methods private.
0cb7ea2 [Yin Huai] marmbrus's comments.
3cec464 [Yin Huai] Cheng's comments.
486ed08 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
d3747d1 [Yin Huai] Remove unnecessary change.
8360817 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
8a3f237 [Yin Huai] Use equalsIgnoreNullability instead of equality check.
0eb5578 [Yin Huai] Fix tests.
f6ed813 [Yin Huai] Update old parquet path.
e4f397c [Yin Huai] Unit tests.
b2c06f8 [Yin Huai] Ignore nullability in JSON relation's equality check.
8bd008b [Yin Huai] nullable, containsNull, and valueContainsNull will be always true for parquet data.
bf50d73 [Yin Huai] When appending data, we use the schema of the existing table instead of the schema of the new data.
0a703e7 [Yin Huai] Test failed again since we cannot read correct content.
9a26611 [Yin Huai] Make InsertIntoTable happy.
8f19fe5 [Yin Huai] equalsIgnoreCompatibleNullability
4ec17fd [Yin Huai] Failed test.
Author: Michael Armbrust <michael@databricks.com>
Closes#4855 from marmbrus/explodeBug and squashes the following commits:
a712249 [Michael Armbrust] [SPARK-6114][SQL] Avoid metastore conversions before plan is resolved
HiveQL expression like `select count(1) from src tablesample(1 percent);` means take 1% sample to select. But it means 100% in the current version of the Spark.
Author: q00251598 <qiyadong@huawei.com>
Closes#4789 from watermen/SPARK-6040 and squashes the following commits:
2453ebe [q00251598] check and adjust the fraction.
When run ```select * from nzhang_part where hr = 'file,';```, it throws exception ```java.lang.IllegalArgumentException: Can not create a Path from an empty string```
. Because the path of hdfs contains comma, and FileInputFormat.setInputPaths will split path by comma.
### SQL
```
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table nzhang_part like srcpart;
insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select key, value, hr from srcpart where ds='2008-04-08';
insert overwrite table nzhang_part partition (ds='2010-08-15', hr=11) select key, value from srcpart where ds='2008-04-08';
insert overwrite table nzhang_part partition (ds='2010-08-15', hr)
select * from (
select key, value, hr from srcpart where ds='2008-04-08'
union all
select '1' as key, '1' as value, 'file,' as hr from src limit 1) s;
select * from nzhang_part where hr = 'file,';
```
### Error Log
```
15/02/10 14:33:16 ERROR SparkSQLDriver: Failed in [select * from nzhang_part where hr = 'file,']
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:251)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:172)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)
Author: q00251598 <qiyadong@huawei.com>
Closes#4532 from watermen/SPARK-5741 and squashes the following commits:
9758ab1 [q00251598] fix bug
1db1a1c [q00251598] use setInputPaths(Job job, Path... inputPaths)
b788a72 [q00251598] change FileInputFormat.setInputPaths to jobConf.set and add test suite
JIRA: https://issues.apache.org/jira/browse/SPARK-6073
liancheng
Author: Yin Huai <yhuai@databricks.com>
Closes#4824 from yhuai/refreshCache and squashes the following commits:
b9542ef [Yin Huai] Refresh metadata cache in the Catalog in CreateMetastoreDataSourceAsSelect.
JIRA: https://issues.apache.org/jira/browse/SPARK-6024
Author: Yin Huai <yhuai@databricks.com>
Closes#4795 from yhuai/wideSchema and squashes the following commits:
4882e6f [Yin Huai] Address comments.
73e71b4 [Yin Huai] Address comments.
143927a [Yin Huai] Simplify code.
cc1d472 [Yin Huai] Make the schema wider.
12bacae [Yin Huai] If the JSON string of a schema is too large, split it before storing it in metastore.
e9b4f70 [Yin Huai] Failed test.
Please see JIRA (https://issues.apache.org/jira/browse/SPARK-6016) for details of the bug.
Author: Yin Huai <yhuai@databricks.com>
Closes#4775 from yhuai/parquetFooterCache and squashes the following commits:
78787b1 [Yin Huai] Remove footerCache in FilteringParquetRowInputFormat.
dff6fba [Yin Huai] Failed unit test.
https://issues.apache.org/jira/browse/SPARK-5286
Author: Yin Huai <yhuai@databricks.com>
Closes#4755 from yhuai/SPARK-5286-throwable and squashes the following commits:
4c0c450 [Yin Huai] Catch Throwable instead of Exception.
Author: Michael Armbrust <michael@databricks.com>
Closes#4746 from marmbrus/hiveLock and squashes the following commits:
8b871cf [Michael Armbrust] [SPARK-5952][SQL] Lock when using hive metastore client
Also added test cases for checking the serializability of HiveContext and SQLContext.
Author: Reynold Xin <rxin@databricks.com>
Closes#4628 from rxin/SPARK-5840 and squashes the following commits:
ecb3bcd [Reynold Xin] test cases and reviews.
55eb822 [Reynold Xin] [SPARK-5840][SQL] HiveContext cannot be serialized due to tuple extraction.
Although we've migrated to the DataFrame API, lots of code still uses `rdd` or `srdd` as local variable names. This PR tries to address these naming inconsistencies and some other minor DataFrame related style issues.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4670)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4670 from liancheng/df-cleanup and squashes the following commits:
3e14448 [Cheng Lian] Cleans up DataFrame variable names and toDF() calls
JIRA: https://issues.apache.org/jira/browse/SPARK-5723
Author: Yin Huai <yhuai@databricks.com>
This patch had conflicts when merged, resolved by
Committer: Michael Armbrust <michael@databricks.com>
Closes#4639 from yhuai/defaultCTASFileFormat and squashes the following commits:
a568137 [Yin Huai] Merge remote-tracking branch 'upstream/master' into defaultCTASFileFormat
ad2b07d [Yin Huai] Update tests and error messages.
8af5b2a [Yin Huai] Update conf key and unit test.
5a67903 [Yin Huai] Use data source write path for Hive's CTAS statements when no storage format/handler is specified.
https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause.
Author: Yin Huai <yhuai@databricks.com>
Closes#4663 from yhuai/projectResolved and squashes the following commits:
472f7b6 [Yin Huai] If a logical.Project has any AggregateExpression or Generator, it's resolved field should be false.
The problem is that after we create an empty hive metastore parquet table (e.g. `CREATE TABLE test (a int) STORED AS PARQUET`), Hive will create an empty dir for us, which cause our data source `ParquetRelation2` fail to get the schema of the table. See JIRA for the case to reproduce the bug and the exception.
This PR is based on #4562 from chenghao-intel.
JIRA: https://issues.apache.org/jira/browse/SPARK-5852
Author: Yin Huai <yhuai@databricks.com>
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4655 from yhuai/CTASParquet and squashes the following commits:
b8b3450 [Yin Huai] Update tests.
2ac94f7 [Yin Huai] Update tests.
3db3d20 [Yin Huai] Minor update.
d7e2308 [Yin Huai] Revert changes in HiveMetastoreCatalog.scala.
36978d1 [Cheng Hao] Update the code as feedback
a04930b [Cheng Hao] fix bug of scan an empty parquet based table
442ffe0 [Cheng Hao] passdown the schema for Parquet File in HiveContext
In unit test, the table src(key INT, value STRING) is not the same as HIVE src(key STRING, value STRING)
https://github.com/apache/hive/blob/branch-0.13/data/scripts/q_test_init.sql
And in the reflect.q, test failed for expression `reflect("java.lang.Integer", "valueOf", key, 16)`, which expect the argument `key` as STRING not INT.
This PR doesn't aim to change the `src` schema, we can do that after 1.3 released, however, we probably need to re-generate all the golden files.
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4584 from chenghao-intel/reflect and squashes the following commits:
e5bdc3a [Cheng Hao] Move the test case reflect into blacklist
184abfd [Cheng Hao] revert the change to table src1
d9bcf92 [Cheng Hao] Update the HiveContext Unittest
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4649 from viirya/use_checkpath and squashes the following commits:
0f9a1a1 [Liang-Chi Hsieh] Use same function to check path parameter.
Current `ParquetConversions` in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many Metastore Parquet tables. Since the transformUp operation is recursive, it should be better to only perform it once.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#4651 from viirya/parquet_atonce and squashes the following commits:
c1ed29d [Liang-Chi Hsieh] Fix bug.
e0f919b [Liang-Chi Hsieh] Only transformUp the given plan once.
Added a bunch of tags.
Also changed parquetFile to take varargs rather than a string followed by varargs.
Author: Reynold Xin <rxin@databricks.com>
Closes#4636 from rxin/df-doc and squashes the following commits:
651f80c [Reynold Xin] Fixed parquetFile in PySpark.
8dc3024 [Reynold Xin] [SQL] Various DataFrame doc changes.
This PR adds a `ShowTablesCommand` to support `SHOW TABLES [IN databaseName]` SQL command. The result of `SHOW TABLE` has two columns, `tableName` and `isTemporary`. For temporary tables, the value of `isTemporary` column will be `false`.
JIRA: https://issues.apache.org/jira/browse/SPARK-4865
Author: Yin Huai <yhuai@databricks.com>
Closes#4618 from yhuai/showTablesCommand and squashes the following commits:
0c09791 [Yin Huai] Use ShowTablesCommand.
85ee76d [Yin Huai] Since SHOW TABLES is not a Hive native command any more and we will not see "OK" (originally generated by Hive's driver), use SHOW DATABASES in the test.
94bacac [Yin Huai] Add SHOW TABLES to the list of noExplainCommands.
d71ed09 [Yin Huai] Fix test.
a4a6ec3 [Yin Huai] Add SHOW TABLE command.
JIRA: https://issues.apache.org/jira/browse/SPARK-5839
Author: Yin Huai <yhuai@databricks.com>
Closes#4626 from yhuai/SPARK-5839 and squashes the following commits:
f779d85 [Yin Huai] Use subqeury to wrap replaced ParquetRelation.
2695f13 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-5839
f1ba6ca [Yin Huai] Address comment.
2c7fa08 [Yin Huai] Use Subqueries to wrap a data source table.
JIRA: https://issues.apache.org/jira/browse/SPARK-5746
liancheng marmbrus
Author: Yin Huai <yhuai@databricks.com>
Closes#4617 from yhuai/insertOverwrite and squashes the following commits:
8e3019d [Yin Huai] Fix compilation error.
499e8e7 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
e76e85a [Yin Huai] Address comments.
ac31b3c [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
f30bdad [Yin Huai] Use toDF.
99da57e [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertOverwrite
6b7545c [Yin Huai] Add a pre write check to the data source API.
a88c516 [Yin Huai] DDLParser will take a parsering function to take care CTAS statements.
Lifts `HiveMetastoreCatalog.refreshTable` to `Catalog`. Adds `RefreshTable` command to refresh (possibly cached) metadata in external data sources tables.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4624)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4624 from liancheng/refresh-table and squashes the following commits:
8d1aa4c [Cheng Lian] Adds REFRESH TABLE command
Author: Michael Armbrust <michael@databricks.com>
Closes#4587 from marmbrus/position and squashes the following commits:
0810052 [Michael Armbrust] fix tests
395c019 [Michael Armbrust] Merge remote-tracking branch 'marmbrus/position' into position
e155dce [Michael Armbrust] more errors
f3efa51 [Michael Armbrust] Update AnalysisException.scala
d45ff60 [Michael Armbrust] [SQL] Initial support for reporting location of error in sql string
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4609 from adrian-wang/ctas and squashes the following commits:
0a75d5a [Daoyuan Wang] reorder import
93d1863 [Daoyuan Wang] add null format in ctas and set default col comment to null
This PR migrates the Parquet data source to the new data source write support API. Now users can also overwriting and appending to existing tables. Notice that inserting into partitioned tables is not supported yet.
When Parquet data source is enabled, insertion to Hive Metastore Parquet tables is also fullfilled by the Parquet data source. This is done by the newly introduced `HiveMetastoreCatalog.ParquetConversions` rule, which is a "proper" implementation of the original hacky `HiveStrategies.ParquetConversion`. The latter is still preserved, and can be removed together with the old Parquet support in the future.
TODO:
- [x] Update outdated comments in `newParquet.scala`.
<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/4563)
<!-- Reviewable:end -->
Author: Cheng Lian <lian@databricks.com>
Closes#4563 from liancheng/parquet-refining and squashes the following commits:
fa98d27 [Cheng Lian] Fixes test cases which should disable off Parquet data source
2476e82 [Cheng Lian] Fixes compilation error introduced during rebasing
a83d290 [Cheng Lian] Passes Hive Metastore partitioning information to ParquetRelation2
Author: Yin Huai <yhuai@databricks.com>
Closes#4542 from yhuai/moveSaveMode and squashes the following commits:
65a4425 [Yin Huai] Move SaveMode to sql package.
Author: Michael Armbrust <michael@databricks.com>
Author: wangfei <wangfei1@huawei.com>
Closes#4558 from marmbrus/errorMessages and squashes the following commits:
5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
fa38881 [wangfei] fix for grouping__id
f279a71 [wangfei] make right references for ScriptTransformation
d29fbde [Michael Armbrust] extra case
1a797b4 [Michael Armbrust] comments
d4e9015 [Michael Armbrust] add comment
af9e668 [Michael Armbrust] no braces
34eb3a4 [Michael Armbrust] more work
6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
This PR fixed the resolving problem described in https://issues.apache.org/jira/browse/SPARK-3688
```
CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;
```
Author: tianyi <tianyi.asiainfo@gmail.com>
Closes#4524 from tianyi/SPARK-3688 and squashes the following commits:
237a256 [tianyi] resolve a name with table.column pattern first.
Also I fix a bunch of bad output in test cases.
Author: Michael Armbrust <michael@databricks.com>
Closes#4520 from marmbrus/selfJoin and squashes the following commits:
4f4a85c [Michael Armbrust] comments
49c8e26 [Michael Armbrust] fix tests
6fc38de [Michael Armbrust] fix style
55d64b3 [Michael Armbrust] fix dataframe selfjoins
Author: Cheng Hao <hao.cheng@intel.com>
Closes#4496 from chenghao-intel/df_explain and squashes the following commits:
552aa58 [Cheng Hao] Add explain support for DF
Deprecate inferSchema() and applySchema(), use createDataFrame() instead, which could take an optional `schema` to create an DataFrame from an RDD. The `schema` could be StructType or list of names of columns.
Author: Davies Liu <davies@databricks.com>
Closes#4498 from davies/create and squashes the following commits:
08469c1 [Davies Liu] remove Scala/Java API for now
c80a7a9 [Davies Liu] fix hive test
d1bd8f2 [Davies Liu] cleanup applySchema
9526e97 [Davies Liu] createDataFrame from RDD with columns
Also start from the bottom so we show the first error instead of the top error.
Author: Michael Armbrust <michael@databricks.com>
Closes#4439 from marmbrus/analysisException and squashes the following commits:
45862a0 [Michael Armbrust] fix hive test
a773bba [Michael Armbrust] Merge remote-tracking branch 'origin/master' into analysisException
f88079f [Michael Armbrust] update more cases
fede90a [Michael Armbrust] newline
fbf4bc3 [Michael Armbrust] move to sql
6235db4 [Michael Armbrust] [SQL] Add an exception for analysis errors.
Users will not need to put `Options()` in a CREATE TABLE statement when there is not option provided.
Author: Yin Huai <yhuai@databricks.com>
Closes#4515 from yhuai/makeOptionsOptional and squashes the following commits:
1a898d3 [Yin Huai] Make options optional.
show current roles
Author: OopsOutOfMemory <victorshengli@126.com>
Closes#4471 from OopsOutOfMemory/show_current_role and squashes the following commits:
1c6b210 [OopsOutOfMemory] add show current roles
flowing sql get URISyntaxException:
```
create table sc as select *
from (select '2011-01-11', '2011-01-11+14:18:26' from src tablesample (1 rows)
union all
select '2011-01-11', '2011-01-11+15:18:26' from src tablesample (1 rows)
union all
select '2011-01-11', '2011-01-11+16:18:26' from src tablesample (1 rows) ) s;
create table sc_part (key string) partitioned by (ts string) stored as rcfile;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table sc_part partition(ts) select * from sc;
```
java.net.URISyntaxException: Relative path in absolute URI: ts=2011-01-11+15:18:26
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.fs.Path.<init>(Path.java:94)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWriterContainer$$newWriter$1(hiveWriterContainers.scala:230)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:243)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContainers.scala:243)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:243)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:113)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1$1.apply(InsertIntoHiveTable.scala:105)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:105)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:87)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ts=2011-01-11+15:18:26
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
Author: wangfei <wangfei1@huawei.com>
Author: Fei Wang <wangfei1@huawei.com>
Closes#4368 from scwf/SPARK-5592 and squashes the following commits:
aa55ef4 [Fei Wang] comments addressed
f8f8bb1 [wangfei] added test case
f24624f [wangfei] Merge branch 'master' of https://github.com/apache/spark into SPARK-5592
9998177 [wangfei] added test case
ea81daf [wangfei] fix URISyntaxException
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#4502 from adrian-wang/utf8 and squashes the following commits:
4d7b0ee [Daoyuan Wang] remove useless import
606f981 [Daoyuan Wang] support TOK_CHARSETLITERAL in HiveQl
make hivecontext support "alter ... unset tblproperties("key")"
like :
alter view viewName unset tblproperties("k")
alter table tableName unset tblproperties("k")
Author: DoingDone9 <799203320@qq.com>
Closes#4424 from DoingDone9/unset and squashes the following commits:
6dd8bee [DoingDone9] support "alter ... unset tblproperties("key")"
I added an unnecessary line of code in 13531dd97c.
My bad. Let's delete it.
Author: Yin Huai <yhuai@databricks.com>
Closes#4482 from yhuai/unnecessaryCode and squashes the following commits:
3645af0 [Yin Huai] Code cleanup.