case cs CombineSum(expr) =>
val calcType = expr.dataType
expr.dataType match {
case DecimalType.Fixed(_, _) =>
DecimalType.Unlimited
case _ =>
expr.dataType
}
calcType is always expr.dataType. credits are all belong to IntelliJ
Author: navis.ryu <navis@apache.org>
Closes#6736 from navis/SPARK-8285 and squashes the following commits:
20382c1 [navis.ryu] [SPARK-8285] [SQL] CombineSum should be calculated as unlimited decimal first
(cherry picked from commit 6a47114bc2)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This is a follow-up patch to #6577 to replace columnEnclosing to quoteIdentifier.
I also did some minor cleanup to the JdbcDialect file.
Author: Reynold Xin <rxin@databricks.com>
Closes#6689 from rxin/jdbc-quote and squashes the following commits:
bad365f [Reynold Xin] Fixed test compilation...
e39e14e [Reynold Xin] Fixed compilation.
db9a8e0 [Reynold Xin] [SPARK-8004][SQL] Quote identifier in JDBC data source.
(cherry picked from commit d6d601a07b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-8141
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6687 from viirya/reuse_partition_column_types and squashes the following commits:
dab0688 [Liang-Chi Hsieh] Reuse partitionColumnTypes.
(cherry picked from commit 26d07f1ece)
Signed-off-by: Cheng Lian <lian@databricks.com>
As described in SPARK-8079, when writing a DataFrame to a `HadoopFsRelation`, if `HadoopFsRelation.prepareForWriteJob` throws exception, an unexpected NPE will be thrown during job abortion. (This issue doesn't bring much damage since the job is failing anyway.)
This PR makes the job/task abortion logic in `InsertIntoHadoopFsRelation` more robust to avoid such confusing exceptions.
Author: Cheng Lian <lian@databricks.com>
Closes#6612 from liancheng/spark-8079 and squashes the following commits:
87cd81e [Cheng Lian] Addresses @rxin's comment
1864c75 [Cheng Lian] Addresses review comments
9e6dbb3 [Cheng Lian] Makes InsertIntoHadoopFsRelation job/task abortion more robust
(cherry picked from commit 16fc49617e)
Signed-off-by: Cheng Lian <lian@databricks.com>
cc davies sun-rui
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes#6620 from shivaram/sparkr-read-schema and squashes the following commits:
16a6726 [Shivaram Venkataraman] Fix loadDF to pass schema Also add a unit test
a229877 [Shivaram Venkataraman] Use wrapper function to DataFrameReader
ee70ba8 [Shivaram Venkataraman] Support user-specified schema in read.df
(cherry picked from commit 12f5eaeee1)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes#6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:
514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests. Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.
(cherry picked from commit df7da07a86)
Signed-off-by: Reynold Xin <rxin@databricks.com>
This includes the following commits:
original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492
---
Original patch #6441
Branch-1.3 patch #6602
Author: Andrew Or <andrew@databricks.com>
Closes#6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:
4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
Author: Reynold Xin <rxin@databricks.com>
Closes#6608 from rxin/parquet-analysis and squashes the following commits:
b5dc8e2 [Reynold Xin] Code review feedback.
5617cf6 [Reynold Xin] [SPARK-8074] Parquet should throw AnalysisException during setup for data type/name related failures.
(cherry picked from commit 939e4f3d8d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The current code references the schema of the DataFrame to be written before checking save mode. This triggers expensive metadata discovery prematurely. For save mode other than `Append`, this metadata discovery is useless since we either ignore the result (for `Ignore` and `ErrorIfExists`) or delete existing files (for `Overwrite`) later.
This PR fixes this issue by deferring metadata discovery after save mode checking.
Author: Cheng Lian <lian@databricks.com>
Closes#6583 from liancheng/spark-8014 and squashes the following commits:
1aafabd [Cheng Lian] Updates comments
088abaa [Cheng Lian] Avoids schema merging and partition discovery when data schema and partition schema are defined
8fbd93f [Cheng Lian] Fixes SPARK-8014
(cherry picked from commit 686a45f0b9)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes#6581 from liancheng/spark-8037 and squashes the following commits:
d08e97b [Cheng Lian] Ignores files whose name starts with dot in HadoopFsRelation
(cherry picked from commit 1bb5d716c0)
Signed-off-by: Cheng Lian <lian@databricks.com>
This closes#6570.
Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6573 from rxin/deterministic and squashes the following commits:
356cd22 [Reynold Xin] Added unit test for the optimizer.
da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
da56200 [Yin Huai] Comments.
e38f264 [Yin Huai] Comment.
f9d6a73 [Yin Huai] Add a deterministic method to Expression.
(cherry picked from commit 0f80990bfa)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala
https://issues.apache.org/jira/browse/SPARK-8020
Author: Yin Huai <yhuai@databricks.com>
Closes#6571 from yhuai/SPARK-8020-1 and squashes the following commits:
0398f5b [Yin Huai] First populate the SQLConf and then construct executionHive and metadataHive.
(cherry picked from commit 7b7f7b6c6f)
Signed-off-by: Yin Huai <yhuai@databricks.com>
cc yhuai
Author: Davies Liu <davies@databricks.com>
Closes#6558 from davies/decimalType and squashes the following commits:
c877ca8 [Davies Liu] Update ParquetConverter.scala
48cc57c [Davies Liu] Update ParquetConverter.scala
b43845c [Davies Liu] add test
3b4a94f [Davies Liu] DecimalType is not read back when non-native type exists
(cherry picked from commit bcb47ad771)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6569 from rxin/freqItemsWarning and squashes the following commits:
7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
(cherry picked from commit 4c868b9943)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6565 from rxin/alias and squashes the following commits:
286d880 [Reynold Xin] [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
(cherry picked from commit 89f642a0e8)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6566 from rxin/crosstab and squashes the following commits:
e0ace1c [Reynold Xin] [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear
(cherry picked from commit 6396cc0303)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6541 from rxin/trailing-whitespace-on and squashes the following commits:
f72ebe4 [Reynold Xin] [SPARK-3850] Turn style checker on for trailing whitespaces.
(cherry picked from commit 866652c903)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6535 from rxin/whitespace-sql and squashes the following commits:
de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.
(cherry picked from commit 63a50be13d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala
Author: Reynold Xin <rxin@databricks.com>
This patch had conflicts when merged, resolved by
Committer: Reynold Xin <rxin@databricks.com>
Closes#6527 from rxin/covariant-equals and squashes the following commits:
e7d7784 [Reynold Xin] [SPARK-7975] Enforce CovariantEqualsChecker
(cherry picked from commit 7896e99b2a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Cheng Lian <lian@databricks.com>
Closes#6529 from liancheng/schemardd-deprecation-fix and squashes the following commits:
49765c2 [Cheng Lian] Adds @deprecated Scaladoc entry for SchemaRDD
(cherry picked from commit 8764dccebd)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Scala deprecated annotation actually doesn't show up in JavaDoc.
Author: Reynold Xin <rxin@databricks.com>
Closes#6523 from rxin/df-deprecated-javadoc and squashes the following commits:
26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.
(cherry picked from commit c63e1a742b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I went through all the JavaDocs and tightened up visibility.
Author: Reynold Xin <rxin@databricks.com>
Closes#6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:
bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.
(cherry picked from commit 14b314dc2c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes#6477 from rxin/whitespace-sql-core and squashes the following commits:
ce6e369 [Reynold Xin] Fixed tests.
6095fed [Reynold Xin] [SPARK-7927] whitespace fixes for SQL core.
(cherry picked from commit ff44c711ab)
Signed-off-by: Reynold Xin <rxin@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7897
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6438 from viirya/jdbc_unsigned_bigint and squashes the following commits:
ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint.
(cherry picked from commit a1e092eae5)
Signed-off-by: Reynold Xin <rxin@databricks.com>
As stated in SPARK-7684, currently `TestHive.reset` has some execution order specific bug, which makes running specific test suites locally pretty frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies on `TestHive.reset` heavily) using various `withXxx` utility methods in `SQLTestUtils` to ask each test case to cleanup their own mess so that we can avoid calling `TestHive.reset`.
Author: Cheng Lian <lian@databricks.com>
Author: Yin Huai <yhuai@databricks.com>
Closes#6353 from liancheng/workaround-spark-7684 and squashes the following commits:
26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll.
a423d48 [Cheng Lian] Fixes Scala style issue
dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround SPARK-7684
92a116d [Cheng Lian] Fixes minor styling issues
(cherry picked from commit b97ddff000)
Signed-off-by: Yin Huai <yhuai@databricks.com>
This should also close#6243.
Author: Reynold Xin <rxin@databricks.com>
Closes#6431 from rxin/JavaTypeInference-guava and squashes the following commits:
e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
(cherry picked from commit 6fec1a9409)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Please refer to [SPARK-7847] [1] for details.
[1]: https://issues.apache.org/jira/browse/SPARK-7847
Author: Cheng Lian <lian@databricks.com>
Closes#6389 from liancheng/spark-7847 and squashes the following commits:
935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value
f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions
d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
(cherry picked from commit 15459db4f6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
JIRA: https://issues.apache.org/jira/browse/SPARK-7697
The reported problem case is mysql. But for h2 db, there is no unsigned int. So it is not able to add corresponding test.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes#6229 from viirya/unsignedint_as_long and squashes the following commits:
dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
(cherry picked from commit 4f98d7a7f1)
Signed-off-by: Reynold Xin <rxin@databricks.com>