This is branch 1.4 backport of https://github.com/apache/spark/pull/6888.
Below is the original description.
In earlier versions of Spark SQL we casted `TimestampType` and `DataType` to `StringType` when it was involved in a binary comparison with a `StringType`. This allowed comparing a timestamp with a partial date as a user would expect.
- `time > "2014-06-10"`
- `time > "2014"`
In 1.4.0 we tried to cast the String instead into a Timestamp. However, since partial dates are not a valid complete timestamp this results in `null` which results in the tuple being filtered.
This PR restores the earlier behavior. Note that we still special case equality so that these comparisons are not affected by not printing zeros for subsecond precision.
Author: Michael Armbrust <michaeldatabricks.com>
Closes#6888 from marmbrus/timeCompareString and squashes the following commits:
bdef29c [Michael Armbrust] test partial date
1f09adf [Michael Armbrust] special handling of equality
1172c60 [Michael Armbrust] more test fixing
4dfc412 [Michael Armbrust] fix tests
aaa9508 [Michael Armbrust] newline
04d908f [Michael Armbrust] [SPARK-8420][SQL] Fix comparision of timestamps/dates with strings
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
Author: Michael Armbrust <michael@databricks.com>
Closes#6914 from yhuai/timeCompareString-1.4 and squashes the following commits:
9882915 [Michael Armbrust] [SPARK-8420] [SQL] Fix comparision of timestamps/dates with strings
rxin this is the fix you requested for the break introduced by backporting #6793
Author: Punya Biswal <pbiswal@palantir.com>
Closes#6850 from punya/feature/fix-backport-break and squashes the following commits:
fdc3693 [Punya Biswal] Fix break introduced by backport
This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()'
Tested using spark-shell and verified that the following works:
sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show()
Author: tedyu <yuzhihong@gmail.com>
Closes#6793 from tedyu/master and squashes the following commits:
62fd97b [tedyu] Create RandomSuite
750f92c [tedyu] Add test for Rand() with seed
a1d66c5 [tedyu] Fix NullPointerException with functions.rand()
(cherry picked from commit 1a62d61696)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#6811 from marmbrus/aliasExplodeStar and squashes the following commits:
fbd2065 [Michael Armbrust] more style
806a373 [Michael Armbrust] fix style
7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generatorsa
(cherry picked from commit 9073a426e4)
Signed-off-by: Michael Armbrust <michael@databricks.com>
UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size. This means that we end up allocating 8x too much conversion space.
This patch fixes this by allocating a `byte[]` array instead. This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
Author: Josh Rosen <joshrosen@databricks.com>
Closes#6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits:
6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size
(cherry picked from commit ea7fd2ff64)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
This includes the following commits:
original: 9eb222c
hotfix1: 8c99793
hotfix2: a4f2412
scalastyle check: 609c492
---
Original patch #6441
Branch-1.3 patch #6602
Author: Andrew Or <andrew@databricks.com>
Closes#6598 from andrewor14/demarcate-tests-1.4 and squashes the following commits:
4c3c566 [Andrew Or] Merge branch 'branch-1.4' of github.com:apache/spark into demarcate-tests-1.4
e217b78 [Andrew Or] [SPARK-7558] Guard against direct uses of FunSuite / FunSuiteLike
46d4361 [Andrew Or] Various whitespace changes (minor)
3d9bf04 [Andrew Or] Make all test suites extend SparkFunSuite instead of FunSuite
eaa520e [Andrew Or] Fix tests?
b4d93de [Andrew Or] Fix tests
634a777 [Andrew Or] Fix log message
a932e8d [Andrew Or] Fix manual things that cannot be covered through automation
8bc355d [Andrew Or] Add core tests as dependencies in all modules
75d361f [Andrew Or] Introduce base abstract class for all test suites
87941ff8c4 accidentally removed the EvaluatedType.
Author: Yin Huai <yhuai@databricks.com>
Closes#6589 from yhuai/getBackEvaluatedType and squashes the following commits:
618c2eb [Yin Huai] Add EvaluatedType back.
This closes#6570.
Author: Yin Huai <yhuai@databricks.com>
Author: Reynold Xin <rxin@databricks.com>
Closes#6573 from rxin/deterministic and squashes the following commits:
356cd22 [Reynold Xin] Added unit test for the optimizer.
da3fde1 [Reynold Xin] Merge pull request #6570 from yhuai/SPARK-8023
da56200 [Yin Huai] Comments.
e38f264 [Yin Huai] Comment.
f9d6a73 [Yin Huai] Add a deterministic method to Expression.
(cherry picked from commit 0f80990bfa)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/random.scala
Author: Reynold Xin <rxin@databricks.com>
Closes#6535 from rxin/whitespace-sql and squashes the following commits:
de50316 [Reynold Xin] [SPARK-3850] Trim trailing spaces for SQL.
(cherry picked from commit 63a50be13d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala
Scala deprecated annotation actually doesn't show up in JavaDoc.
Author: Reynold Xin <rxin@databricks.com>
Closes#6523 from rxin/df-deprecated-javadoc and squashes the following commits:
26da2b2 [Reynold Xin] [SPARK-7971] Add JavaDoc style deprecation for deprecated DataFrame methods.
(cherry picked from commit c63e1a742b)
Signed-off-by: Reynold Xin <rxin@databricks.com>
I went through all the JavaDocs and tightened up visibility.
Author: Reynold Xin <rxin@databricks.com>
Closes#6526 from rxin/sql-1.4-visibility-for-docs and squashes the following commits:
bc37d1e [Reynold Xin] Tighten up visibility for JavaDoc.
(cherry picked from commit 14b314dc2c)
Signed-off-by: Reynold Xin <rxin@databricks.com>
So we can enable a whitespace enforcement rule in the style checker to save code review time.
Author: Reynold Xin <rxin@databricks.com>
Closes#6476 from rxin/whitespace-catalyst and squashes the following commits:
650409d [Reynold Xin] Fixed tests.
51a9e5d [Reynold Xin] [SPARK-7927] whitespace fixes for Catalyst module.
(cherry picked from commit 8da560d7de)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Conflicts:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
This should also close#6243.
Author: Reynold Xin <rxin@databricks.com>
Closes#6431 from rxin/JavaTypeInference-guava and squashes the following commits:
e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
(cherry picked from commit 6fec1a9409)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Two minor changes.
cc brkyvz
Author: Reynold Xin <rxin@databricks.com>
Closes#6428 from rxin/math-func-cleanup and squashes the following commits:
5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
(cherry picked from commit 3e7d7d6b3d)
Signed-off-by: Reynold Xin <rxin@databricks.com>
The Catalyst DSL is no longer used as a public facing API. This pull request removes the UDF and writeToFile feature from it since they are not used in unit tests.
Author: Reynold Xin <rxin@databricks.com>
Closes#6350 from rxin/unused-logical-dsl and squashes the following commits:
90b3de6 [Reynold Xin] [SQL][minor] Removed unused Catalyst logical plan DSL.
(cherry picked from commit c9adcad81a)
Signed-off-by: Reynold Xin <rxin@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#6363 from marmbrus/windowErrors and squashes the following commits:
516b02d [Michael Armbrust] [SPARK-7834] [SQL] Better window error messages
(cherry picked from commit 3c1305107a)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Santiago M. Mola <santi@mola.io>
Closes#6327 from smola/feature/catalyst-dsl-set-ops and squashes the following commits:
11db778 [Santiago M. Mola] [SPARK-7724] [SQL] Support Intersect/Except in Catalyst DSL.
(cherry picked from commit e4aef91fe7)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Closes#6165 from marmbrus/wrongColumn and squashes the following commits:
4fad158 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into wrongColumn
aad7eab [Michael Armbrust] rxins comments
f1e8df1 [Michael Armbrust] [SPARK-6743][SQL] Fix empty projections of cached data
(cherry picked from commit 3b68cb0430)
Signed-off-by: Michael Armbrust <michael@databricks.com>