Gengliang Wang 99eb3ff226 [SPARK-36227][SQL][3.2] Remove TimestampNTZ type support in Spark 3.2
### What changes were proposed in this pull request?

Remove TimestampNTZ type support in the production code of Spark 3.2.
To achieve this goal, this PR adds `Utils.isTesting` checks in the following code branches (a sketch of the gating pattern follows the list):
- the keywords "timestamp_ntz" and "timestamp_ltz" in the parser
- New expressions from https://issues.apache.org/jira/browse/SPARK-35662
- Using `java.time.LocalDateTime` as the external type for TimestampNTZType
- `SQLConf.timestampType` which determines the default timestamp type of Spark SQL.
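
A minimal sketch of this gating pattern (illustrative only, not the actual parser code; `resolveTypeKeyword` is a hypothetical helper):
```
// Illustrative sketch: the TimestampNTZ code paths stay compiled in,
// but are only reachable when Utils.isTesting returns true.
import org.apache.spark.sql.types._
import org.apache.spark.util.Utils

def resolveTypeKeyword(name: String): DataType = name.toLowerCase match {
  case "timestamp" => TimestampType
  // The new keywords resolve only under testing; production builds
  // fall through to the unsupported-type error below.
  case "timestamp_ntz" if Utils.isTesting => TimestampNTZType
  case "timestamp_ltz" if Utils.isTesting => TimestampType
  case other => throw new IllegalArgumentException(s"DataType $other is not supported.")
}
```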

This is to minimize the code difference between this branch and master, so that future users won't think TimestampNTZ is already available in Spark 3.2.
The downside is that users can still find TimestampNTZType under the package `org.apache.spark.sql.types`; other than that, nothing should be left exposed.

### Why are the changes needed?

As of now, there are some blockers for delivering the TimestampNTZ project in Spark 3.2:

- In the Hive Thrift server, both TimestampType and TimestampNTZType are mapped to the same timestamp type, which can cause confusion for users.
- For the Parquet data source, newly written TimestampNTZType columns will be read back as TimestampType by older Spark releases. We also need to decide how to merge schemas for files that mix TimestampType and TimestampNTZType columns.
- The type coercion rules for TimestampNTZType are incomplete. For example, what should the result type of the IN clause `IN (Timestamp'2020-01-01 00:00:00', TimestampNTZ'2020-01-01 00:00:00')` be? (See the snippet after this list.)
- It is tricky to support TimestampNTZType in the JSON/CSV data readers. We need to avoid regressions as much as possible.
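
For instance, the coercion question above boils down to queries like the following (a spark-shell sketch of the open question using the master-branch literal syntax; not a resolved rule):
```
// Mixing both timestamp literal types in one IN list: the common type
// they should be coerced to is still undecided.
spark.sql("SELECT timestamp'2020-01-01 00:00:00' IN " +
  "(timestamp'2020-01-01 00:00:00', timestamp_ntz'2020-01-01 00:00:00')")
```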

There are only 10 days left before the expected 3.2 RC date. So, I propose to **release the TimestampNTZ type in Spark 3.3 instead of Spark 3.2**, so that we have enough time to produce well-considered designs for these issues.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests, plus manual tests from spark-shell to validate that the new functionality is gone.
The new functions should now be rejected:
```
spark.sql("select to_timestamp_ntz'2021-01-01 00:00:00'").show()
spark.sql("select to_timestamp_ltz'2021-01-01 00:00:00'").show()
spark.sql("select make_timestamp_ntz(1,1,1,1,1,1)").show()
spark.sql("select make_timestamp_ltz(1,1,1,1,1,1)").show()
spark.sql("select localtimestamp()").show()
```
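In a 3.2 release build, each of these statements is expected to fail to parse or resolve; they keep working only under testing, where `Utils.isTesting` is true.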
The SQL configuration `spark.sql.timestampType` should not take effect in 3.2:
```
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
spark.sql("select make_timestamp(1,1,1,1,1,1)").schema
spark.sql("select to_timestamp('2021-01-01 00:00:00')").schema
spark.sql("select timestamp'2021-01-01 00:00:00'").schema
Seq((1, java.sql.Timestamp.valueOf("2021-01-01 00:00:00"))).toDF("i", "ts").write.partitionBy("ts").parquet("/tmp/test")
spark.read.parquet("/tmp/test").schema
```
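As a quick sanity check (my own snippet, not part of the original test script), every schema above should still report `TimestampType` despite the configuration:
```
// Assumption based on this PR: the config is ignored outside of tests,
// so the default timestamp type stays TimestampType.
import org.apache.spark.sql.types.TimestampType
assert(spark.sql("select timestamp'2021-01-01 00:00:00'").schema.head.dataType == TimestampType)
```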
`java.time.LocalDateTime` is not supported as a built-in external type:
```
import java.time.LocalDateTime
Seq(LocalDateTime.now()).toDF()
org.apache.spark.sql.catalyst.expressions.Literal(java.time.LocalDateTime.now())
org.apache.spark.sql.catalyst.expressions.Literal(0L, TimestampNTZType)
```
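For contrast, a usage sketch of the external timestamp types that remain supported (assuming the `java.time.Instant` encoder available since Spark 3.0):
```
// java.sql.Timestamp and java.time.Instant still map to TimestampType;
// only java.time.LocalDateTime is gated out in 3.2.
Seq(java.sql.Timestamp.valueOf("2021-01-01 00:00:00")).toDF("ts").schema
Seq(java.time.Instant.now()).toDF("ts").schema
```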

Closes #33444 from gengliangwang/banNTZ.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-21 09:55:09 -07:00