spark-instrumented-optimizer/sql/core
Hyukjin Kwon 9c5bcac61e [SPARK-36626][PYTHON] Support TimestampNTZ in createDataFrame/toPandas and Python UDFs
### What changes were proposed in this pull request?

This PR proposes to implement `TimestampNTZType` support in PySpark's `SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas UDFs with and without Arrow.

### Why are the changes needed?

To complete `TimestampNTZType` support.

### Does this PR introduce _any_ user-facing change?

Yes.

- Users can now use `TimestampNTZType` in `SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas UDFs with and without Arrow.

- If `spark.sql.timestampType` is set to `TIMESTAMP_NTZ`, `SparkSession.createDataFrame` infers a `datetime` without a timezone as `TimestampNTZType` and a `datetime` with a timezone as `TimestampType`.

    - If `TimestampType` and `TimestampNTZType` conflict while merging inferred schemas, `TimestampType` takes precedence.

- If the type is `TimestampNTZType`, PySpark treats the value internally as having an unknown timezone, computes with UTC (the same as the JVM side), and avoids localizing it externally.
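
The inference, merge, and UTC-computation rules above can be illustrated with a minimal, standard-library-only sketch. It is not PySpark's implementation: the function names and the string type names are hypothetical stand-ins, and it assumes that any `spark.sql.timestampType` value other than `TIMESTAMP_NTZ` maps every `datetime` to `TimestampType`.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Hypothetical stand-ins for the Spark SQL type names (plain strings, not PySpark classes).
TIMESTAMP = "TimestampType"
TIMESTAMP_NTZ = "TimestampNTZType"


def infer_timestamp_type(value: datetime, timestamp_type_conf: str) -> str:
    """Infer a type name for one datetime value.

    Assumption: with any conf value other than TIMESTAMP_NTZ, every datetime
    is inferred as TimestampType.
    """
    if timestamp_type_conf == "TIMESTAMP_NTZ" and value.tzinfo is None:
        return TIMESTAMP_NTZ
    return TIMESTAMP


def merge_timestamp_types(left: str, right: str) -> str:
    """Merge two inferred types; on a conflict, TimestampType takes precedence."""
    return left if left == right else TIMESTAMP


def ntz_to_micros(value: datetime) -> int:
    """Read the wall-clock fields as if they were UTC; apply no local-zone shift."""
    if value.tzinfo is not None:
        raise ValueError("expected a datetime without a timezone")
    return calendar.timegm(value.timetuple()) * 1_000_000 + value.microsecond


def micros_to_ntz(micros: int) -> datetime:
    """Inverse of ntz_to_micros: return a naive datetime without localizing it."""
    return datetime(1970, 1, 1) + timedelta(microseconds=micros)


naive = datetime(2021, 9, 2, 14, 0, 27)
aware = naive.replace(tzinfo=timezone.utc)

inferred = [infer_timestamp_type(v, "TIMESTAMP_NTZ") for v in (naive, aware)]
merged = merge_timestamp_types(*inferred)
round_tripped = micros_to_ntz(ntz_to_micros(naive))
```

Here `inferred` holds one `TimestampNTZType` and one `TimestampType`, `merged` resolves the conflict to `TimestampType`, and `round_tripped` equals `naive` regardless of the machine's local timezone, because neither conversion consults it.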

### How was this patch tested?

Manually tested, and unit tests were added.

Closes #33876 from HyukjinKwon/SPARK-36626.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-09-02 14:00:27 +09:00
..
benchmarks [SPARK-34981][SQL][FOLLOWUP] Use SpecificInternalRow in ApplyFunctionExpression 2021-05-24 17:25:24 +09:00
src [SPARK-36626][PYTHON] Support TimestampNTZ in createDataFrame/toPandas and Python UDFs 2021-09-02 14:00:27 +09:00
pom.xml Revert "[SPARK-34309][BUILD][CORE][SQL][K8S] Use Caffeine instead of Guava Cache" 2021-08-22 09:36:15 +09:00