9c5bcac61e
### What changes were proposed in this pull request?

This PR proposes to implement `TimestampNTZType` support in PySpark's `SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas UDFs with and without Arrow.

### Why are the changes needed?

To complete `TimestampNTZType` support.

### Does this PR introduce _any_ user-facing change?

Yes.

- Users can now use `TimestampNTZType` in `SparkSession.createDataFrame`, `DataFrame.toPandas`, Python UDFs, and pandas UDFs with and without Arrow.
- If `spark.sql.timestampType` is configured to `TIMESTAMP_NTZ`, PySpark infers a `datetime` without a timezone as `TimestampNTZType`. If the `datetime` has a timezone, it is inferred as `TimestampType` in `SparkSession.createDataFrame`.
- If `TimestampType` and `TimestampNTZType` conflict while merging an inferred schema, `TimestampType` takes precedence.
- If the type is `TimestampNTZType`, the value is treated internally as having an unknown timezone and computed with UTC (same as on the JVM side), and it is not localized externally.

### How was this patch tested?

Manually tested, and unit tests were added.

Closes #33876 from HyukjinKwon/SPARK-36626.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

| Name |
|---|
| benchmarks |
| src |
| pom.xml |
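
The inference and precedence rules listed in the commit message can be illustrated with a small model in plain Python. This is only a sketch under stated assumptions, not PySpark's implementation: the function names `infer_timestamp_type`, `merge_timestamp_types`, `infer_column_type`, `to_internal_micros`, and `from_internal_micros` are hypothetical, the type names are represented as plain strings, and the `TIMESTAMP_LTZ` default for `spark.sql.timestampType` is assumed rather than taken from the commit message.

```python
import calendar
from datetime import datetime, timedelta, timezone

# Type names modeled as plain strings; this sketch does not import PySpark.
TIMESTAMP = "TimestampType"
TIMESTAMP_NTZ = "TimestampNTZType"


def infer_timestamp_type(value, timestamp_type_conf="TIMESTAMP_LTZ"):
    """Model of the inference rule for a single datetime value.

    `timestamp_type_conf` stands in for `spark.sql.timestampType`;
    the "TIMESTAMP_LTZ" default is an assumption of this sketch.
    """
    if timestamp_type_conf == "TIMESTAMP_NTZ" and value.tzinfo is None:
        return TIMESTAMP_NTZ
    # A datetime that carries a timezone is always inferred as TimestampType.
    return TIMESTAMP


def merge_timestamp_types(left, right):
    """Model of the merge rule: on conflict, TimestampType wins."""
    return left if left == right else TIMESTAMP


def infer_column_type(values, timestamp_type_conf="TIMESTAMP_LTZ"):
    """Infer one type for a column by merging the per-value inferences."""
    merged = None
    for value in values:
        inferred = infer_timestamp_type(value, timestamp_type_conf)
        merged = inferred if merged is None else merge_timestamp_types(merged, inferred)
    return merged


def to_internal_micros(value):
    """Read the wall-clock fields of a naive datetime as if they were UTC."""
    return calendar.timegm(value.timetuple()) * 1_000_000 + value.microsecond


def from_internal_micros(micros):
    """Rebuild a naive datetime from microseconds without localizing it."""
    return datetime(1970, 1, 1) + timedelta(microseconds=micros)


naive = datetime(2021, 9, 1, 12, 0, 0)
aware = datetime(2021, 9, 1, 12, 0, 0, tzinfo=timezone.utc)

print(infer_timestamp_type(naive, "TIMESTAMP_NTZ"))
print(infer_timestamp_type(aware, "TIMESTAMP_NTZ"))
print(infer_column_type([naive, aware], "TIMESTAMP_NTZ"))
print(from_internal_micros(to_internal_micros(naive)) == naive)
```

Because the naive value is converted with UTC in both directions, the round trip does not depend on the local timezone of the machine running it, which is the point of the last bullet in the commit message.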