spark-instrumented-optimizer/python/pyspark/sql
Takuya UESHIN 64817c423c [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone
## What changes were proposed in this pull request?

When converting Pandas DataFrame/Series from/to Spark DataFrame using `toPandas()` or pandas udfs, timestamp values behave to respect Python system timezone instead of session timezone.

For example, let's say we use `"America/Los_Angeles"` as session timezone and have a timestamp value `"1970-01-01 00:00:01"` in the timezone. Btw, I'm in Japan so Python timezone would be `"Asia/Tokyo"`.

The timestamp value from current `toPandas()` will be the following:

```
>>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
>>> df = spark.createDataFrame([28801], "long").selectExpr("timestamp(value) as ts")
>>> df.show()
+-------------------+
|                 ts|
+-------------------+
|1970-01-01 00:00:01|
+-------------------+

>>> df.toPandas()
                   ts
0 1970-01-01 17:00:01
```

As you can see, the value becomes `"1970-01-01 17:00:01"` because it respects Python timezone.
As we discussed in #18664, we consider this behavior is a bug and the value should be `"1970-01-01 00:00:01"`.

## How was this patch tested?

Added tests and existing tests.

Author: Takuya UESHIN <ueshin@databricks.com>

Closes #19607 from ueshin/issues/SPARK-22395.
2017-11-28 16:45:22 +08:00
..
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-22409] Introduce function type argument in pandas_udf 2017-11-17 16:43:08 +01:00
column.py [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column 2017-08-24 20:29:03 +09:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-20586][SQL] Add deterministic to ScalaUDF 2017-07-25 17:19:44 -07:00
dataframe.py [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone 2017-11-28 16:45:22 +08:00
functions.py [SPARK-22541][SQL] Explicitly claim that Python udfs can't be conditionally executed with short-curcuit evaluation 2017-11-21 09:36:37 +01:00
group.py [SPARK-22409] Introduce function type argument in pandas_udf 2017-11-17 16:43:08 +01:00
readwriter.py [SPARK-22484][DOC] Document PySpark DataFrame csv writer behavior whe… 2017-11-28 10:14:35 +09:00
session.py [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone 2017-11-28 16:45:22 +08:00
streaming.py [SPARK-21756][SQL] Add JSON option to allow unquoted control characters 2017-08-25 10:18:03 -07:00
tests.py [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone 2017-11-28 16:45:22 +08:00
types.py [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone 2017-11-28 16:45:22 +08:00
udf.py [SPARK-22409] Introduce function type argument in pandas_udf 2017-11-17 16:43:08 +01:00
utils.py [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo 2017-01-04 15:07:29 +00:00
window.py [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames 2016-12-02 17:39:28 -08:00