spark-instrumented-optimizer

History

0x0FFF bf550a4b55 [SPARK-10162] [SQL] Fix the timezone omitting for PySpark Dataframe filter function		2015-09-01 14:34:59 -07:00

This PR addresses [SPARK-10162](https://issues.apache.org/jira/browse/SPARK-10162)

The issue is with the DataFrame filter() function when a datetime.datetime is passed to it:
* Timezone information of this datetime is ignored
* This datetime is assumed to be in the local timezone, which depends on the OS timezone setting

The fix includes both a code change and a regression test.

Problem reproduction code on master:
```python
import pytz
from datetime import datetime
from pyspark.sql import *
from pyspark.sql.types import *
sqc = SQLContext(sc)
df = sqc.createDataFrame([], StructType([StructField("dt", TimestampType())]))
m1 = pytz.timezone('UTC')
m2 = pytz.timezone('Etc/GMT+3')
df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
```

It gives the same timestamp, ignoring the time zone:
```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946713600000000)
Scan PhysicalRDD[dt#0]
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946713600000000)
Scan PhysicalRDD[dt#0]
```

After the fix:
```
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m1)).explain()
Filter (dt#0 > 946684800000000)
Scan PhysicalRDD[dt#0]
>>> df.filter(df.dt > datetime(2000, 01, 01, tzinfo=m2)).explain()
Filter (dt#0 > 946695600000000)
Scan PhysicalRDD[dt#0]
```

PR [8536](https://github.com/apache/spark/pull/8536) was accidentally closed when I dropped the repo.

Author: 0x0FFF <programmerag@gmail.com>

Closes #8555 from 0x0FFF/SPARK-10162.
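The two "after the fix" values in the commit message can be checked with the standard library alone. The block below is a minimal sketch and not the code from the patch: `to_epoch_micros` is a hypothetical helper, and a fixed UTC-03:00 offset stands in for pytz's `Etc/GMT+3`. It assumes the intended behaviour is to convert timezone-aware datetimes through UTC and to fall back to the local timezone only for naive ones.

```python
import calendar
import time
from datetime import datetime, timedelta, timezone


def to_epoch_micros(dt):
    """Hypothetical helper: microseconds since the Unix epoch for a datetime.

    Aware datetimes are converted through UTC, so their tzinfo is honoured.
    Naive datetimes are interpreted in the local (OS) timezone.
    """
    if dt.tzinfo is not None:
        seconds = calendar.timegm(dt.utctimetuple())
    else:
        seconds = time.mktime(dt.timetuple())
    return int(seconds) * 1000000 + dt.microsecond


utc = timezone.utc
# 'Etc/GMT+3' denotes UTC-03:00 (the POSIX sign convention is inverted).
gmt_plus_3 = timezone(timedelta(hours=-3))

utc_micros = to_epoch_micros(datetime(2000, 1, 1, tzinfo=utc))
gmt3_micros = to_epoch_micros(datetime(2000, 1, 1, tzinfo=gmt_plus_3))

print(utc_micros)   # 946684800000000
print(gmt3_micros)  # 946695600000000
```

The two results differ by exactly three hours (10,800,000,000 microseconds), matching the filter constants shown after the fix. The pre-fix value 946713600000000 is eight hours past the UTC value, which is consistent with the datetime having been read as local time on a machine set to UTC-08:00.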
__init__.py	[SPARK-8060] Improve DataFrame Python test coverage and documentation.	2015-06-03 00:23:34 -07:00
column.py	[SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses to JavaConverters	2015-08-25 12:33:13 +01:00
context.py	[SPARK-9942] [PYSPARK] [SQL] ignore exceptions while try to import pandas	2015-08-13 14:03:55 -07:00
dataframe.py	[SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses to JavaConverters	2015-08-25 12:33:13 +01:00
functions.py	[DOCS] [SQL] [PYSPARK] Fix typo in ntile function	2015-08-19 09:42:41 +01:00
group.py	[SPARK-8770][SQL] Create BinaryOperator abstract class.	2015-07-01 21:14:13 -07:00
readwriter.py	[SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON	2015-08-26 22:19:11 -07:00
tests.py	[SPARK-10162] [SQL] Fix the timezone omitting for PySpark Dataframe filter function	2015-09-01 14:34:59 -07:00
types.py	[SPARK-10162] [SQL] Fix the timezone omitting for PySpark Dataframe filter function	2015-09-01 14:34:59 -07:00
utils.py	[SPARK-9166][SQL][PYSPARK] Capture and hide IllegalArgumentException in Python API	2015-07-19 00:32:56 -07:00
window.py	[SPARK-9978] [PYSPARK] [SQL] fix Window.orderBy and doc of ntile()	2015-08-14 13:55:29 -07:00