spark-instrumented-optimizer

History

HyukjinKwon 5f7aceb9df [SPARK-28240][PYTHON] Fix Arrow tests to pass with Python 2.7 and latest PyArrow and Pandas in PySpark ## What changes were proposed in this pull request? In Python 2.7 with latest PyArrow and Pandas, the error message seems a bit different with Python 3. This PR simply fixes the test. ``` ====================================================================== FAIL: test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.test_arrow.ArrowTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/.../spark/python/pyspark/sql/tests/test_arrow.py", line 275, in test_createDataFrame_with_incorrect_schema self.spark.createDataFrame(pdf, schema=wrong_schema) AssertionError: "integer.required.got.str" does not match "('Exception thrown when converting pandas.Series (object) to Arrow Array (int32). It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled by using SQL config `spark.sql.execution.pandas.arrowSafeTypeConversion`.', ArrowTypeError('an integer is required',))" ====================================================================== FAIL: test_createDataFrame_with_incorrect_schema (pyspark.sql.tests.test_arrow.EncryptionArrowTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/.../spark/python/pyspark/sql/tests/test_arrow.py", line 275, in test_createDataFrame_with_incorrect_schema self.spark.createDataFrame(pdf, schema=wrong_schema) AssertionError: "integer.required.got.str" does not match "('Exception thrown when converting pandas.Series (object) to Arrow Array (int32). It can be caused by overflows or other unsafe conversions warned by Arrow. Arrow safe type check can be disabled by using SQL config `spark.sql.execution.pandas.arrowSafeTypeConversion`.', ArrowTypeError('an integer is required',))" ``` ## How was this patch tested? Manually tested. ``` cd python ./run-tests --python-executables=python --modules pyspark-sql ``` Closes #25042 from HyukjinKwon/SPARK-28240. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>		2019-07-03 17:46:31 +09:00
..
avro	[SPARK-26856][PYSPARK][FOLLOWUP] Fix UT failure due to wrong patterns for Kinesis assembly	2019-04-02 14:52:56 +09:00
tests	[SPARK-28240][PYTHON] Fix Arrow tests to pass with Python 2.7 and latest PyArrow and Pandas in PySpark	2019-07-03 17:46:31 +09:00
__init__.py	[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark	2017-11-02 15:22:52 +01:00
catalog.py	[SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs	2018-08-17 10:18:08 +08:00
column.py	[SPARK-28031][PYSPARK][TEST] Improve doctest on over function of Column	2019-06-13 11:04:41 +09:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
dataframe.py	[SPARK-28198][PYTHON] Add mapPartitionsInPandas to allow an iterator of DataFrames	2019-07-02 10:54:16 +09:00
functions.py	[SPARK-28056][.2][PYTHON][SQL] add docstring/doctest for SCALAR_ITER Pandas UDF	2019-06-28 15:09:57 -07:00
group.py	[SPARK-24722][SQL] pivot() with Column type argument	2018-08-04 14:17:32 +08:00
readwriter.py	[SPARK-28058][DOC] Add a note to doc of mode of CSV for column pruning	2019-06-18 13:48:32 +09:00
session.py	[SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized	2019-06-11 18:43:59 +09:00
streaming.py	[SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources	2019-05-09 08:41:43 +09:00
types.py	[SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows	2019-05-06 10:00:49 -07:00
udf.py	[SPARK-26412][PYSPARK][SQL] Allow Pandas UDF to take an iterator of pd.Series or an iterator of tuple of pd.Series	2019-06-15 08:29:20 -07:00
utils.py	[SPARK-28041][PYTHON] Increase minimum supported Pandas to 0.23.2	2019-06-18 09:10:58 +09:00
window.py	[MINOR][PYSPARK][SQL][DOC] Fix rowsBetween doc in Window	2019-06-14 09:56:37 +09:00