spark-instrumented-optimizer

Bryan Cutler f62f44f2a2 [SPARK-27387][PYTHON][TESTS] Replace sqlutils.assertPandasEqual with Pandas assert_frame_equals		2019-04-10 07:50:25 +09:00

## What changes were proposed in this pull request?

Running PySpark tests with Pandas 0.24.x causes a failure in `test_pandas_udf_grouped_map` test_supported_types:
`ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`

This is because a column is an ArrayType and the method `sqlutils ReusedSQLTestCase.assertPandasEqual` does not properly check this.

This PR removes `assertPandasEqual` and replaces it with the built-in `pandas.util.testing.assert_frame_equal` which can properly handle columns of ArrayType and also prints out better diff between the DataFrames when an error occurs.

Additionally, imports of pandas and pyarrow were moved to the top of related test files to avoid duplicating the same import many times.

## How was this patch tested?

Existing tests

Closes #24306 from BryanCutler/python-pandas-assert_frame_equal-SPARK-27387.

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

ml	[SPARK-9792] Make DenseMatrix equality semantical	2019-04-01 09:30:33 -07:00
mllib	[SPARK-9792] Make DenseMatrix equality semantical	2019-04-01 09:30:33 -07:00
sql	[SPARK-27387][PYTHON][TESTS] Replace sqlutils.assertPandasEqual with Pandas assert_frame_equals	2019-04-10 07:50:25 +09:00
streaming	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
testing	[SPARK-27387][PYTHON][TESTS] Replace sqlutils.assertPandasEqual with Pandas assert_frame_equals	2019-04-10 07:50:25 +09:00
tests	[SPARK-27000][PYTHON] Upgrades cloudpickle to v0.8.0	2019-02-28 02:33:10 +09:00
__init__.py	[SPARK-25248][.1][PYSPARK] update barrier Python API	2018-08-29 07:22:03 -07:00
_globals.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
accumulators.py	[SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator	2018-10-08 15:18:08 +08:00
broadcast.py	[SPARK-18161][PYTHON] Update cloudpickle to v0.6.1	2019-02-02 10:49:45 +08:00
cloudpickle.py	[SPARK-27000][PYTHON] Upgrades cloudpickle to v0.8.0	2019-02-28 02:33:10 +09:00
conf.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
context.py	[SPARK-27102][R][PYTHON][CORE] Remove the references to Python's Scala codes in R's Scala codes	2019-03-10 15:08:23 +09:00
daemon.py	[PYSPARK] Update py4j to version 0.10.7.	2018-05-09 10:47:35 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
find_spark_home.py	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
heapq3.py	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
java_gateway.py	[SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway	2019-02-15 18:08:06 -08:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
rdd.py	[SPARK-26771][CORE][GRAPHX] Make .unpersist(), .destroy() consistently non-blocking by default	2019-02-01 18:29:55 -06:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.	2019-03-25 11:26:09 -07:00
shell.py	[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4	2018-09-13 11:19:43 +08:00
shuffle.py	[SPARK-25696] The storage memory displayed on spark Application UI is…	2018-12-10 18:27:01 -06:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3	2018-11-07 22:48:50 -06:00
taskcontext.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
version.py	[SPARK-25592] Setting version to 3.0.0-SNAPSHOT	2018-10-02 08:48:24 -07:00
worker.py	[SPARK-27240][PYTHON] Use pandas DataFrame for struct type argument in Scalar Pandas UDF.	2019-03-25 11:26:09 -07:00