spark-instrumented-optimizer

TigerYang414 60a899b8c3 [SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue	2019-03-12 10:23:26 -05:00

## What changes were proposed in this pull request?

With a large partition, PySpark may exceed the executor memory limit and run out of memory on Python 2.7. This is because map() is used: unlike in Python 3.x, map() in Python 2.7 builds a list and therefore has to read all of the data into memory. The proposed fix uses imap() on Python 2.7, and it has been verified.

## How was this patch tested?

Manual test.

Closes #23954 from TigerYang414/patch-1.

Lead-authored-by: TigerYang414 <39265202+TigerYang414@users.noreply.github.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
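The difference between an eager and a lazy map described in the SPARK-27041 commit message can be illustrated with a minimal sketch. This is not the PySpark patch itself. The helper names (`counting_source`, `eager_map`, `lazy_map`, `items_consumed_for_first_result`) are hypothetical and exist only for this illustration. The sketch is written for Python 3, where the built-in `map()` is already lazy, so the Python 2.7 behaviour is imitated with a list comprehension.

```python
def counting_source(n, consumed):
    """Yield 0..n-1 and record in consumed[0] how many items were pulled."""
    for i in range(n):
        consumed[0] += 1
        yield i


def eager_map(func, iterable):
    # Imitates Python 2.7's built-in map(): every result is materialised
    # in a list, so the whole input must be read before anything is returned.
    return [func(x) for x in iterable]


def lazy_map(func, iterable):
    # Imitates Python 2.7's itertools.imap(): results are produced one at a
    # time. In Python 3 the built-in map() already behaves this way.
    return map(func, iterable)


def items_consumed_for_first_result(mapper, n=1000):
    """Return (first result, number of input items read to obtain it)."""
    consumed = [0]
    results = mapper(lambda x: x * 2, counting_source(n, consumed))
    first = next(iter(results))
    return first, consumed[0]


if __name__ == "__main__":
    print("eager:", items_consumed_for_first_result(eager_map))
    print("lazy: ", items_consumed_for_first_result(lazy_map))
```

In this sketch the eager variant reads all 1000 input items before the first result is available, while the lazy variant reads only one. That matches the commit's reasoning for why a large partition can exhaust memory under Python 2.7's `map()`.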
__init__.py	[SPARK-26032][PYTHON] Break large sql/tests.py files into smaller files	2018-11-14 14:51:11 +08:00
test_appsubmit.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_arrow.py	[SPARK-26887][SQL][PYTHON][NS] Create datetime.date directly instead of creating datetime64 as intermediate data.	2019-02-18 11:48:10 +08:00
test_catalog.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_column.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_conf.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_context.py	[SPARK-26676][PYTHON] Make HiveContextSQLTests.test_unbounded_frames test compatible with Python 2 and PyPy	2019-01-21 14:27:17 -08:00
test_dataframe.py	[SPARK-23647][PYTHON][SQL] Adds more types for hint in pyspark	2018-12-01 10:37:03 +08:00
test_datasources.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_functions.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_group.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_pandas_udf.py	[SPARK-25811][PYSPARK] Raise a proper error when unsafe cast is detected by PyArrow	2019-01-22 14:54:41 +08:00
test_pandas_udf_grouped_agg.py	[SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*	2018-12-14 10:45:24 +08:00
test_pandas_udf_grouped_map.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
test_pandas_udf_scalar.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
test_pandas_udf_window.py	[SPARK-24561][SQL][PYTHON] User-defined window aggregation functions with Pandas UDF (bounded window)	2018-12-18 09:15:21 +08:00
test_readwriter.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_serde.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00
test_session.py	[SPARK-27101][PYTHON] Drop the created database after the test in test_session	2019-03-09 09:12:33 +09:00
test_streaming.py	[SPARK-26945][PYTHON][SS][TESTS] Fix flaky test_*_await_termination in PySpark SS tests	2019-02-23 14:57:04 +08:00
test_types.py	[SPARK-26645][PYTHON] Support decimals with negative scale when parsing datatype	2019-01-20 17:43:50 +08:00
test_udf.py	[SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue	2019-03-12 10:23:26 -05:00
test_utils.py	[SPARK-26036][PYTHON] Break large tests.py files into smaller files	2018-11-15 12:30:52 +08:00