spark-instrumented-optimizer


TigerYang414 60a899b8c3 [SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue		2019-03-12 10:23:26 -05:00

## What changes were proposed in this pull request?
With large partitions, PySpark may exceed the executor memory limit and trigger an out-of-memory error on Python 2.7. This is because map() is used: unlike in Python 3.x, Python 2.7's map() builds a list and therefore has to read all data into memory. The proposed fix uses imap() on Python 2.7, and it has been verified.

## How was this patch tested?
Manual test.

Closes #23954 from TigerYang414/patch-1.

Lead-authored-by: TigerYang414 <39265202+TigerYang414@users.noreply.github.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
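The SPARK-27041 change relies on the difference between an eager map (Python 2.7's built-in `map()`, which returns a list) and a lazy one (`itertools.imap()` on Python 2.7, or the built-in `map()` on Python 3). The following is a minimal sketch of the general Python 2/3 compatibility pattern, not the code from the patch; the names `lazy_map`, `tracked`, and `pulled` are hypothetical. It shows that a lazy map pulls input rows one at a time instead of materializing a whole partition:

```python
# Python 2/3 compatible lazy map (hypothetical name `lazy_map`).
try:
    # Python 2.x: itertools.imap returns an iterator instead of a list.
    from itertools import imap as lazy_map
except ImportError:
    # Python 3.x: itertools.imap no longer exists; built-in map is already lazy.
    lazy_map = map

# Hypothetical bookkeeping: records every row pulled from the source.
pulled = []

def tracked(n):
    """Hypothetical stand-in for a large partition, yielding rows one at a time."""
    for i in range(n):
        pulled.append(i)
        yield i

# Mapping over a million-row source does not read any rows yet.
results = lazy_map(lambda x: x * 2, tracked(10**6))
first = next(results)

# Only one input row has been consumed to produce the first result.
print(first, len(pulled))  # 0 1
```

On Python 3 the `ImportError` branch is taken and the built-in `map` is used; on Python 2.7 the same code binds `itertools.imap`, which is the substitution the commit message describes. With an eager `map()`, `len(pulled)` would already be 1,000,000 before the first result is read.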
avro	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
tests	[SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue	2019-03-12 10:23:26 -05:00
__init__.py	[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark	2017-11-02 15:22:52 +01:00
catalog.py	[SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs	2018-08-17 10:18:08 +08:00
column.py	[SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark	2018-04-08 12:09:06 +08:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
dataframe.py	[SPARK-26449][PYTHON] Add transform method to DataFrame API	2019-02-26 18:22:36 -06:00
functions.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
group.py	[SPARK-24722][SQL] pivot() with Column type argument	2018-08-04 14:17:32 +08:00
readwriter.py	[SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8	2019-03-05 08:03:39 +09:00
session.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
streaming.py	[SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8	2019-03-05 08:03:39 +09:00
types.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
udf.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
utils.py	[SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy	2018-08-28 10:57:13 +08:00
window.py	[SPARK-26860][PYSPARK][SPARKR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation	2019-03-11 08:53:09 -05:00