spark-instrumented-optimizer/python/pyspark/sql
TigerYang414 60a899b8c3 [SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue
## What changes were proposed in this pull request?

With large partition, pyspark may exceeds executor memory limit and trigger out of memory for python 2.7.
This is because map() is used. Unlike in python3.x, python 2.7 map() will generate a list and need to read all data into memory.

The proposed fix will use imap in python 2.7 and it has been verified.

## How was this patch tested?
Manual test.
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

Closes #23954 from TigerYang414/patch-1.

Lead-authored-by: TigerYang414 <39265202+TigerYang414@users.noreply.github.com>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-03-12 10:23:26 -05:00
..
avro [SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs 2019-03-11 10:15:07 +09:00
tests [SPARK-27041][PYSPARK] Use imap() for python 2.x to resolve oom issue 2019-03-12 10:23:26 -05:00
__init__.py [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark 2017-11-02 15:22:52 +01:00
catalog.py [SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs 2018-08-17 10:18:08 +08:00
column.py [SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark 2018-04-08 12:09:06 +08:00
conf.py [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
context.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
dataframe.py [SPARK-26449][PYTHON] Add transform method to DataFrame API 2019-02-26 18:22:36 -06:00
functions.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
group.py [SPARK-24722][SQL] pivot() with Column type argument 2018-08-04 14:17:32 +08:00
readwriter.py [SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8 2019-03-05 08:03:39 +09:00
session.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
streaming.py [SPARK-26016][DOCS] Clarify that text DataSource read/write, and RDD methods that read text, always use UTF-8 2019-03-05 08:03:39 +09:00
types.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
udf.py [SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF 2019-03-07 08:52:24 -08:00
utils.py [SPARK-24721][SQL] Exclude Python UDFs filters in FileSourceStrategy 2018-08-28 10:57:13 +08:00
window.py [SPARK-26860][PYSPARK][SPARKR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation 2019-03-11 08:53:09 -05:00