spark-instrumented-optimizer/python/pyspark/sql
hyukjinkwon d6632d185e [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame
## What changes were proposed in this pull request?

This PR adds a configuration, `spark.sql.execution.arrow.fallback.enabled`, that controls whether `toPandas` and `createDataFrame` with a Pandas DataFrame fall back to the non-Arrow code path when the Arrow optimization cannot be applied (for example, for unsupported types).
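
Conceptually, the new flag toggles between falling back and failing fast when the Arrow path cannot handle the data. The sketch below is a minimal, self-contained illustration of that control flow, not Spark's actual implementation; `arrow_convert` and `plain_convert` are hypothetical stand-ins for the Arrow-based and row-based conversion paths.

```python
# Minimal sketch of the intended control flow (not Spark internals).
def arrow_convert(rows):
    # The Arrow path rejects types it cannot handle, e.g. MapType.
    raise TypeError("Unsupported type in conversion: map<string,int>")

def plain_convert(rows):
    return list(rows)  # slower row-based path that always works

def to_pandas(rows, arrow_enabled, fallback_enabled):
    if arrow_enabled:
        try:
            return arrow_convert(rows)
        except Exception as e:
            if not fallback_enabled:
                raise  # fail fast: surface the Arrow error to the user
            print("Arrow optimization failed, falling back: %s" % e)
    return plain_convert(rows)

print(to_pandas([{'a': 1}], arrow_enabled=True, fallback_enabled=True))
```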

## How was this patch tested?

Manually tested and unit tests added.

You can test this as follows:

**`createDataFrame`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame(pdf, "a: map<string, int>")
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame(pdf, "a: map<string, int>")
```

**`toPandas`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```
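
The flags above can also be set once when the session is created. A minimal sketch, assuming a standalone PySpark script (the app name is arbitrary):

```python
from pyspark.sql import SparkSession

# Enable the Arrow optimization and its fallback at session creation.
spark = (SparkSession.builder
         .appName("arrow-fallback-demo")  # hypothetical app name
         .config("spark.sql.execution.arrow.enabled", "true")
         .config("spark.sql.execution.arrow.fallback.enabled", "true")
         .getOrCreate())
```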

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20678 from HyukjinKwon/SPARK-23380-conf.
2018-03-08 20:22:07 +09:00
| File | Last commit | Date |
|---|---|---|
| __init__.py | [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark | 2017-11-02 15:22:52 +01:00 |
| catalog.py | [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark | 2018-01-18 14:51:05 +09:00 |
| column.py | [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column | 2017-08-24 20:29:03 +09:00 |
| conf.py | [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code | 2016-05-23 18:14:48 -07:00 |
| context.py | [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark | 2018-01-18 14:51:05 +09:00 |
| dataframe.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| functions.py | [SPARK-23329][SQL] Fix documentation of trigonometric functions | 2018-03-05 23:46:40 +09:00 |
| group.py | [SPARK-23261][PYSPARK] Rename Pandas UDFs | 2018-01-30 21:55:55 +09:00 |
| readwriter.py | [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document | 2018-02-28 11:00:54 +09:00 |
| session.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| streaming.py | [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document | 2018-02-28 11:00:54 +09:00 |
| tests.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| types.py | [SPARK-20090][FOLLOW-UP] Revert the deprecation of names in PySpark | 2018-02-13 15:05:13 +09:00 |
| udf.py | [SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style type-annotated functions | 2018-03-05 13:36:42 +09:00 |
| utils.py | [SPARK-23319][TESTS] Explicitly specify Pandas and PyArrow versions in PySpark tests (to skip or test) | 2018-02-07 23:28:10 +09:00 |
| window.py | [SPARK-23084][PYTHON] Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark | 2018-02-11 18:55:38 +09:00 |