spark-instrumented-optimizer/python/pyspark/sql
hyukjinkwon d6632d185e [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame
## What changes were proposed in this pull request?

This PR adds a configuration, `spark.sql.execution.arrow.fallback.enabled`, that controls whether `toPandas` and `createDataFrame` with a Pandas DataFrame fall back to the non-Arrow code path when the Arrow optimization cannot be applied (for example, for unsupported types).
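
Conceptually, the new flag toggles between falling back and failing fast when the Arrow path cannot handle the data. The sketch below is a minimal, self-contained illustration of that control flow, not Spark's actual implementation; `arrow_convert` and `plain_convert` are hypothetical stand-ins for the Arrow-based and row-based conversion paths.

```python
# Minimal sketch of the intended control flow (not Spark internals).
def arrow_convert(rows):
    # The Arrow path rejects types it cannot handle, e.g. MapType.
    raise TypeError("Unsupported type in conversion: map<string,int>")

def plain_convert(rows):
    return list(rows)  # slower row-based path that always works

def to_pandas(rows, arrow_enabled, fallback_enabled):
    if arrow_enabled:
        try:
            return arrow_convert(rows)
        except Exception as e:
            if not fallback_enabled:
                raise  # fail fast: surface the Arrow error to the user
            print("Arrow optimization failed, falling back: %s" % e)
    return plain_convert(rows)

print(to_pandas([{'a': 1}], arrow_enabled=True, fallback_enabled=True))
```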

## How was this patch tested?

Manually tested and unit tests added.

You can test this as follows:

**`createDataFrame`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame(pdf, "a: map<string, int>")
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame(pdf, "a: map<string, int>")
```

**`toPandas`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```
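
The flags above can also be set once when the session is created. A minimal sketch, assuming a standalone PySpark script (the app name is arbitrary):

```python
from pyspark.sql import SparkSession

# Enable the Arrow optimization and its fallback at session creation.
spark = (SparkSession.builder
         .appName("arrow-fallback-demo")  # hypothetical app name
         .config("spark.sql.execution.arrow.enabled", "true")
         .config("spark.sql.execution.arrow.fallback.enabled", "true")
         .getOrCreate())
```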

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #20678 from HyukjinKwon/SPARK-23380-conf.
2018-03-08 20:22:07 +09:00
| File | Last commit | Date |
|---|---|---|
| __init__.py | [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark | 2017-11-02 15:22:52 +01:00 |
| catalog.py | [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark | 2018-01-18 14:51:05 +09:00 |
| column.py | [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column | 2017-08-24 20:29:03 +09:00 |
| conf.py | [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code | 2016-05-23 18:14:48 -07:00 |
| context.py | [SPARK-23122][PYTHON][SQL] Deprecate register* for UDFs in SQLContext and Catalog in PySpark | 2018-01-18 14:51:05 +09:00 |
| dataframe.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| functions.py | [SPARK-23329][SQL] Fix documentation of trigonometric functions | 2018-03-05 23:46:40 +09:00 |
| group.py | [SPARK-23261][PYSPARK] Rename Pandas UDFs | 2018-01-30 21:55:55 +09:00 |
| readwriter.py | [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document | 2018-02-28 11:00:54 +09:00 |
| session.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| streaming.py | [SPARK-23448][SQL] Clarify JSON and CSV parser behavior in document | 2018-02-28 11:00:54 +09:00 |
| tests.py | [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame | 2018-03-08 20:22:07 +09:00 |
| types.py | [SPARK-20090][FOLLOW-UP] Revert the deprecation of names in PySpark | 2018-02-13 15:05:13 +09:00 |
| udf.py | [SPARK-23569][PYTHON] Allow pandas_udf to work with python3 style type-annotated functions | 2018-03-05 13:36:42 +09:00 |
| utils.py | [SPARK-23319][TESTS] Explicitly specify Pandas and PyArrow versions in PySpark tests (to skip or test) | 2018-02-07 23:28:10 +09:00 |
| window.py | [SPARK-23084][PYTHON] Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark | 2018-02-11 18:55:38 +09:00 |