spark-instrumented-optimizer/python/pyspark
Jalpan Randeri 339b0ecadb [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow
Handle Pandas category type while converting from python with Arrow enabled. The category column will be converted to whatever type the category elements are as is the case with Arrow disabled.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
New unit tests were added for `createDataFrame` and scalar `pandas_udf`

Closes #26585 from jalpan-randeri/feature-pyarrow-dictionary-type.

Authored-by: Jalpan Randeri <randerij@amazon.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
2020-05-27 17:27:29 -07:00
..
ml [SPARK-31734][ML][PYSPARK] Add weight support in ClusteringEvaluator 2020-05-25 09:18:08 -05:00
mllib [SPARK-31739][PYSPARK][DOCS][MINOR] Fix docstring syntax issues and misplaced space characters 2020-05-18 20:25:02 +09:00
resource [SPARK-31748][PYTHON] Document resource module in PySpark doc and rename/move classes 2020-05-19 17:09:37 -07:00
sql [SPARK-25351][SQL][PYTHON] Handle Pandas category type when converting from Python with Arrow 2020-05-27 17:27:29 -07:00
streaming [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
testing [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
tests Revert "[SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs" 2020-05-27 10:15:33 +09:00
__init__.py [SPARK-31767][PYTHON][CORE] Remove ResourceInformation in pyspark module's namespace 2020-05-19 22:36:36 -07:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
broadcast.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
cloudpickle.py [SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8 2019-10-22 16:18:34 +09:00
conf.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
context.py Revert "[SPARK-31788][CORE][PYTHON] Fix UnionRDD of PairRDDs" 2020-05-27 10:15:33 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
find_spark_home.py [SPARK-31382][BUILD] Show a better error message for different python and pip installation mistake 2020-04-09 11:04:35 +09:00
heapq3.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
java_gateway.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-31748][PYTHON] Document resource module in PySpark doc and rename/move classes 2020-05-19 17:09:37 -07:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-30205][PYSPARK] Import ABCs from collections.abc to remove deprecation warnings 2019-12-10 11:08:13 -08:00
serializers.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
shell.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
shuffle.py [SPARK-25696] The storage memory displayed on spark Application UI is… 2018-12-10 18:27:01 -06:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 2018-11-07 22:48:50 -06:00
taskcontext.py [SPARK-31344][CORE] Polish implementation of barrier() and allGather() 2020-04-16 21:23:32 -07:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
version.py [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00
worker.py [SPARK-31748][PYTHON] Document resource module in PySpark doc and rename/move classes 2020-05-19 17:09:37 -07:00