spark-instrumented-optimizer/python/pyspark
Bryan Cutler e599837248 [SPARK-23009][PYTHON] Fix for non-str col names to createDataFrame from Pandas
## What changes were proposed in this pull request?

This fixes the case where `SparkSession.createDataFrame` is called with a Pandas DataFrame that has non-str column labels.

The column name conversion logic, which handles non-string labels and `unicode` labels in Python 2, is:
```
if column is not any type of string:
    name = str(column)
else if column is unicode in Python 2:
    name = column.encode('utf-8')
```
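The pseudocode above can be sketched as a small standalone function. This is only an illustration under stated assumptions, not the code in the patch: the helper name `to_column_name` and the `PY2` / `string_types` names are hypothetical and do not come from PySpark.

```python
import sys

# Hypothetical helper names; this is a sketch, not the PySpark source.
PY2 = sys.version_info[0] == 2

# On Python 2, `basestring` covers both `str` and `unicode`.
# On Python 3 only `str` exists, so the `basestring` branch is never evaluated.
string_types = (basestring,) if PY2 else (str,)  # noqa: F821


def to_column_name(column):
    """Convert a Pandas column label to a native `str` column name."""
    if not isinstance(column, string_types):
        # Not any type of string (e.g. int, float): use its str() form.
        return str(column)
    if PY2 and not isinstance(column, str):
        # A `unicode` label on Python 2: encode it to a byte `str`.
        return column.encode('utf-8')
    # Already a native `str`: leave unchanged.
    return column


# Example: integer labels such as those produced by pd.DataFrame([[1, 2]])
names = [to_column_name(c) for c in [0, 1, "value"]]
```

With integer labels `0` and `1`, the resulting names are the strings `"0"` and `"1"`, while labels that are already `str` pass through untouched.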

## How was this patch tested?

Added a new test with a Pandas DataFrame that has int column labels.

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #20210 from BryanCutler/python-createDataFrame-int-col-error-SPARK-23009.
2018-01-10 14:55:24 +09:00
..
ml [MINOR] Fix a bunch of typos 2018-01-02 07:10:19 +09:00
mllib [SPARK-22399][ML] update the location of reference paper 2017-10-31 08:20:23 +00:00
sql [SPARK-23009][PYTHON] Fix for non-str col names to createDataFrame from Pandas 2018-01-10 14:55:24 +09:00
streaming [SPARK-22313][PYTHON][FOLLOWUP] Explicitly import warnings namespace in flume.py 2017-12-29 14:46:03 +09:00
__init__.py [MINOR] Fix some typo of the document 2017-06-19 20:35:58 +01:00
accumulators.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
broadcast.py [SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry 2017-08-02 07:12:23 +09:00
cloudpickle.py [SPARK-21070][PYSPARK] Attempt to update cloudpickle again 2017-08-22 11:17:53 +09:00
conf.py [SPARK-18447][DOCS] Fix the markdown for Note:/NOTE:/Note that across Python API documentation 2016-11-22 11:40:18 +00:00
context.py [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas 2017-11-13 13:16:01 +09:00
daemon.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
find_spark_home.py [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
heapq3.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
java_gateway.py [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFrame from Pandas 2017-11-13 13:16:01 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod() 2015-06-26 08:12:22 -07:00
rdd.py [SPARK-22409] Introduce function type argument in pandas_udf 2017-11-17 16:43:08 +01:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0 2017-12-21 20:43:56 +09:00
shell.py [SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell 2017-04-12 10:54:50 -07:00
shuffle.py [SPARK-10710] Remove ability to disable spilling in core and SQL 2015-09-19 21:40:21 -07:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api 2016-04-12 23:06:55 -07:00
taskcontext.py [SPARK-18576][PYTHON] Add basic TaskContext information to PySpark 2016-12-20 15:51:21 -08:00
tests.py [SPARK-22043][PYTHON] Improves error message for show_profiles and dump_profiles 2017-09-18 13:20:11 +09:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-19505][PYTHON] AttributeError on Exception.message in Python3 2017-04-11 12:18:31 -07:00
version.py [MINOR] Bump SparkR and PySpark version to 2.3.0. 2017-06-19 11:13:03 +01:00
worker.py [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone 2017-11-28 16:45:22 +08:00