spark-instrumented-optimizer/python/pyspark/sql
Dongjoon Hyun 0f576a5748 [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent.
## What changes were proposed in this pull request?

**createDataFrame** returns inconsistent types for column names.
```python
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField(u"col", StringType())])
>>> df1 = spark.createDataFrame([("a",)], schema)
>>> df1.columns # "col" is str
['col']
>>> df2 = spark.createDataFrame([("a",)], [u"col"])
>>> df2.columns # "col" is unicode
[u'col']
```

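The reason is that only **StructField** normalizes the name with the following code:
```python
# StructField: store a non-str (i.e. unicode) field name as a UTF-8 encoded str
if not isinstance(name, str):
    name = name.encode('utf-8')
```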
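This PR adds the same logic to **createDataFrame** for consistency:
```python
# createDataFrame: when the schema is a list of column names, apply the same
# unicode -> UTF-8 str normalization that StructField uses
if isinstance(schema, list):
    schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in schema]
```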
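The normalization rule can be illustrated outside Spark. The following is a minimal sketch, not the code from this PR: `normalize_column_names` is a hypothetical helper, and it assumes the goal is that every column name ends up as the interpreter's native `str` type. Because the patch targets Python 2 (where `str` is a byte string), the sketch branches on the interpreter version: on Python 2 it encodes `unicode` names to UTF-8 `str`, as the patch does, and on Python 3, where `str` is already text, it decodes `bytes` names instead.

```python
import sys


def normalize_column_names(names):
    """Return a list where every column name is the native ``str`` type.

    Hypothetical helper (not part of PySpark) mirroring the idea of the patch.
    """
    result = []
    for name in names:
        if isinstance(name, str):
            # Already the native string type: keep it unchanged.
            result.append(name)
        elif sys.version_info[0] < 3:
            # Python 2: unicode -> UTF-8 encoded str, as in the patch.
            result.append(name.encode("utf-8"))
        else:
            # Python 3: str is text, so only bytes need converting.
            result.append(name.decode("utf-8"))
    return result
```

For example, on Python 3 `normalize_column_names([u"col", b"id"])` returns `['col', 'id']`, so both names share one type regardless of how they were supplied.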

## How was this patch tested?

Passes the Jenkins tests (with a new Python doctest).

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13097 from dongjoon-hyun/SPARK-15244.
2016-05-17 13:05:07 -07:00
__init__.py [SPARK-14945][PYTHON] SparkSession Python API 2016-04-28 10:55:48 -07:00
catalog.py [SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView 2016-05-12 15:51:53 +08:00
column.py [SPARK-15278] [SQL] Remove experimental tag from Python DataFrame 2016-05-11 15:12:27 -07:00
conf.py [SPARK-15126][SQL] RuntimeConfig.set should return Unit 2016-05-04 14:26:05 -07:00
context.py [SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView 2016-05-12 15:51:53 +08:00
dataframe.py [SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView 2016-05-12 15:51:53 +08:00
functions.py [SPARK-14639] [PYTHON] [R] Add bround function in Python/R. 2016-04-19 22:28:11 -07:00
group.py [SPARK-12756][SQL] use hash expression in Exchange 2016-01-13 22:43:28 -08:00
readwriter.py [SPARK-15072][SQL][PYSPARK][HOT-FIX] Remove SparkSession.withHiveSupport from readwrite.py 2016-05-11 21:43:56 -07:00
session.py [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent. 2016-05-17 13:05:07 -07:00
streaming.py [SPARK-14896][SQL] Deprecate HiveContext in python 2016-05-04 17:39:30 -07:00
tests.py [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent. 2016-05-17 13:05:07 -07:00
types.py [SPARK-12200][SQL] Add __contains__ implementation to Row 2016-05-11 13:15:11 -07:00
utils.py [SPARK-14603][SQL] Verification of Metadata Operations by Session Catalog 2016-05-10 11:25:55 -07:00
window.py [SPARK-14058][PYTHON] Incorrect docstring in Window.order 2016-03-21 23:52:33 -07:00