spark-instrumented-optimizer

Dongjoon Hyun 0f576a5748 [SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent.

## What changes were proposed in this pull request?

createDataFrame returns inconsistent types for column names.

```python
>>> from pyspark.sql.types import StructType, StructField, StringType
>>> schema = StructType([StructField(u"col", StringType())])
>>> df1 = spark.createDataFrame([("a",)], schema)
>>> df1.columns # "col" is str
['col']
>>> df2 = spark.createDataFrame([("a",)], [u"col"])
>>> df2.columns # "col" is unicode
[u'col']
```

The reason is only StructField has the following code.

```
if not isinstance(name, str):
    name = name.encode('utf-8')
```

This PR adds the same logic into createDataFrame for consistency.

```
if isinstance(schema, list):
    schema = [x.encode('utf-8') if not isinstance(x, str) else x for x in schema]
```

## How was this patch tested?

Pass the Jenkins test (with new python doctest)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13097 from dongjoon-hyun/SPARK-15244.

2016-05-17 13:05:07 -07:00
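The normalization described in the commit message can be tried without a Spark installation. The following is a minimal sketch, not the actual PySpark code: `normalize_column_names` is a hypothetical helper name introduced here for illustration, and it only wraps the list comprehension quoted in the commit message.

```python
def normalize_column_names(schema):
    """Hypothetical helper (not part of PySpark) wrapping the logic
    quoted in the commit message: when the schema is a plain list of
    column names, make every name a native `str`."""
    if isinstance(schema, list):
        # Encode anything that is not already `str` (e.g. `unicode` on
        # Python 2); leave `str` values untouched.
        schema = [x.encode('utf-8') if not isinstance(x, str) else x
                  for x in schema]
    # Non-list schemas (e.g. a StructType) pass through unchanged.
    return schema


names = normalize_column_names([u"col", "other"])
print(names)                                   # ['col', 'other']
print(all(isinstance(n, str) for n in names))  # True
```

On Python 2 the `u"col"` literal is a `unicode` object and gets encoded to `str`, which is the inconsistency the commit addresses. On Python 3 every text literal is already `str`, so the comprehension leaves such names as they are.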
__init__.py	[SPARK-14945][PYTHON] SparkSession Python API	2016-04-28 10:55:48 -07:00
catalog.py	[SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView	2016-05-12 15:51:53 +08:00
column.py	[SPARK-15278] [SQL] Remove experimental tag from Python DataFrame	2016-05-11 15:12:27 -07:00
conf.py	[SPARK-15126][SQL] RuntimeConfig.set should return Unit	2016-05-04 14:26:05 -07:00
context.py	[SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView	2016-05-12 15:51:53 +08:00
dataframe.py	[SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempView	2016-05-12 15:51:53 +08:00
functions.py	[SPARK-14639] [PYTHON] [R] Add `bround` function in Python/R.	2016-04-19 22:28:11 -07:00
group.py	[SPARK-12756][SQL] use hash expression in Exchange	2016-01-13 22:43:28 -08:00
readwriter.py	[SPARK-15072][SQL][PYSPARK][HOT-FIX] Remove SparkSession.withHiveSupport from readwrite.py	2016-05-11 21:43:56 -07:00
session.py	[SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent.	2016-05-17 13:05:07 -07:00
streaming.py	[SPARK-14896][SQL] Deprecate HiveContext in python	2016-05-04 17:39:30 -07:00
tests.py	[SPARK-15244] [PYTHON] Type of column name created with createDataFrame is not consistent.	2016-05-17 13:05:07 -07:00
types.py	[SPARK-12200][SQL] Add __contains__ implementation to Row	2016-05-11 13:15:11 -07:00
utils.py	[SPARK-14603][SQL] Verification of Metadata Operations by Session Catalog	2016-05-10 11:25:55 -07:00
window.py	[SPARK-14058][PYTHON] Incorrect docstring in Window.order	2016-03-21 23:52:33 -07:00