spark-instrumented-optimizer


Tibor Csögör eec1a3c286 [SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows	2019-05-06 10:00:49 -07:00

This PR is meant to replace #20503, which lay dormant for a while. The solution in the original PR is still valid, so this is just that patch rebased onto the current master. Original summary follows.

## What changes were proposed in this pull request?

Fix `__repr__` behaviour for Rows.

Rows `__repr__` assumes data is a string when column name is missing. Examples:

```
>>> from pyspark.sql.types import Row
>>> Row ("Alice", "11")
<Row(Alice, 11)>
>>> Row (name="Alice", age=11)
Row(age=11, name='Alice')
>>> Row ("Alice", 11)
<snip stack trace>
TypeError: sequence item 1: expected string, int found
```

This is because, when `Row()` is called without column names, `__repr__` assumes every value is a string.

## How was this patch tested?

Manually tested, and a unit test was added to `python/pyspark/sql/tests/test_types.py`.

Closes #24448 from tbcs/SPARK-23299.

Lead-authored-by: Tibor Csögör <tibi@tiborius.net>
Co-authored-by: Shashwat Anand <me@shashwat.me>
Signed-off-by: Holden Karau <holden@pigscanfly.ca>
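The `TypeError` in the transcript matches what Python's `str.join` raises when handed a non-string item, which suggests the unnamed-field branch of `__repr__` joined the raw values directly. The sketch below illustrates the kind of fix described, under stated assumptions: `SimpleRow` is a hypothetical, simplified tuple-based stand-in, not PySpark's actual `Row` class, and it formats each value with `%r` so that values of any type can be rendered.

```python
class SimpleRow(tuple):
    """Hypothetical stand-in for pyspark.sql.Row, for illustration only."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Cannot mix positional and keyword fields")
        if kwargs:
            # Named fields: store values sorted by field name, mirroring
            # the Row(age=11, name='Alice') output in the transcript above.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            return row
        # Unnamed fields: just a tuple of values, no column names.
        return tuple.__new__(cls, args)

    def __repr__(self):
        if hasattr(self, "__fields__"):
            return "Row(%s)" % ", ".join(
                "%s=%r" % (k, v) for k, v in zip(self.__fields__, self))
        # Buggy variant: ", ".join(self) raises TypeError on non-str values.
        # Fixed variant: convert each value with %r before joining.
        return "<Row(%s)>" % ", ".join("%r" % v for v in self)


print(repr(SimpleRow("Alice", 11)))
print(repr(SimpleRow(name="Alice", age=11)))
```

With `%r`, `SimpleRow("Alice", 11)` renders as `<Row('Alice', 11)>` instead of failing. One visible side effect of this choice is that string values now print quoted, unlike the `<Row(Alice, 11)>` output shown in the transcript.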
avro	[SPARK-26856][PYSPARK][FOLLOWUP] Fix UT failure due to wrong patterns for Kinesis assembly	2019-04-02 14:52:56 +09:00
tests	[SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows	2019-05-06 10:00:49 -07:00
__init__.py	[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark	2017-11-02 15:22:52 +01:00
catalog.py	[SPARK-24665][PYSPARK][FOLLOWUP] Use SQLConf in PySpark to manage all sql configs	2018-08-17 10:18:08 +08:00
column.py	[SPARK-23847][PYTHON][SQL] Add asc_nulls_first, asc_nulls_last to PySpark	2018-04-08 12:09:06 +08:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
dataframe.py	[SPARK-27276][PYTHON][SQL] Increase minimum version of pyarrow to 0.12.1 and remove prior workarounds	2019-04-22 19:30:31 +09:00
functions.py	[SPARK-23619][DOCS] Add output description for some generator expressions / functions	2019-04-27 10:30:12 +09:00
group.py	[SPARK-24722][SQL] pivot() with Column type argument	2018-08-04 14:17:32 +08:00
readwriter.py	[MINOR][DOC][SQL] Remove out-of-date doc about ORC in DataFrameReader and Writer	2019-04-03 09:11:09 -07:00
session.py	[SPARK-27276][PYTHON][SQL] Increase minimum version of pyarrow to 0.12.1 and remove prior workarounds	2019-04-22 19:30:31 +09:00
streaming.py	[SPARK-23014][SS] Fully remove V1 memory sink.	2019-04-29 09:44:23 -07:00
types.py	[SPARK-23299][SQL][PYSPARK] Fix __repr__ behaviour for Rows	2019-05-06 10:00:49 -07:00
udf.py	[SPARK-23836][PYTHON] Add support for StructType return in Scalar Pandas UDF	2019-03-07 08:52:24 -08:00
utils.py	[SPARK-23014][SS] Fully remove V1 memory sink.	2019-04-29 09:44:23 -07:00
window.py	[SPARK-26860][PYSPARK][SPARKR] Fix for RangeBetween and RowsBetween docs to be in sync with spark documentation	2019-03-11 08:53:09 -05:00