All data sources show up as "PhysicalRDD" in physical plan explain. It'd be better if we can show the name of the data source.
Without this patch:
```
== Physical Plan ==
NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Final,isDistinct=false))
Exchange hashpartitioning(date#0,cat#1)
NewAggregate with UnsafeHybridAggregationIterator ArrayBuffer(date#0, cat#1) ArrayBuffer((sum(CAST((CAST(count#2, IntegerType) + 1), LongType))2,mode=Partial,isDistinct=false))
PhysicalRDD [date#0,cat#1,count#2], MapPartitionsRDD[3] at
```
With this patch:
```
== Physical Plan ==
TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Final,isDistinct=false)]
Exchange hashpartitioning(date#0,cat#1)
TungstenAggregate(key=[date#0,cat#1], value=[(sum(CAST((CAST(count#2, IntegerType) + 1), LongType)),mode=Partial,isDistinct=false)]
ConvertToUnsafe
Scan ParquetRelation[file:/scratch/rxin/spark/sales4][date#0,cat#1,count#2]
```
Author: Reynold Xin <rxin@databricks.com>
Closes#8024 from rxin/SPARK-9733 and squashes the following commits:
811b90e [Reynold Xin] Fixed Python test case.
52cab77 [Reynold Xin] Cast.
eea9ccc [Reynold Xin] Fix test case.
fcecb22 [Reynold Xin] [SPARK-9733][SQL] Improve explain message for data source scan node.
This was previously committed but then reverted due to test failures (see #6769).
Author: Xiangrui Meng <meng@databricks.com>
Closes#7755 from rxin/SPARK-7157 and squashes the following commits:
fbf9044 [Xiangrui Meng] fix python test
542bd37 [Xiangrui Meng] update test
604fe6d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
f051afd [Xiangrui Meng] use udf instead of building expression
f4e9425 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
8fb990b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7157
103beb3 [Xiangrui Meng] add Java-friendly sampleBy
991f26f [Xiangrui Meng] fix seed
4a14834 [Xiangrui Meng] move sampleBy to stat
832f7cc [Xiangrui Meng] add sampleBy to DataFrame
We forgot to update doc. brkyvz
Author: Xiangrui Meng <meng@databricks.com>
Closes#7608 from mengxr/SPARK-9243 and squashes the following commits:
0ea3236 [Xiangrui Meng] null -> zero in crosstab doc
This PR fix the long standing issue of serialization between Python RDD and DataFrame, it change to using a customized Pickler for InternalRow to enable customized unpickling (type conversion, especially for UDT), now we can support UDT for UDF, cc mengxr .
There is no generated `Row` anymore.
Author: Davies Liu <davies@databricks.com>
Closes#7301 from davies/sql_ser and squashes the following commits:
81bef71 [Davies Liu] address comments
e9217bd [Davies Liu] add regression tests
db34167 [Davies Liu] Refactor of serialization for Python DataFrame
Our current BinaryExpression abstract class is not for generic binary expressions, i.e. it requires left/right children to have the same type. However, due to its name, contributors build new binary expressions that don't have that assumption (e.g. Sha) and still extend BinaryExpression.
This patch creates a new BinaryOperator abstract class, and update the analyzer o only apply type casting rule there. This patch also adds the notion of "prettyName" to expressions, which defines the user-facing name for the expression.
Author: Reynold Xin <rxin@databricks.com>
Closes#7174 from rxin/binary-opterator and squashes the following commits:
f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into binary-opterator
d8518cf [Reynold Xin] Updated Python tests.
Use UTF-8 to encode the name of column in Python 2, or it may failed to encode with default encoding ('ascii').
This PR also fix a bug when there is Java exception without error message.
Author: Davies Liu <davies@databricks.com>
Closes#7165 from davies/non_ascii and squashes the following commits:
02cb61a [Davies Liu] fix tests
3b09d31 [Davies Liu] add encoding in header
867754a [Davies Liu] support non-ascii character in column names
Sometimes the user may want to show the complete content of cells. Now `sql("set -v").show()` displays:
![screen shot 2015-06-18 at 4 34 51 pm](https://cloud.githubusercontent.com/assets/1000778/8227339/14d3c5ea-15d9-11e5-99b9-f00b7e93beef.png)
The user needs to use something like `sql("set -v").collect().foreach(r => r.toSeq.mkString("\t"))` to show the complete content.
This PR adds a `pretty` parameter to show. If `pretty` is false, `show` won't truncate strings or align cells right.
![screen shot 2015-06-18 at 4 21 44 pm](https://cloud.githubusercontent.com/assets/1000778/8227407/b6f8dcac-15d9-11e5-8219-8079280d76fc.png)
Author: zsxwing <zsxwing@gmail.com>
Closes#6877 from zsxwing/show and squashes the following commits:
22e28e9 [zsxwing] pretty -> truncate
e582628 [zsxwing] Add pretty parameter to the show method in R
a3cd55b [zsxwing] Fix calling showString in R
923cee4 [zsxwing] Add a "pretty" parameter to show to display long strings
Add `sampleBy` to DataFrame. rxin
Author: Xiangrui Meng <meng@databricks.com>
Closes#6769 from mengxr/SPARK-7157 and squashes the following commits:
991f26f [Xiangrui Meng] fix seed
4a14834 [Xiangrui Meng] move sampleBy to stat
832f7cc [Xiangrui Meng] add sampleBy to DataFrame
MatrixUDT was recently coded in scala. This has been ported to PySpark
Author: MechCoder <manojkumarsivaraj334@gmail.com>
Closes#6354 from MechCoder/spark-6390 and squashes the following commits:
fc4dc1e [MechCoder] Better error message
c940a44 [MechCoder] Added test
aa9c391 [MechCoder] Add pyUDT to MatrixUDT
62a2a7d [MechCoder] [SPARK-6390] Port MatrixUDT to PySpark
This patch switches to using FunctionRegistry for built-in expressions. It is based on #6463, but with some work to simplify it along with unit tests.
TODOs for future pull requests:
- Use static registration so we don't need to register all functions every time we start a new SQLContext
- Switch to using this in HiveContext
Author: Reynold Xin <rxin@databricks.com>
Author: Santiago M. Mola <santi@mola.io>
Closes#6710 from rxin/udf-registry and squashes the following commits:
6930822 [Reynold Xin] Fixed Python test.
b802c9a [Reynold Xin] Made UDF case insensitive.
e60d815 [Reynold Xin] Made UDF case insensitive.
852f9c0 [Reynold Xin] Fixed style violation.
e76a3c1 [Reynold Xin] Fixed parser.
52ddaba [Reynold Xin] Fixed compilation.
ee7854f [Reynold Xin] Improved error reporting.
ff906f2 [Reynold Xin] More robust constructor calling.
77b46f1 [Reynold Xin] Simplified the code.
2a2a149 [Reynold Xin] Merge pull request #6463 from smola/SPARK-7886
8616924 [Santiago M. Mola] [SPARK-7886] Add built-in expressions to FunctionRegistry.
Author: Reynold Xin <rxin@databricks.com>
Closes#6688 from rxin/df-alias-replace and squashes the following commits:
774c19c [Reynold Xin] [SPARK-8146] DataFrame Python API: Alias replace in DataFrameNaFunctions.
This is a minor change.
Author: amey <amey@skytree.net>
Closes#6655 from ameyc/JIRA-7991/support-passing-list-to-describe and squashes the following commits:
e8a1dff [amey] Adding support for passing lists to describe.
Added a `DataFrame.drop` function that accepts a `Column` reference rather than a `String`, and added associated unit tests. Basically iterates through the `DataFrame` to find a column with an expression that is equivalent to that of the `Column` argument supplied to the function.
Author: Mike Dusenberry <dusenberrymw@gmail.com>
Closes#6585 from dusenberrymw/SPARK-7969_Drop_method_on_Dataframes_should_handle_Column and squashes the following commits:
514727a [Mike Dusenberry] Updating the @since tag of the drop(Column) function doc to reflect version 1.4.1 instead of 1.4.0.
2f1bb4e [Mike Dusenberry] Adding an additional assert statement to the 'drop column after join' unit test in order to make sure the correct column was indeed left over.
6bf7c0e [Mike Dusenberry] Minor code formatting change.
e583888 [Mike Dusenberry] Adding more Python doctests for the df.drop with column reference function to test joined datasets that have columns with the same name.
5f74401 [Mike Dusenberry] Updating DataFrame.drop with column reference function to use logicalPlan.output to prevent ambiguities resulting from columns with the same name. Also added associated unit tests for joined datasets with duplicate column names.
4b8bbe8 [Mike Dusenberry] Adding Python support for Dataframe.drop with a Column reference.
986129c [Mike Dusenberry] Added a DataFrame.drop function that accepts a Column reference rather than a String, and added associated unit tests. Basically iterates through the DataFrame to find a column with an expression that is equivalent to one supplied to the function.
Author: Reynold Xin <rxin@databricks.com>
Closes#6569 from rxin/freqItemsWarning and squashes the following commits:
7eec145 [Reynold Xin] [minor doc] Add exploratory data analysis warning for DataFrame.stat.freqItem API.
Add tests later.
Author: Davies Liu <davies@databricks.com>
Closes#6375 from davies/insertInto and squashes the following commits:
826423e [Davies Liu] add insertInto() to Writer
Author: Davies Liu <davies@databricks.com>
Closes#6311 from davies/rollup and squashes the following commits:
0261db1 [Davies Liu] use @since
a51ca6b [Davies Liu] Merge branch 'master' of github.com:apache/spark into rollup
8ad5af4 [Davies Liu] Update dataframe.py
ade3841 [Davies Liu] add DataFrame.rollup/cube in Python
Add version info for public Python SQL API.
cc rxin
Author: Davies Liu <davies@databricks.com>
Closes#6295 from davies/versions and squashes the following commits:
cfd91e6 [Davies Liu] add more version for DataFrame API
600834d [Davies Liu] add version to SQL API docs
cc rxin, please take a quick look, I'm working on tests.
Author: Davies Liu <davies@databricks.com>
Closes#6238 from davies/readwrite and squashes the following commits:
c7200eb [Davies Liu] update tests
9cbf01b [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
f0c5a04 [Davies Liu] use sqlContext.read.load
5f68bc8 [Davies Liu] update tests
6437e9a [Davies Liu] Merge branch 'master' of github.com:apache/spark into readwrite
bcc6668 [Davies Liu] add reader amd writer API in Python
Fixed the following warnings in `make clean html` under `python/docs`:
~~~
/Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:3: ERROR: Unexpected indentation.
/Users/meng/src/spark/python/pyspark/mllib/evaluation.py:docstring of pyspark.mllib.evaluation.RankingMetrics.ndcgAt:4: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:3: ERROR: Unexpected indentation.
/Users/meng/src/spark/python/pyspark/mllib/fpm.py:docstring of pyspark.mllib.fpm.FPGrowth.train:4: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/meng/src/spark/python/pyspark/sql/__init__.py:docstring of pyspark.sql.DataFrame.replace:16: WARNING: Field list ends without a blank line; unexpected unindent.
/Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:8: ERROR: Unexpected indentation.
/Users/meng/src/spark/python/pyspark/streaming/kafka.py:docstring of pyspark.streaming.kafka.KafkaUtils.createRDD:9: WARNING: Block quote ends without a blank line; unexpected unindent.
~~~
davies
Author: Xiangrui Meng <meng@databricks.com>
Closes#6221 from mengxr/SPARK-6657 and squashes the following commits:
e3f83fe [Xiangrui Meng] fix sql and streaming doc warnings
2b4371e [Xiangrui Meng] fix mllib python doc warnings
dataframe.py is splited into column.py, group.py and dataframe.py:
```
360 column.py
1223 dataframe.py
183 group.py
```
Author: Davies Liu <davies@databricks.com>
Closes#6201 from davies/split_df and squashes the following commits:
fc8f5ab [Davies Liu] split dataframe.py into multiple files
Add an `explode` function for dataframes and modify the analyzer so that single table generating functions can be present in a select clause along with other expressions. There are currently the following restrictions:
- only top level TGFs are allowed (i.e. no `select(explode('list) + 1)`)
- only one may be present in a single select to avoid potentially confusing implicit Cartesian products.
TODO:
- [ ] Python
Author: Michael Armbrust <michael@databricks.com>
Closes#6107 from marmbrus/explodeFunction and squashes the following commits:
7ee2c87 [Michael Armbrust] whitespace
6f80ba3 [Michael Armbrust] Update dataframe.py
c176c89 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
81b5da3 [Michael Armbrust] style
d3faa05 [Michael Armbrust] fix self join case
f9e1e3e [Michael Armbrust] fix python, add since
4f0d0a9 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into explodeFunction
e710fe4 [Michael Armbrust] add java and python
52ca0dc [Michael Armbrust] [SPARK-7548][SQL] Add explode function for dataframes.
This builds on https://github.com/apache/spark/pull/5932 and should close https://github.com/apache/spark/pull/5932 as well.
As an example:
```python
df.select(when(df['age'] == 2, 3).otherwise(4).alias("age")).collect()
```
Author: Reynold Xin <rxin@databricks.com>
Author: kaka1992 <kaka_1992@163.com>
Closes#6072 from rxin/when-expr and squashes the following commits:
8f49201 [Reynold Xin] Throw exception if otherwise is applied twice.
0455eda [Reynold Xin] Reset run-tests.
bfb9d9f [Reynold Xin] Updated documentation and test cases.
762f6a5 [Reynold Xin] Merge pull request #5932 from kaka1992/IFCASE
95724c6 [kaka1992] Update
8218d0a [kaka1992] Update
801009e [kaka1992] Update
76d6346 [kaka1992] [SPARK-7321][SQL] Add Column expression for conditional statements (if, case)
Author: Daoyuan Wang <daoyuan.wang@intel.com>
Closes#6003 from adrian-wang/pynareplace and squashes the following commits:
672efba [Daoyuan Wang] remove py2.7 feature
4a148f7 [Daoyuan Wang] to_replace support dict, value support single value, and add full tests
9e232e7 [Daoyuan Wang] rename scala map
af0268a [Daoyuan Wang] remove na
63ac579 [Daoyuan Wang] add na.replace in pyspark
Author: Reynold Xin <rxin@databricks.com>
Closes#6068 from rxin/drop-column and squashes the following commits:
9d7d5ec [Reynold Xin] [SPARK-7509][SQL] DataFrame.drop in Python for dropping columns.
This should also close https://github.com/apache/spark/pull/5870
Author: Reynold Xin <rxin@databricks.com>
Closes#6066 from rxin/dropDups and squashes the following commits:
130692f [Reynold Xin] [SPARK-7324][SQL] DataFrame.dropDuplicates
Updated Java, Scala, Python, and R.
Author: Reynold Xin <rxin@databricks.com>
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes#5996 from rxin/groupby-retain and squashes the following commits:
aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain
f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain
5f923c0 [Reynold Xin] Merge pull request #15 from shivaram/sparkr-groupby-retrain
c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit 9a6be746ef
b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite.
d910141 [Reynold Xin] Updated rest of the files
1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
It's the first step: generalize UnresolvedGetField to support all map, struct, and array
TODO: add `apply` in Scala and `__getitem__` in Python, and unify the `getItem` and `getField` methods to one single API(or should we keep them for compatibility?).
Author: Wenchen Fan <cloud0fan@outlook.com>
Closes#5744 from cloud-fan/generalize and squashes the following commits:
715c589 [Wenchen Fan] address comments
7ea5b31 [Wenchen Fan] fix python test
4f0833a [Wenchen Fan] add python test
f515d69 [Wenchen Fan] add apply method and test cases
8df6199 [Wenchen Fan] fix python test
239730c [Wenchen Fan] fix test compile
2a70526 [Wenchen Fan] use _bin_op in dataframe.py
6bf72bc [Wenchen Fan] address comments
3f880c3 [Wenchen Fan] add java doc
ab35ab5 [Wenchen Fan] fix python test
b5961a9 [Wenchen Fan] fix style
c9d85f5 [Wenchen Fan] generalize UnresolvedGetField to support all map, struct, and array
Author: Shiti <ssaxena.ece@gmail.com>
Closes#5867 from Shiti/spark-7295 and squashes the following commits:
71a9913 [Shiti] implementation for bitwise and,or, not and xor on Column with tests and docs
Reduced take size from 1e8 to 1e6.
cc rxin
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5900 from brkyvz/df-cont-followup and squashes the following commits:
c11e762 [Burak Yavuz] fix grammar
b30ace2 [Burak Yavuz] address comments
a417ba5 [Burak Yavuz] [SPARK-7243][SQL] Reduce size for Contingency Tables in DataFrames
Computes a pair-wise frequency table of the given columns. Also known as cross-tabulation.
cc mengxr rxin
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5842 from brkyvz/df-cont and squashes the following commits:
a07c01e [Burak Yavuz] addressed comments v4.1
ae9e01d [Burak Yavuz] fix test
9106585 [Burak Yavuz] addressed comments v4.0
bced829 [Burak Yavuz] fix merge conflicts
a63ad00 [Burak Yavuz] addressed comments v3.0
a0cad97 [Burak Yavuz] addressed comments v3.0
6805df8 [Burak Yavuz] addressed comments and fixed test
939b7c4 [Burak Yavuz] lint python
7f098bc [Burak Yavuz] add crosstab pyTest
fd53b00 [Burak Yavuz] added python support for crosstab
27a5a81 [Burak Yavuz] implemented crosstab
Author: 云峤 <chensong.cs@alibaba-inc.com>
Closes#5865 from kaka1992/df.show and squashes the following commits:
c79204b [云峤] Update
a1338f6 [云峤] Update python dataFrame show test and add empty df unit test.
734369c [云峤] Update python dataFrame show test and add empty df unit test.
84aec3e [云峤] Update python dataFrame show test and add empty df unit test.
159b3d5 [云峤] update
03ef434 [云峤] update
7394fd5 [云峤] update test show
ced487a [云峤] update pep8
b6e690b [云峤] Merge remote-tracking branch 'upstream/master' into df.show
30ac311 [云峤] [SPARK-7294] ADD BETWEEN
7d62368 [云峤] [SPARK-7294] ADD BETWEEN
baf839b [云峤] [SPARK-7294] ADD BETWEEN
d11d5b9 [云峤] [SPARK-7294] ADD BETWEEN
submitting this PR from a phone, excuse the brevity.
adds Pearson correlation to Dataframes, reusing the covariance calculation code
cc mengxr rxin
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5858 from brkyvz/df-corr and squashes the following commits:
285b838 [Burak Yavuz] addressed comments v2.0
d10babb [Burak Yavuz] addressed comments v0.2
4b74b24 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into df-corr
4fe693b [Burak Yavuz] addressed comments v0.1
a682d06 [Burak Yavuz] ready for PR
Author: Dean Chen <deanchen5@gmail.com>
Closes#5866 from deanchen/patch-1 and squashes the following commits:
0028bc4 [Dean Chen] Fix typo in Dataframes.py introduced in [SPARK-3444]
The python api for DataFrame's plus addressed your comments from previous PR.
rxin
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5859 from brkyvz/df-freq-py2 and squashes the following commits:
f9aa9ce [Burak Yavuz] addressed comments v0.1
4b25056 [Burak Yavuz] added python api for freqItems
Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from pwendel & davies
Author: Holden Karau <holden@pigscanfly.ca>
Closes#5791 from holdenk/SPARK-3444-provide-an-easy-way-to-change-log-level-r2 and squashes the following commits:
3bf3be9 [Holden Karau] fix exception
42ba873 [Holden Karau] fix exception
9117244 [Holden Karau] Only allow valid log levels, throw exception if invalid log level.
338d7bf [Holden Karau] rename setLoggingLevel to setLogLevel
fac14a0 [Holden Karau] Fix style errors
d9d03f3 [Holden Karau] Add support for changing the log level at run time through the SparkContext. Based on an earlier PR, #2433 includes CR feedback from @pwendel & @davies
small fixes regarding comments in PR #5761
cc rxin
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5795 from brkyvz/split-followup and squashes the following commits:
369c522 [Burak Yavuz] changed wording a little
1ea456f [Burak Yavuz] Addressed follow up comments
This is built on top of kaka1992 's PR #5711 using Logical plans.
Author: Burak Yavuz <brkyvz@gmail.com>
Closes#5761 from brkyvz/random-sample and squashes the following commits:
a1fb0aa [Burak Yavuz] remove unrelated file
69669c3 [Burak Yavuz] fix broken test
1ddb3da [Burak Yavuz] copy base
6000328 [Burak Yavuz] added python api and fixed test
3c11d1b [Burak Yavuz] fixed broken test
f400ade [Burak Yavuz] fix build errors
2384266 [Burak Yavuz] addressed comments v0.1
e98ebac [Burak Yavuz] [SPARK-7156][SQL] support RandomSplit in DataFrames
I believe column access via `__getattr__` is bad and shouldn't be implicitly encouraged by the error message when accessing a non-existing attribute on DataFrame. This patch changes the error message from 'no such column' to the more generic 'no such attribute', which is also what Pandas DFs will throw.
Author: ksonj <kson@siberie.de>
Closes#5771 from ksonj/master and squashes the following commits:
bcc2220 [ksonj] Better error message on access to non-existing attribute
This pr tries to provide a way to let python users workaround https://issues.apache.org/jira/browse/SPARK-6231.
Author: Yin Huai <yhuai@databricks.com>
Closes#5634 from yhuai/pythonDFAlias and squashes the following commits:
8465acd [Yin Huai] Add an alias to a Python DF.
Author: Reynold Xin <rxin@databricks.com>
Closes#5638 from rxin/joinUsing and squashes the following commits:
13e9cc9 [Reynold Xin] Code review + Python.
b1bd914 [Reynold Xin] [SPARK-7059][SQL] Create a DataFrame join API to facilitate equijoin and self join.
This PR enable auto_convert in JavaGateway, then we could register a converter for a given types, for example, date and datetime.
There are two bugs related to auto_convert, see [1] and [2], we workaround it in this PR.
[1] https://github.com/bartdag/py4j/issues/160
[2] https://github.com/bartdag/py4j/issues/161
cc rxin JoshRosen
Author: Davies Liu <davies@databricks.com>
Closes#5570 from davies/py4j_date and squashes the following commits:
eb4fa53 [Davies Liu] fix tests in python 3
d17d634 [Davies Liu] rollback changes in mllib
2e7566d [Davies Liu] convert tuple into ArrayList
ceb3779 [Davies Liu] Update rdd.py
3c373f3 [Davies Liu] support date and datetime by auto_convert
cb094ff [Davies Liu] enable auto convert
Author: Elisey Zanko <elisey.zanko@gmail.com>
Closes#5361 from 31z4/spark-6661 and squashes the following commits:
73c5d79 [Elisey Zanko] Python type errors should print type, not object
As a follow up PR to #5544.
cc davies
Author: Reynold Xin <rxin@databricks.com>
Closes#5558 from rxin/sort-doc-improvement and squashes the following commits:
f4c276f [Reynold Xin] Review feedback.
d2dcf24 [Reynold Xin] Minor fix to SPARK-6958: Improve Python docstring for DataFrame.sort.