[SPARK-31525][SQL] Return an empty list for df.head() when df is empty

### What changes were proposed in this pull request?

Return an empty list instead of `None` when calling `df.head()` on an empty DataFrame.

### Why are the changes needed?

When `df` is empty, `df.head()` returns `None` while `df.head(1)` returns an empty list, so the two calls are inconsistent.
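
The inconsistency can be reproduced without a Spark session using a small stand-in. This is only a sketch: `FakeFrame`, `head_old`, and `head_new` are hypothetical names that mirror the `head` logic shown in the diff, and plain tuples stand in for `Row` objects.

```python
from typing import List, Optional


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; rows are kept in a plain list."""

    def __init__(self, rows: List[tuple]) -> None:
        self._rows = rows

    def take(self, n: int) -> List[tuple]:
        # Always returns a list, possibly empty.
        return self._rows[:n]

    def head_old(self, n: Optional[int] = None):
        # Logic before this patch: None when n is omitted and there are no rows.
        if n is None:
            rs = self.head_old(1)
            return rs[0] if rs else None
        return self.take(n)

    def head_new(self, n: Optional[int] = None):
        # Logic after this patch: an empty list when n is omitted and there are no rows.
        if n is None:
            rs = self.head_new(1)
            return rs[0] if rs else []
        return self.take(n)


empty = FakeFrame([])
print(empty.head_old())   # None
print(empty.head_old(1))  # []
print(empty.head_new())   # []
print(empty.head_new(1))  # []
```

For a non-empty frame both variants return the first row, so only the empty case changes.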

### Does this PR introduce _any_ user-facing change?

Yes. Code that relies on `df.head()` returning `None` for an empty DataFrame, such as `if df.head() is None:`, will no longer behave as intended.

### How was this patch tested?

Closes #29214 from tianshizz/SPARK-31525.

Authored-by: Tianshi Zhu <zhutianshirea@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Tianshi Zhu 2020-07-28 12:32:19 +09:00 committed by HyukjinKwon
parent 77f2ca6cce
commit 44a5258ac2
2 changed files with 6 additions and 2 deletions

@@ -26,6 +26,9 @@ Note that this migration guide describes the items specific to PySpark.
Many items of SQL migration can be applied when migrating PySpark to higher versions.
Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.html).
+## Upgrading from PySpark 3.0 to 3.1
+- In Spark 3.1, PySpark `DataFrame.head()` will return `[]` if the PySpark DataFrame is empty. In Spark 3.0 or prior, it will return `None`. The behavior remains the same for non-empty PySpark DataFrame.
## Upgrading from PySpark 2.4 to 3.0
- In Spark 3.0, PySpark requires a pandas version of 0.23.2 or higher to use pandas related functionality, such as `toPandas`, `createDataFrame` from pandas DataFrame, and so on.

@@ -1323,7 +1323,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
:param n: int, default 1. Number of rows to return.
:return: If n is greater than 1, return a list of :class:`Row`.
-If n is 1, return a single Row.
+If n is 1, return a single Row if it exists. Otherwise, we will return an
+empty list to match the behavior of `head(1)` when the dataframe is empty.
>>> df.head()
Row(age=2, name='Alice')
@@ -1332,7 +1333,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
"""
if n is None:
rs = self.head(1)
-return rs[0] if rs else None
+return rs[0] if rs else []
return self.take(n)
@since(1.3)