[SPARK-31525][SQL] Return an empty list for df.head() when df is empty
### What changes were proposed in this pull request?

Return an empty list instead of None when calling `df.head()`.

### Why are the changes needed?

`df.head()` and `df.head(1)` are inconsistent when df is empty.

### Does this PR introduce _any_ user-facing change?

Yes. If a user relies on `df.head()` to return None, things like `if df.head() is None:` will be broken.

### How was this patch tested?

Closes #29214 from tianshizz/SPARK-31525.

Authored-by: Tianshi Zhu <zhutianshirea@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
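
For callers affected by the user-facing change described above, one option is to avoid the no-argument form and rely on `df.head(1)`, which returns a list in both the old and the new behavior. The following is a minimal sketch under that assumption. `first_row_or_none` and `FakeDataFrame` are hypothetical names introduced for illustration; `FakeDataFrame` is an in-memory stand-in so the example runs without a Spark session, and it is not the PySpark implementation.

```python
from typing import Any, List, Optional


def first_row_or_none(df: Any) -> Optional[Any]:
    """Return the first row of `df`, or None if `df` has no rows.

    Relies only on `df.head(1)` returning a list, so the result does not
    depend on what the no-argument `df.head()` returns for an empty frame.
    """
    rows = df.head(1)
    return rows[0] if rows else None


class FakeDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame (illustration only)."""

    def __init__(self, rows: List[Any]) -> None:
        self._rows = list(rows)

    def head(self, n: int) -> List[Any]:
        # Mirrors only the list-returning form head(n).
        return self._rows[:n]


empty = FakeDataFrame([])
non_empty = FakeDataFrame([("Alice", 2), ("Bob", 5)])

print(first_row_or_none(empty))      # None
print(first_row_or_none(non_empty))  # ('Alice', 2)
```

A check such as `if first_row_or_none(df) is None:` then behaves the same whether the no-argument `df.head()` yields `None` or `[]` for an empty DataFrame.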
parent 77f2ca6cce
commit 44a5258ac2
@@ -26,6 +26,9 @@ Note that this migration guide describes the items specific to PySpark.
 Many items of SQL migration can be applied when migrating PySpark to higher versions.
 Please refer [Migration Guide: SQL, Datasets and DataFrame](sql-migration-guide.html).
 
+## Upgrading from PySpark 3.0 to 3.1
+- In Spark 3.1, PySpark `DataFrame.head()` will return `[]` if the PySpark DataFrame is empty. In Spark 3.0 or prior, it will return `None`. The behavior remains the same for non-empty PySpark DataFrame.
+
 ## Upgrading from PySpark 2.4 to 3.0
 - In Spark 3.0, PySpark requires a pandas version of 0.23.2 or higher to use pandas related functionality, such as `toPandas`, `createDataFrame` from pandas DataFrame, and so on.
 
@@ -1323,7 +1323,8 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
         :param n: int, default 1. Number of rows to return.
         :return: If n is greater than 1, return a list of :class:`Row`.
-            If n is 1, return a single Row.
+            If n is 1, return a single Row if it exists. Otherwise, we will return an
+            empty list to match the behavior of `head(1)` when the dataframe is empty.
 
         >>> df.head()
         Row(age=2, name='Alice')
@@ -1332,7 +1333,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
         """
         if n is None:
             rs = self.head(1)
-            return rs[0] if rs else None
+            return rs[0] if rs else []
         return self.take(n)
 
     @since(1.3)
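
The effect of the one-line change in the last hunk can be reproduced without Spark. This is a minimal sketch, assuming a hypothetical in-memory `MiniFrame` class whose `take` simply slices a Python list. `head_old` and `head_new` are illustrative names for the logic before and after this commit; they are not PySpark methods.

```python
from typing import Any, List, Optional


class MiniFrame:
    """Hypothetical in-memory stand-in; only the head/take logic mirrors the diff."""

    def __init__(self, rows: List[Any]) -> None:
        self._rows = list(rows)

    def take(self, n: int) -> List[Any]:
        # Stand-in for DataFrame.take: the first n rows as a list.
        return self._rows[:n]

    def head_old(self, n: Optional[int] = None) -> Any:
        # Logic before the commit: an empty frame yields None.
        if n is None:
            rs = self.head_old(1)
            return rs[0] if rs else None
        return self.take(n)

    def head_new(self, n: Optional[int] = None) -> Any:
        # Logic after the commit: an empty frame yields [].
        if n is None:
            rs = self.head_new(1)
            return rs[0] if rs else []
        return self.take(n)


empty = MiniFrame([])
people = MiniFrame([("Alice", 2), ("Bob", 5)])

print(empty.head_old() is None)   # True
print(empty.head_new() is None)   # False: `is None` checks no longer match
print(empty.head_new() == [])     # True
print(people.head_old() == people.head_new())  # True: non-empty frames are unaffected
```

With an explicit `n`, both variants return the same list, so only callers of the no-argument form on an empty frame see a difference.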