[SPARK-26224][SQL][PYTHON][R][FOLLOW-UP] Add notes about many projects in withColumn at SparkR and PySpark as well

## What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/23285. This PR adds the same notes to the PySpark and SparkR documentation as well.

While I am here, I also revised the existing Scala doc slightly so that it reads more neutrally.

## How was this patch tested?

Manually built the doc and verified.

Closes #24272 from HyukjinKwon/SPARK-26224.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Hyukjin Kwon 2019-04-03 08:30:24 +09:00
parent 949d712839
commit d7dd59a6b4
3 changed files with 14 additions and 4 deletions

@@ -2143,6 +2143,11 @@ setMethod("selectExpr",
 #' Return a new SparkDataFrame by adding a column or replacing the existing column
 #' that has the same name.
 #'
+#' Note: This method introduces a projection internally. Therefore, calling it multiple times,
+#' for instance, via loops in order to add multiple columns can generate big plans which
+#' can cause performance issues and even \code{StackOverflowException}. To avoid this,
+#' use \code{select} with the multiple columns at once.
+#'
 #' @param x a SparkDataFrame.
 #' @param colName a column name.
 #' @param col a Column expression (which must refer only to this SparkDataFrame), or an atomic

@@ -1974,6 +1974,11 @@ class DataFrame(object):
 :param colName: string, name of the new column.
 :param col: a :class:`Column` expression for the new column.
+.. note:: This method introduces a projection internally. Therefore, calling it multiple
+    times, for instance, via loops in order to add multiple columns can generate big
+    plans which can cause performance issues and even `StackOverflowException`.
+    To avoid this, use :func:`select` with the multiple columns at once.
 >>> df.withColumn('age2', df.age + 2).collect()
 [Row(age=2, name=u'Alice', age2=4), Row(age=5, name=u'Bob', age2=7)]
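The note's argument about plan size can be made concrete with a toy model. The sketch below is plain Python, not Spark: `Plan`, `with_column` and `select_with` are hypothetical stand-ins that assume, as the note states, that every `withColumn` call adds one projection. Adding 100 columns in a loop stacks 100 `Project` nodes, while a single `select` adds one node with the same output columns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    """Toy logical-plan node: a leaf relation or a projection over one child."""
    op: str                         # "Relation" or "Project"
    columns: List[str]              # output column names of this node
    child: Optional["Plan"] = None

    def depth(self) -> int:
        # Count nodes from here down to the leaf; iterative, so it cannot overflow.
        node, n = self, 0
        while node is not None:
            node, n = node.child, n + 1
        return n


def with_column(plan: Plan, name: str) -> Plan:
    # Hypothetical stand-in for withColumn: each call wraps the plan in one more
    # projection (replacing an existing column of the same name is not modelled).
    return Plan("Project", plan.columns + [name], plan)


def select_with(plan: Plan, names: List[str]) -> Plan:
    # Hypothetical stand-in for select("*", ...): all new columns in one projection.
    return Plan("Project", plan.columns + names, plan)


base = Plan("Relation", ["age", "name"])
new_cols = ["age%d" % i for i in range(100)]

looped = base
for c in new_cols:                     # the pattern the note warns about
    looped = with_column(looped, c)

batched = select_with(base, new_cols)  # the recommended alternative

print(looped.depth(), batched.depth())    # 101 2
print(looped.columns == batched.columns)  # True
```

In PySpark itself, the batched form could be written as `df.select('*', *[(df.age + i).alias('age%d' % i) for i in range(100)])`, which adds all 100 columns through a single `select` call.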

@@ -2151,10 +2151,10 @@ class Dataset[T] private[sql](
 * `column`'s expression must only refer to attributes supplied by this Dataset. It is an
 * error to add a column that refers to some other Dataset.
 *
- * Please notice that this method introduces a `Project`. This means that using it in loops in
- * order to add several columns can generate very big plans which can cause huge performance
- * issues and even `StackOverflowException`s. A much better alternative use `select` with the
- * list of columns to add.
+ * @note this method introduces a projection internally. Therefore, calling it multiple times,
+ * for instance, via loops in order to add multiple columns can generate big plans which
+ * can cause performance issues and even `StackOverflowException`. To avoid this,
+ * use `select` with the multiple columns at once.
 *
 * @group untypedrel
 * @since 2.0.0
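The stack overflow mentioned in these notes can be pictured as follows: a plan built by a long `withColumn` loop is a deeply nested tree, and code that walks such a tree recursively needs one stack frame per level. The sketch below is a plain-Python analogy, not Spark code. It assumes, for illustration, that plan traversal is recursive; `build_chain`, `count_nodes` and `overflows` are hypothetical helpers, and Python's `RecursionError` stands in for the JVM stack overflow.

```python
import sys


def build_chain(n_projects):
    # Build a toy plan ("Project", ("Project", ... ("Relation", None))) iteratively,
    # so that constructing it never recurses.
    plan = ("Relation", None)
    for _ in range(n_projects):
        plan = ("Project", plan)
    return plan


def count_nodes(plan):
    # Naive recursive walk: one Python stack frame per nested node.
    if plan is None:
        return 0
    return 1 + count_nodes(plan[1])


def overflows(plan):
    # True if the recursive walk exhausts the interpreter's recursion limit.
    try:
        count_nodes(plan)
    except RecursionError:
        return True
    return False


shallow = build_chain(1)                         # e.g. one select adding all columns
deep = build_chain(2 * sys.getrecursionlimit())  # e.g. thousands of chained projections

print(count_nodes(shallow))                 # 2
print(overflows(shallow), overflows(deep))  # False True
```

Sizing the deep chain relative to `sys.getrecursionlimit()` keeps the analogy independent of the interpreter's configured limit.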