[DOC] improve python doc for rdd.histogram and dataframe.join
## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #14253 from mortada/histogram_typos.
parent 1426a08052
commit 6ee40d2cc5
@@ -1027,20 +1027,20 @@ class RDD(object):

         If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
         this can be switched from an O(log n) inseration to O(1) per
-        element(where n = # buckets).
+        element (where n is the number of buckets).

-        Buckets must be sorted and not contain any duplicates, must be
+        Buckets must be sorted, not contain any duplicates, and have
         at least two elements.

-        If `buckets` is a number, it will generates buckets which are
+        If `buckets` is a number, it will generate buckets which are
         evenly spaced between the minimum and maximum of the RDD. For
-        example, if the min value is 0 and the max is 100, given buckets
-        as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-        be at least 1 If the RDD contains infinity, NaN throws an exception
-        If the elements in RDD do not vary (max == min) always returns
-        a single bucket.
+        example, if the min value is 0 and the max is 100, given `buckets`
+        as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+        be at least 1. An exception is raised if the RDD contains infinity.
+        If the elements in the RDD do not vary (max == min), a single bucket
+        will be used.

-        It will return a tuple of buckets and histogram.
+        The return value is a tuple of buckets and histogram.

         >>> rdd = sc.parallelize(range(51))
         >>> rdd.histogram(2)
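Note on the bucket semantics in the `histogram` docstring: the following is a minimal pure-Python sketch of the rules stated there, not PySpark's implementation. The names `even_buckets`, `check_buckets`, `even_index`, and `histogram_sketch` are hypothetical and exist only for illustration. The sketch assumes every bucket is open on the right except the last, which is closed.

```python
from bisect import bisect_right


def even_buckets(lo, hi, n):
    """Return n + 1 evenly spaced boundaries between lo and hi."""
    if n < 1:
        raise ValueError("number of buckets must be at least 1")
    if lo == hi:
        return [lo, hi]  # elements do not vary: a single bucket
    step = (hi - lo) / n
    return [lo + i * step for i in range(n)] + [hi]


def check_buckets(buckets):
    """Explicit boundaries must be sorted, duplicate-free, and have >= 2 elements."""
    if len(buckets) < 2:
        raise ValueError("buckets must have at least two elements")
    if any(a >= b for a, b in zip(buckets, buckets[1:])):
        raise ValueError("buckets must be sorted and not contain duplicates")


def even_index(v, lo, hi, n):
    """O(1) lookup when the n buckets are evenly spaced over [lo, hi]."""
    return min(int((v - lo) * n / (hi - lo)), n - 1)


def histogram_sketch(values, buckets):
    """Return (boundaries, counts) for an int bucket count or a boundary list."""
    values = list(values)
    if isinstance(buckets, int):
        buckets = even_buckets(min(values), max(values), buckets)
    else:
        check_buckets(buckets)
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if buckets[0] <= v <= buckets[-1]:
            # Binary search over the boundaries: O(log n) per element.
            # The min() folds the right edge into the last (closed) bucket.
            i = min(bisect_right(buckets, v) - 1, len(counts) - 1)
            counts[i] += 1
    return buckets, counts
```

With `range(51)` and two buckets, the sketch yields boundaries 0, 25, 50 and counts 25 and 26, because the value 50 falls into the closed last bucket. `even_index` shows the constant-time alternative to the binary search that applies when the boundaries are evenly spaced.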
@@ -613,16 +613,16 @@ class DataFrame(object):
     def join(self, other, on=None, how=None):
         """Joins with another :class:`DataFrame`, using the given join expression.

-        The following performs a full outer join between ``df1`` and ``df2``.
-
         :param other: Right side of the join
-        :param on: a string for join column name, a list of column names,
-            , a join expression (Column) or a list of Columns.
-            If `on` is a string or a list of string indicating the name of the join column(s),
+        :param on: a string for the join column name, a list of column names,
+            a join expression (Column), or a list of Columns.
+            If `on` is a string or a list of strings indicating the name of the join column(s),
             the column(s) must exist on both sides, and this performs an equi-join.
         :param how: str, default 'inner'.
             One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.

+        The following performs a full outer join between ``df1`` and ``df2``.
+
         >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
         [Row(name=None, height=80), Row(name=u'Bob', height=85), Row(name=u'Alice', height=None)]
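Note on the `on` and `how` parameters in the `join` docstring: the following is a rough pure-Python sketch of an equi-join over lists of dicts, meant only to illustrate what the listed `how` values select when `on` is a column name or a list of column names. It is not Spark code. `join_rows` is a hypothetical helper, and its output order is an artifact of the sketch (Spark does not guarantee row order).

```python
def join_rows(left, right, on, how="inner"):
    """Equi-join two lists of dicts on the column name(s) given in `on`."""
    keys = [on] if isinstance(on, str) else list(on)
    # Non-key columns, used to fill None for the unmatched side.
    left_extra = [c for c in (left[0] if left else {}) if c not in keys]
    right_extra = [c for c in (right[0] if right else {}) if c not in keys]

    def key_of(row):
        return tuple(row[k] for k in keys)

    out, matched_right = [], set()
    for l in left:
        hits = [i for i, r in enumerate(right) if key_of(r) == key_of(l)]
        matched_right.update(hits)
        if how == "leftsemi":
            # Keep left columns only, once per left row that has a match.
            if hits:
                out.append(dict(l))
        elif hits:
            out.extend({**l, **right[i]} for i in hits)
        elif how in ("outer", "left_outer"):
            out.append({**l, **dict.fromkeys(right_extra)})
    if how in ("outer", "right_outer"):
        for i, r in enumerate(right):
            if i not in matched_right:
                out.append({**dict.fromkeys(left_extra), **r})
    return out


# Sample rows chosen to mirror the names in the doctest above (assumed data).
df = [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}]
df2 = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]
outer = [(r["name"], r["height"]) for r in join_rows(df, df2, "name", "outer")]
```

Because the key column is shared in this sketch, the unmatched right-hand row keeps its `name` value. The doctest above joins on the expression `df.name == df2.name` and selects `df.name`, which is why it shows `name=None` for that row instead.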