[DOC] improve python doc for rdd.histogram and dataframe.join

## What changes were proposed in this pull request?

doc change only

## How was this patch tested?

doc change only

Author: Mortada Mehyar <mortada.mehyar@gmail.com>

Closes #14253 from mortada/histogram_typos.
Mortada Mehyar 2016-07-18 23:49:47 -07:00 committed by Reynold Xin
parent 1426a08052
commit 6ee40d2cc5
2 changed files with 14 additions and 14 deletions


@@ -1027,20 +1027,20 @@ class RDD(object):
 If your histogram is evenly spaced (e.g. [0, 10, 20, 30]),
 this can be switched from an O(log n) inseration to O(1) per
-element(where n = # buckets).
+element (where n is the number of buckets).
-Buckets must be sorted and not contain any duplicates, must be
+Buckets must be sorted, not contain any duplicates, and have
 at least two elements.
-If `buckets` is a number, it will generates buckets which are
+If `buckets` is a number, it will generate buckets which are
 evenly spaced between the minimum and maximum of the RDD. For
-example, if the min value is 0 and the max is 100, given buckets
-as 2, the resulting buckets will be [0,50) [50,100]. buckets must
-be at least 1 If the RDD contains infinity, NaN throws an exception
-If the elements in RDD do not vary (max == min) always returns
-a single bucket.
+example, if the min value is 0 and the max is 100, given `buckets`
+as 2, the resulting buckets will be [0,50) [50,100]. `buckets` must
+be at least 1. An exception is raised if the RDD contains infinity.
+If the elements in the RDD do not vary (max == min), a single bucket
+will be used.
-It will return a tuple of buckets and histogram.
+The return value is a tuple of buckets and histogram.
 >>> rdd = sc.parallelize(range(51))
 >>> rdd.histogram(2)
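
The bucket rules described in this docstring can be illustrated with a small pure-Python sketch. It is not Spark's implementation: `histogram_sketch` is a hypothetical name, the handling of out-of-range values is an assumption, and it always uses the O(log n) binary-search lookup rather than the O(1) path for evenly spaced buckets.

```python
import math
from bisect import bisect_right


def histogram_sketch(values, buckets):
    """Illustrate the documented bucket semantics on a plain iterable.

    Hypothetical helper, not part of PySpark.
    """
    values = list(values)
    if any(math.isinf(v) or math.isnan(v) for v in values):
        raise ValueError("values must not contain infinity or NaN")

    if isinstance(buckets, int):
        if buckets < 1:
            raise ValueError("buckets must be at least 1")
        lo, hi = min(values), max(values)
        if lo == hi:
            # Elements do not vary (max == min): a single bucket is used.
            buckets = [lo, hi]
        else:
            # Evenly spaced boundaries between the minimum and maximum.
            step = (hi - lo) / buckets
            buckets = [lo + i * step for i in range(buckets)] + [hi]
    elif len(buckets) < 2 or list(buckets) != sorted(set(buckets)):
        raise ValueError(
            "buckets must be sorted, unique, and have at least two elements"
        )

    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v < buckets[0] or v > buckets[-1]:
            continue  # assumption: values outside the buckets are skipped
        # Binary search for the bucket; the last bucket is closed on the
        # right, so the maximum boundary falls into it.
        i = min(bisect_right(buckets, v) - 1, len(counts) - 1)
        counts[i] += 1
    return list(buckets), counts


boundaries, counts = histogram_sketch(range(51), 2)
```

With `range(51)` and two buckets, the values 0 to 24 land in the first bucket and 25 to 50 in the second, because only the last bucket includes its upper boundary.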


@@ -613,16 +613,16 @@ class DataFrame(object):
 def join(self, other, on=None, how=None):
 """Joins with another :class:`DataFrame`, using the given join expression.
-The following performs a full outer join between ``df1`` and ``df2``.
 :param other: Right side of the join
-:param on: a string for join column name, a list of column names,
-, a join expression (Column) or a list of Columns.
-If `on` is a string or a list of string indicating the name of the join column(s),
+:param on: a string for the join column name, a list of column names,
+a join expression (Column), or a list of Columns.
+If `on` is a string or a list of strings indicating the name of the join column(s),
 the column(s) must exist on both sides, and this performs an equi-join.
 :param how: str, default 'inner'.
 One of `inner`, `outer`, `left_outer`, `right_outer`, `leftsemi`.
+The following performs a full outer join between ``df1`` and ``df2``.
 >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height).collect()
 [Row(name=None, height=80), Row(name=u'Bob', height=85), Row(name=u'Alice', height=None)]
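
To make the `'outer'` example in this docstring concrete, here is a minimal pure-Python sketch of a full outer equi-join over lists of dicts. It is an illustration only, not how Spark executes joins: `full_outer_join_sketch` and the `Row` namedtuple are hypothetical stand-ins, and the sample contents of `df` and `df2` are assumed, since the diff does not show them.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, limited to the two selected columns.
Row = namedtuple("Row", ["name", "height"])


def full_outer_join_sketch(left, right, key):
    """Pair records whose `key` values match; keep unmatched records from
    either side, with None standing in for the missing side."""
    pairs = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[key] == l[key]]
        if not hits:
            pairs.append((l, None))  # left record with no partner
        for i in hits:
            matched_right.add(i)
            pairs.append((l, right[i]))
    for i, r in enumerate(right):
        if i not in matched_right:
            pairs.append((None, r))  # right record with no partner
    return pairs


# Assumed sample data, chosen to be consistent with the doctest output.
df = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
df2 = [{"name": "Tom", "height": 80}, {"name": "Bob", "height": 85}]

# Equivalent of .select(df.name, df2.height): name comes from the left side,
# height from the right side.
rows = [
    Row(l["name"] if l else None, r["height"] if r else None)
    for l, r in full_outer_join_sketch(df, df2, "name")
]
```

The sketch yields the same three rows as the doctest, though not necessarily in the same order; the order of rows in a join result should not be relied on.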