[SPARK-13339][DOCS] Clarify commutative / associative operator requirements for reduce, fold
Clarify that reduce functions need to be commutative, and fold functions do not. See https://github.com/apache/spark/pull/11091

Author: Sean Owen <sowen@cloudera.com>

Closes #11217 from srowen/SPARK-13339
This commit is contained in: parent c776fce99b, commit fb7e21797e
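As a minimal sketch of the distinction this change documents (assuming a local SparkContext named sc): reduce combines per-partition results in whatever order tasks finish, so its function must be both associative and commutative, while fold (below) is documented to need only associativity.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("reduce-demo").setMaster("local[4]"))
val rdd = sc.parallelize(1 to 100, numSlices = 4)

// (_ + _) is associative and commutative, so reduce is deterministic.
rdd.reduce(_ + _)  // always 5050

// (_ - _) is neither associative nor commutative; the result can vary with
// partitioning and with the order in which partition results are merged.
rdd.reduce(_ - _)  // nondeterministic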
@@ -305,11 +305,11 @@ setMethod("groupByKey",
 #' Merge values by key
 #'
 #' This function operates on RDDs where every element is of the form list(K, V) or c(K, V).
-#' and merges the values for each key using an associative reduce function.
+#' and merges the values for each key using an associative and commutative reduce function.
 #'
 #' @param x The RDD to reduce by key. Should be an RDD where each element is
 #' list(K, V) or c(K, V).
-#' @param combineFunc The associative reduce function to use.
+#' @param combineFunc The associative and commutative reduce function to use.
 #' @param numPartitions Number of partitions to create.
 #' @return An RDD where each element is list(K, V') where V' is the merged
 #' value

@@ -347,12 +347,12 @@ setMethod("reduceByKey",
 #' Merge values by key locally
 #'
 #' This function operates on RDDs where every element is of the form list(K, V) or c(K, V).
-#' and merges the values for each key using an associative reduce function, but return the
-#' results immediately to the driver as an R list.
+#' and merges the values for each key using an associative and commutative reduce function, but
+#' return the results immediately to the driver as an R list.
 #'
 #' @param x The RDD to reduce by key. Should be an RDD where each element is
 #' list(K, V) or c(K, V).
-#' @param combineFunc The associative reduce function to use.
+#' @param combineFunc The associative and commutative reduce function to use.
 #' @return A list of elements of type list(K, V') where V' is the merged value for each key
 #' @seealso reduceByKey
 #' @examples
@@ -29,9 +29,9 @@ import org.apache.spark.storage.{BlockId, BlockStatus}
 /**
  * A simpler value of [[Accumulable]] where the result type being accumulated is the same
  * as the types of elements being merged, i.e. variables that are only "added" to through an
- * associative operation and can therefore be efficiently supported in parallel. They can be used
- * to implement counters (as in MapReduce) or sums. Spark natively supports accumulators of numeric
- * value types, and programmers can add support for new types.
+ * associative and commutative operation and can therefore be efficiently supported in parallel.
+ * They can be used to implement counters (as in MapReduce) or sums. Spark natively supports
+ * accumulators of numeric value types, and programmers can add support for new types.
  *
  * An accumulator is created from an initial value `v` by calling [[SparkContext#accumulator]].
  * Tasks running on the cluster can then add to it using the [[Accumulable#+=]] operator.
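A short illustration of the accumulator contract described above, using the SparkContext#accumulator and += APIs this scaladoc refers to (sc is an assumed SparkContext; later Spark versions favor sc.longAccumulator):

val acc = sc.accumulator(0)                        // initial value v = 0
sc.parallelize(1 to 1000).foreach(_ => acc += 1)   // tasks only "add" to it
// Addition is associative and commutative, so per-task updates can be
// merged in any order; only the driver reads the result.
acc.value                                          // 1000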
@@ -278,17 +278,17 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
     combineByKey(createCombiner, mergeValue, mergeCombiners, new HashPartitioner(numPartitions))

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce.
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce.
    */
   def reduceByKey(partitioner: Partitioner, func: JFunction2[V, V, V]): JavaPairRDD[K, V] =
     fromRDD(rdd.reduceByKey(partitioner, func))

   /**
-   * Merge the values for each key using an associative reduce function, but return the results
-   * immediately to the master as a Map. This will also perform the merging locally on each mapper
-   * before sending results to a reducer, similarly to a "combiner" in MapReduce.
+   * Merge the values for each key using an associative and commutative reduce function, but return
+   * the result immediately to the master as a Map. This will also perform the merging locally on
+   * each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.
    */
   def reduceByKeyLocally(func: JFunction2[V, V, V]): java.util.Map[K, V] =
     mapAsSerializableJavaMap(rdd.reduceByKeyLocally(func))

@@ -381,9 +381,9 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
     fromRDD(rdd.foldByKey(zeroValue)(func))

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
    */
   def reduceByKey(func: JFunction2[V, V, V], numPartitions: Int): JavaPairRDD[K, V] =
     fromRDD(rdd.reduceByKey(func, numPartitions))

@@ -461,10 +461,10 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
     fromRDD(rdd.partitionBy(partitioner))

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce.
-   */
+   * Return an RDD containing all pairs of elements with matching keys in `this` and `other`. Each
+   * pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in `this` and
+   * (k, v2) is in `other`. Uses the given Partitioner to partition the output RDD.
+   */
   def join[W](other: JavaPairRDD[K, W], partitioner: Partitioner): JavaPairRDD[K, (V, W)] =
     fromRDD(rdd.join(other, partitioner))

@@ -520,9 +520,9 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
   }

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/
    * parallelism level.
    */
   def reduceByKey(func: JFunction2[V, V, V]): JavaPairRDD[K, V] = {
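The third hunk above also swaps a stray reduceByKey comment for join's actual contract. A quick sketch of that contract in the Scala API (sc assumed):

val a = sc.parallelize(Seq(("x", 1), ("y", 2)))
val b = sc.parallelize(Seq(("x", "one"), ("x", "uno")))
a.join(b).collect()
// Array(("x", (1, "one")), ("x", (1, "uno")))
// only keys present in both RDDs survive, as (k, (v1, v2)) pairs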
@@ -373,7 +373,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {

   /**
    * Aggregate the elements of each partition, and then the results for all the partitions, using a
-   * given associative and commutative function and a neutral "zero value". The function
+   * given associative function and a neutral "zero value". The function
    * op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object
    * allocation; however, it should not modify t2.
    *
@@ -300,27 +300,27 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   }

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce.
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce.
    */
   def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)] = self.withScope {
     combineByKeyWithClassTag[V]((v: V) => v, func, func, partitioner)
   }

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
    */
   def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)] = self.withScope {
     reduceByKey(new HashPartitioner(numPartitions), func)
   }

   /**
-   * Merge the values for each key using an associative reduce function. This will also perform
-   * the merging locally on each mapper before sending results to a reducer, similarly to a
-   * "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/
+   * Merge the values for each key using an associative and commutative reduce function. This will
+   * also perform the merging locally on each mapper before sending results to a reducer, similarly
+   * to a "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/
    * parallelism level.
    */
   def reduceByKey(func: (V, V) => V): RDD[(K, V)] = self.withScope {

@@ -328,9 +328,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   }

   /**
-   * Merge the values for each key using an associative reduce function, but return the results
-   * immediately to the master as a Map. This will also perform the merging locally on each mapper
-   * before sending results to a reducer, similarly to a "combiner" in MapReduce.
+   * Merge the values for each key using an associative and commutative reduce function, but return
+   * the results immediately to the master as a Map. This will also perform the merging locally on
+   * each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.
    */
   def reduceByKeyLocally(func: (V, V) => V): Map[K, V] = self.withScope {
     val cleanedF = self.sparkContext.clean(func)
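For instance, the classic word count satisfies the documented requirement: (_ + _) is associative and commutative, so values can safely be merged map-side and again at the reducer in any order (a sketch, sc assumed):

val counts = sc.parallelize(Seq("a b b", "b c"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.collect()  // Array((a,1), (b,3), (c,1)), in some order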
@@ -973,7 +973,7 @@ abstract class RDD[T: ClassTag](

   /**
    * Aggregate the elements of each partition, and then the results for all the partitions, using a
-   * given associative and commutative function and a neutral "zero value". The function
+   * given associative function and a neutral "zero value". The function
    * op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object
    * allocation; however, it should not modify t2.
    *
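A sketch of fold's contract after this change (sc assumed): op must be associative, and zeroValue must be neutral for it, because the zero is folded into every partition and once more when partition results are merged.

val data = sc.parallelize(Seq(1, 5, 3, 9), numSlices = 2)
data.fold(Int.MinValue)(math.max)  // 9; Int.MinValue is neutral for max

// A non-neutral zero is applied once per partition plus once at the merge:
sc.parallelize(1 to 4, 2).fold(10)(_ + _)  // 40, not 20: 10 is added 3 times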
@@ -1343,7 +1343,7 @@ value of the broadcast variable (e.g. if the variable is shipped to a new node l

 ## Accumulators

-Accumulators are variables that are only "added" to through an associative operation and can
+Accumulators are variables that are only "added" to through an associative and commutative operation and can
 therefore be efficiently supported in parallel. They can be used to implement counters (as in
 MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers
 can add support for new types. If accumulators are created with a name, they will be
@@ -798,7 +798,7 @@ Some of the common ones are as follows.
   <td> <b>reduce</b>(<i>func</i>) </td>
   <td> Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the
   source DStream using a function <i>func</i> (which takes two arguments and returns one).
-  The function should be associative so that it can be computed in parallel. </td>
+  The function should be associative and commutative so that it can be computed in parallel. </td>
 </tr>
 <tr>
   <td> <b>countByValue</b>() </td>

@@ -1072,7 +1072,7 @@ said two parameters - <i>windowLength</i> and <i>slideInterval</i>.
 <tr>
   <td> <b>reduceByWindow</b>(<i>func</i>, <i>windowLength</i>, <i>slideInterval</i>) </td>
   <td> Return a new single-element stream, created by aggregating elements in the stream over a
-  sliding interval using <i>func</i>. The function should be associative so that it can be computed
+  sliding interval using <i>func</i>. The function should be associative and commutative so that it can be computed
   correctly in parallel.
   </td>
 </tr>
@@ -844,8 +844,7 @@ class RDD(object):
     def fold(self, zeroValue, op):
         """
         Aggregate the elements of each partition, and then the results for all
-        the partitions, using a given associative and commutative function and
-        a neutral "zero value."
+        the partitions, using a given associative function and a neutral "zero value."

         The function C{op(t1, t2)} is allowed to modify C{t1} and return it
         as its result value to avoid object allocation; however, it should not

@@ -1558,7 +1557,7 @@ class RDD(object):

     def reduceByKey(self, func, numPartitions=None, partitionFunc=portable_hash):
         """
-        Merge the values for each key using an associative reduce function.
+        Merge the values for each key using an associative and commutative reduce function.

         This will also perform the merging locally on each mapper before
         sending results to a reducer, similarly to a "combiner" in MapReduce.

@@ -1576,7 +1575,7 @@ class RDD(object):

     def reduceByKeyLocally(self, func):
         """
-        Merge the values for each key using an associative reduce function, but
+        Merge the values for each key using an associative and commutative reduce function, but
         return the results immediately to the master as a dictionary.

         This will also perform the merging locally on each mapper before
@@ -453,7 +453,7 @@ class DStream(object):
         2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
         This is more efficient than when `invReduceFunc` is None.

-        @param reduceFunc: associative reduce function
+        @param reduceFunc: associative and commutative reduce function
         @param invReduceFunc: inverse reduce function of `reduceFunc`
         @param windowDuration: width of the window; must be a multiple of this DStream's
                                batching interval

@@ -524,7 +524,7 @@ class DStream(object):
         `invFunc` can be None, then it will reduce all the RDDs in window, could be slower
         than having `invFunc`.

-        @param func: associative reduce function
+        @param func: associative and commutative reduce function
         @param invFunc: inverse function of `reduceFunc`
         @param windowDuration: width of the window; must be a multiple of this DStream's
                                batching interval
@@ -214,7 +214,7 @@ trait JavaDStreamLike[T, This <: JavaDStreamLike[T, This, R], R <: JavaRDDLike[T
   /**
    * Return a new DStream in which each RDD has a single element generated by reducing all
    * elements in a sliding window over this DStream.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -234,7 +234,7 @@ trait JavaDStreamLike[T, This <: JavaDStreamLike[T, This, R], R <: JavaRDDLike[T
   /**
    * Return a new DStream in which each RDD has a single element generated by reducing all
    * elements in a sliding window over this DStream.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -257,7 +257,7 @@ trait JavaDStreamLike[T, This <: JavaDStreamLike[T, This, R], R <: JavaRDDLike[T
    * 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
    * This is more efficient than reduceByWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
@@ -138,8 +138,8 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(

   /**
    * Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
-   * merged using the associative reduce function. Hash partitioning is used to generate the RDDs
-   * with Spark's default number of partitions.
+   * merged using the associative and commutative reduce function. Hash partitioning is used to
+   * generate the RDDs with Spark's default number of partitions.
    */
   def reduceByKey(func: JFunction2[V, V, V]): JavaPairDStream[K, V] =
     dstream.reduceByKey(func)

@@ -257,7 +257,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * Similar to `DStream.reduceByKey()`, but applies it over a sliding window. The new DStream
    * generates RDDs with the same interval as this DStream. Hash partitioning is used to generate
    * the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    */

@@ -270,7 +270,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
    * `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
    * generate the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -289,7 +289,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
    * `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
    * generate the RDDs with `numPartitions` partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which
@@ -309,7 +309,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
   /**
    * Return a new DStream by applying `reduceByKey` over a sliding window. Similar to
    * `DStream.reduceByKey()`, but applies it over a sliding window.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which
@@ -335,7 +335,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * This is more efficient than reduceByKeyAndWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
    * Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval

@@ -360,7 +360,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * This is more efficient than reduceByKeyAndWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
    * Hash partitioning is used to generate the RDDs with `numPartitions` partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval

@@ -397,7 +397,7 @@ class JavaPairDStream[K, V](val dstream: DStream[(K, V)])(
    * 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
    * This is more efficient than reduceByKeyAndWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
@@ -791,7 +791,7 @@ abstract class DStream[T: ClassTag] (
   /**
    * Return a new DStream in which each RDD has a single element generated by reducing all
    * elements in a sliding window over this DStream.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -814,7 +814,7 @@ abstract class DStream[T: ClassTag] (
    * 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
    * This is more efficient than reduceByWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
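A sketch of the invertible windowed reduce described here, assuming an existing StreamingContext ssc and an illustrative socket source: new batches are added with reduceFunc and batches leaving the window are removed with invReduceFunc, so reduceFunc must be associative and commutative, and invReduceFunc must undo it.

import org.apache.spark.streaming.Seconds

val perBatchCounts = ssc.socketTextStream("localhost", 9999).count()
val windowedCount = perBatchCounts.reduceByWindow(
  _ + _,        // reduceFunc: associative and commutative
  _ - _,        // invReduceFunc: inverse of reduceFunc
  Seconds(30),  // windowDuration
  Seconds(10))  // slideDuration
// Incremental variants like this one need checkpointing enabled,
// e.g. ssc.checkpoint("/tmp/checkpoints") (path is illustrative).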
@@ -75,8 +75,8 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])

   /**
    * Return a new DStream by applying `reduceByKey` to each RDD. The values for each key are
-   * merged using the associative reduce function. Hash partitioning is used to generate the RDDs
-   * with Spark's default number of partitions.
+   * merged using the associative and commutative reduce function. Hash partitioning is used to
+   * generate the RDDs with Spark's default number of partitions.
    */
   def reduceByKey(reduceFunc: (V, V) => V): DStream[(K, V)] = ssc.withScope {
     reduceByKey(reduceFunc, defaultPartitioner())

@@ -204,7 +204,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
    * Similar to `DStream.reduceByKey()`, but applies it over a sliding window. The new DStream
    * generates RDDs with the same interval as this DStream. Hash partitioning is used to generate
    * the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    */

@@ -219,7 +219,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
    * Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
    * `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
    * generate the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -238,7 +238,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
    * Return a new DStream by applying `reduceByKey` over a sliding window. This is similar to
    * `DStream.reduceByKey()` but applies it over a sliding window. Hash partitioning is used to
    * generate the RDDs with `numPartitions` partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -259,7 +259,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
   /**
    * Return a new DStream by applying `reduceByKey` over a sliding window. Similar to
    * `DStream.reduceByKey()`, but applies it over a sliding window.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
    * @param slideDuration sliding interval of the window (i.e., the interval after which

@@ -289,7 +289,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
    * This is more efficient than reduceByKeyAndWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
    * Hash partitioning is used to generate the RDDs with Spark's default number of partitions.
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval

@@ -320,7 +320,7 @@ class PairDStreamFunctions[K, V](self: DStream[(K, V)])
    * 2. "inverse reduce" the old values that left the window (e.g., subtracting old counts)
    * This is more efficient than reduceByKeyAndWindow without "inverse reduce" function.
    * However, it is applicable to only "invertible reduce functions".
-   * @param reduceFunc associative reduce function
+   * @param reduceFunc associative and commutative reduce function
    * @param invReduceFunc inverse reduce function
    * @param windowDuration width of the window; must be a multiple of this DStream's
    *                       batching interval
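And the per-key counterpart, as a sketch (words is an assumed DStream[String]):

import org.apache.spark.streaming.Seconds

val pairs = words.map(w => (w, 1L))
// (_ + _) is associative and commutative, as the updated docs require;
// this keeps counts over a 30-second window that slides every 10 seconds.
val windowedWordCounts = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))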