spark-instrumented-optimizer/core/src/main/scala/spark/DoubleRDDFunctions.scala

package spark

import spark.partial.BoundedDouble
import spark.partial.MeanEvaluator
import spark.partial.PartialResult
import spark.partial.SumEvaluator

import spark.util.StatCounter

/**
 * Extra functions available on RDDs of Doubles through an implicit conversion.
 * Import `spark.SparkContext._` at the top of your program to use these functions.
 */
class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable {
  /** Add up the elements in this RDD. */
  def sum(): Double = {
    self.reduce(_ + _)
  }

  /**
   * Return a [[spark.util.StatCounter]] object that captures the mean, variance and count
   * of the RDD's elements in one operation.
   */
  def stats(): StatCounter = {
    self.mapPartitions(nums => Iterator(StatCounter(nums))).reduce((a, b) => a.merge(b))
  }
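
  // Usage sketch (illustrative only): assuming a SparkContext named `sc` and
  // `import spark.SparkContext._` in scope so the implicit conversion applies:
  //
  //   val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
  //   val s = data.stats()
  //   (s.count, s.mean, s.variance)   // (4, 2.5, 1.25)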

  /** Compute the mean of this RDD's elements. */
  def mean(): Double = stats().mean

  /** Compute the variance of this RDD's elements. */
  def variance(): Double = stats().variance

  /** Compute the standard deviation of this RDD's elements. */
  def stdev(): Double = stats().stdev

  /**
   * Compute the sample standard deviation of this RDD's elements (which corrects for bias in
   * estimating the standard deviation by dividing by N-1 instead of N).
   */
  def sampleStdev(): Double = stats().sampleStdev
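
  // For example, for the values 1.0, 2.0, 3.0 the population standard deviation
  // (stdev, dividing by N) is sqrt(2/3) ~= 0.816, while the sample standard
  // deviation (sampleStdev, dividing by N-1) is sqrt(1.0) = 1.0.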

  /** (Experimental) Approximate operation to return the mean within a timeout. */
  def meanApprox(timeout: Long, confidence: Double = 0.95): PartialResult[BoundedDouble] = {
    val processPartition = (ctx: TaskContext, ns: Iterator[Double]) => StatCounter(ns)
    val evaluator = new MeanEvaluator(self.splits.size, confidence)
    self.context.runApproximateJob(self, processPartition, evaluator, timeout)
  }
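
  // Usage sketch (illustrative only): `meanApprox` returns after at most roughly
  // `timeout` milliseconds. Assuming an RDD[Double] named `data` with the implicit
  // conversion in scope, the current estimate and its bounds can be read from the
  // PartialResult:
  //
  //   val result = data.meanApprox(10000L)   // wait at most ~10 seconds
  //   val bound  = result.initialValue       // a BoundedDouble
  //   (bound.mean, bound.low, bound.high)    // estimate and confidence interval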

  /** (Experimental) Approximate operation to return the sum within a timeout. */
  def sumApprox(timeout: Long, confidence: Double = 0.95): PartialResult[BoundedDouble] = {
    val processPartition = (ctx: TaskContext, ns: Iterator[Double]) => StatCounter(ns)
    val evaluator = new SumEvaluator(self.splits.size, confidence)
    self.context.runApproximateJob(self, processPartition, evaluator, timeout)
  }
}
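
// Minimal end-to-end sketch (illustrative only, assuming a local master is
// available); the implicit conversion from RDD[Double] to DoubleRDDFunctions
// comes from `import spark.SparkContext._`:
//
//   import spark.SparkContext
//   import spark.SparkContext._
//
//   val sc = new SparkContext("local", "DoubleRDDFunctionsExample")
//   val data = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
//   println(data.sum())     // 10.0
//   println(data.mean())    // 2.5
//   println(data.stdev())   // ~1.118
//   sc.stop()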