[DOCS][MINOR] Screenshot + minor fixes to improve reading for accumulators

## What changes were proposed in this pull request?

Added a screenshot and minor fixes to improve readability of the accumulators section.

## How was this patch tested?

Manual

Author: Jacek Laskowski <jacek@japila.pl>

Closes #12569 from jaceklaskowski/docs-accumulators.
Jacek Laskowski authored on 2016-04-24 10:36:33 +01:00, committed by Sean Owen
parent db7113b1d3
commit 8df8a81825
2 changed files with 12 additions and 6 deletions

img/spark-webui-accumulators.png — new binary image (226 KiB); binary file not shown

@@ -1328,12 +1328,18 @@ value of the broadcast variable (e.g. if the variable is shipped to a new node l
 
 Accumulators are variables that are only "added" to through an associative and commutative operation and can
 therefore be efficiently supported in parallel. They can be used to implement counters (as in
 MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers
-can add support for new types. If accumulators are created with a name, they will be
+can add support for new types.
+
+If accumulators are created with a name, they will be
 displayed in Spark's UI. This can be useful for understanding the progress of
 running stages (NOTE: this is not yet supported in Python).
 
+<p style="text-align: center;">
+  <img src="img/spark-webui-accumulators.png" title="Accumulators in the Spark UI" alt="Accumulators in the Spark UI" />
+</p>
+
 An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks
-running on the cluster can then add to it using the `add` method or the `+=` operator (in Scala and Python).
+running on a cluster can then add to it using the `add` method or the `+=` operator (in Scala and Python).
 However, they cannot read its value.
 Only the driver program can read the accumulator's value, using its `value` method.
@@ -1345,7 +1351,7 @@ The code below shows an accumulator being used to add up the elements of an arra
 
 {% highlight scala %}
 scala> val accum = sc.accumulator(0, "My Accumulator")
-accum: spark.Accumulator[Int] = 0
+accum: org.apache.spark.Accumulator[Int] = 0
 
 scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
 ...
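
For readers skimming the commit, here is a minimal, self-contained sketch of the round trip the guide text describes: create a named accumulator, add to it from tasks, and read the result back on the driver. It assumes a live `SparkContext` named `sc` and the `sc.accumulator` API shown in this diff (later Spark releases supersede it with `LongAccumulator`); expected output is noted in comments.

```scala
// Sketch only: assumes a running SparkContext `sc` and the sc.accumulator API
// used in this diff (superseded by sc.longAccumulator in later Spark versions).
val accum = sc.accumulator(0, "My Accumulator")  // the name makes it visible in the web UI

// Tasks on the executors can only add to the accumulator; they cannot read it.
sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)

// Only the driver can read the accumulated value, via `value`.
println(accum.value)  // 10
```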
@@ -1469,8 +1475,8 @@ Accumulators do not change the lazy evaluation model of Spark. If they are being
 <div data-lang="scala" markdown="1">
 {% highlight scala %}
 val accum = sc.accumulator(0)
-data.map { x => accum += x; f(x) }
-// Here, accum is still 0 because no actions have caused the <code>map</code> to be computed.
+data.map { x => accum += x; x }
+// Here, accum is still 0 because no actions have caused the map operation to be computed.
 {% endhighlight %}
 </div>
 
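
To make the lazy-evaluation caveat in this last hunk concrete, here is a hedged sketch: the accumulator keeps its initial value until an action forces the `map` to run. The RDD `data` is hypothetical, introduced only for illustration, and the final count assumes no task retries.

```scala
// Illustrative sketch of the lazy-evaluation caveat; `data` is a made-up RDD.
val data = sc.parallelize(1 to 5)
val accum = sc.accumulator(0)

val mapped = data.map { x => accum += x; x }
println(accum.value)  // 0 -- map is lazy, no action has run yet

mapped.count()        // an action forces the map, so the updates are applied
println(accum.value)  // 15, assuming no task retries re-applied the updates
```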