SPARK-1637: Clean up examples for 1.0

- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <sandeep@techaddict.me>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
This commit is contained in:
Sandeep 2014-05-06 17:27:52 -07:00 committed by Matei Zaharia
parent 39b8b1489f
commit a000b5c3b0
40 changed files with 69 additions and 72 deletions

View file

@ -24,11 +24,11 @@ right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/
# Running the Examples and Shell
Spark comes with several sample programs. Scala and Java examples are in the `examples` directory, and Python examples are in `python/examples`.
Spark comes with several sample programs. Scala, Java and Python examples are in the `examples/src/main` directory.
To run one of the Java or Scala sample programs, use `./bin/run-example <class> <params>` in the top-level Spark directory
(the `bin/run-example` script sets up the appropriate paths and launches that program).
For example, try `./bin/run-example org.apache.spark.examples.SparkPi local`.
To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./python/examples/pi.py local`.
To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./examples/src/main/python/pi.py local`.
Each example prints usage help when run with no parameters.

View file

@ -161,9 +161,9 @@ some example applications.
# Where to Go from Here
PySpark also includes several sample programs in the [`python/examples` folder](https://github.com/apache/spark/tree/master/python/examples).
PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python).
You can run them by passing the files to `pyspark`; e.g.:
./bin/spark-submit python/examples/wordcount.py
./bin/spark-submit examples/src/main/python/wordcount.py
Each program prints usage help when run without arguments.

View file

@ -129,7 +129,7 @@ ssc.awaitTermination() // Wait for the computation to terminate
{% endhighlight %}
The complete code can be found in the Spark Streaming example
[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala).
[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala).
<br>
</div>
@ -215,7 +215,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
{% endhighlight %}
The complete code can be found in the Spark Streaming example
[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java).
<br>
</div>
@ -234,12 +234,12 @@ Then, in a different terminal, you can start the example by using
<div class="codetabs">
<div data-lang="scala" markdown="1">
{% highlight bash %}
$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
{% endhighlight %}
</div>
<div data-lang="java" markdown="1">
{% highlight bash %}
$ ./bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999
$ ./bin/run-example org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999
{% endhighlight %}
</div>
</div>
@ -268,7 +268,7 @@ hello world
{% highlight bash %}
# TERMINAL 2: RUNNING NetworkWordCount or JavaNetworkWordCount
$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
...
-------------------------------------------
Time: 1357008430000 ms
@ -609,7 +609,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFu
The update function will be called for each word, with `newValues` having a sequence of 1's (from
the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete
Scala code, take a look at the example
[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala).
<h4>Transform Operation</h4>
@ -1135,7 +1135,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
If the directory does not exist (i.e., running for the first time),
then the function `functionToCreateContext` will be called to create a new
context and set up the DStreams. See the Scala example
[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala).
[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala).
This example appends the word counts of network data into a file.
You can also explicitly create a `StreamingContext` from the checkpoint data and start the
@ -1174,7 +1174,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
If the directory does not exist (i.e., running for the first time),
then the function `contextFactory` will be called to create a new
context and set up the DStreams. See the Scala example
[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/JavaRecoverableWordCount.scala)
[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/JavaRecoverableWordCount.scala)
(note that this example is missing in the 0.9 release, so you can test it using the master branch).
This example appends the word counts of network data into a file.
@ -1374,7 +1374,6 @@ package and renamed for better clarity.
[ZeroMQUtils](api/java/org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
[MQTTUtils](api/java/org/apache/spark/streaming/mqtt/MQTTUtils.html)
* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and
[video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and [video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.mllib.examples;
package org.apache.spark.examples.mllib;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.mllib.examples;
package org.apache.spark.examples.mllib;
import java.util.regex.Pattern;

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.mllib.examples;
package org.apache.spark.examples.mllib;
import java.util.regex.Pattern;

View file

@ -15,9 +15,10 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples;
package org.apache.spark.examples.streaming;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.examples.streaming.StreamingExamples;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.flume.FlumeUtils;

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples;
package org.apache.spark.examples.streaming;
import java.util.Map;
import java.util.HashMap;
@ -26,6 +26,7 @@ import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.examples.streaming.StreamingExamples;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
@ -44,7 +45,7 @@ import scala.Tuple2;
* <numThreads> is the number of threads the kafka consumer should use
*
* Example:
* `./bin/run-example org.apache.spark.streaming.examples.JavaKafkaWordCount local[2] zoo01,zoo02,
* `./bin/run-example org.apache.spark.examples.streaming.JavaKafkaWordCount local[2] zoo01,zoo02,
* zoo03 my-consumer-group topic1,topic2 1`
*/

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples;
package org.apache.spark.examples.streaming;
import com.google.common.collect.Lists;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
@ -23,6 +23,7 @@ import scala.Tuple2;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.examples.streaming.StreamingExamples;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
@ -39,7 +40,7 @@ import java.util.regex.Pattern;
* To run this on your local machine, you need to first run a Netcat server
* `$ nc -lk 9999`
* and then run the example
* `$ ./run org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999`
* `$ ./run org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999`
*/
public final class JavaNetworkWordCount {
private static final Pattern SPACE = Pattern.compile(" ");

View file

@ -15,13 +15,14 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples;
package org.apache.spark.examples.streaming;
import com.google.common.collect.Lists;
import scala.Tuple2;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.examples.streaming.StreamingExamples;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;

View file

@ -18,7 +18,7 @@
"""
The K-means algorithm written from scratch against PySpark. In practice,
one may prefer to use the KMeans algorithm in MLlib, as shown in
python/examples/mllib/kmeans.py.
examples/src/main/python/mllib/kmeans.py.
This example requires NumPy (http://www.numpy.org/).
"""

View file

@ -20,7 +20,7 @@ A logistic regression implementation that uses NumPy (http://www.numpy.org)
to act on batches of input data using efficient matrix operations.
In practice, one may prefer to use the LogisticRegression algorithm in
MLlib, as shown in python/examples/mllib/logistic_regression.py.
MLlib, as shown in examples/src/main/python/mllib/logistic_regression.py.
"""
from collections import namedtuple

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.sql.examples
package org.apache.spark.examples.sql
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.sql.hive.examples
package org.apache.spark.examples.sql.hive
import org.apache.spark.SparkContext
import org.apache.spark.sql._

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import scala.collection.mutable.LinkedList
import scala.reflect.ClassTag
@ -78,7 +78,7 @@ class FeederActor extends Actor {
* goes and subscribe to a typical publisher/feeder actor and receives
* data.
*
* @see [[org.apache.spark.streaming.examples.FeederActor]]
* @see [[org.apache.spark.examples.streaming.FeederActor]]
*/
class SampleActorReceiver[T: ClassTag](urlOfPublisher: String)
extends Actor with ActorHelper {
@ -131,9 +131,9 @@ object FeederActor {
* <hostname> and <port> describe the AkkaSystem that Spark Sample feeder is running on.
*
* To run this example locally, you may run Feeder Actor as
* `$ ./bin/run-example org.apache.spark.streaming.examples.FeederActor 127.0.1.1 9999`
* `$ ./bin/run-example org.apache.spark.examples.streaming.FeederActor 127.0.1.1 9999`
* and then run the example
* `./bin/run-example org.apache.spark.streaming.examples.ActorWordCount local[2] 127.0.1.1 9999`
* `./bin/run-example org.apache.spark.examples.streaming.ActorWordCount local[2] 127.0.1.1 9999`
*/
object ActorWordCount {
def main(args: Array[String]) {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
@ -27,7 +27,7 @@ import org.apache.spark.streaming.StreamingContext._
* <directory> is the directory that Spark Streaming will use to find and read new text files.
*
* To run this on your local machine on directory `localdir`, run this example
* `$ ./bin/run-example org.apache.spark.streaming.examples.HdfsWordCount local[2] localdir`
* `$ ./bin/run-example org.apache.spark.examples.streaming.HdfsWordCount local[2] localdir`
* Then create a text file in `localdir` and the words in the file will get counted.
*/
object HdfsWordCount {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import java.util.Properties
@ -24,7 +24,6 @@ import kafka.producer._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.streaming.util.RawTextHelper._
// scalastyle:off
/**
@ -37,7 +36,7 @@ import org.apache.spark.streaming.util.RawTextHelper._
* <numThreads> is the number of threads the kafka consumer should use
*
* Example:
* `./bin/run-example org.apache.spark.streaming.examples.KafkaWordCount local[2] zoo01,zoo02,zoo03 my-consumer-group topic1,topic2 1`
* `./bin/run-example org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 my-consumer-group topic1,topic2 1`
*/
// scalastyle:on
object KafkaWordCount {
@ -59,7 +58,7 @@ object KafkaWordCount {
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1L))
.reduceByKeyAndWindow(add _, subtract _, Minutes(10), Seconds(2), 2)
.reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
wordCounts.print()
ssc.start()

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.eclipse.paho.client.mqttv3.{MqttClient, MqttClientPersistence, MqttException, MqttMessage, MqttTopic}
import org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence
@ -79,9 +79,9 @@ object MQTTPublisher {
* <MqttbrokerUrl> and <topic> describe where Mqtt publisher is running.
*
* To run this example locally, you may run publisher as
* `$ ./bin/run-example org.apache.spark.streaming.examples.MQTTPublisher tcp://localhost:1883 foo`
* `$ ./bin/run-example org.apache.spark.examples.streaming.MQTTPublisher tcp://localhost:1883 foo`
* and run the example as
* `$ ./bin/run-example org.apache.spark.streaming.examples.MQTTWordCount local[2] tcp://localhost:1883 foo`
* `$ ./bin/run-example org.apache.spark.examples.streaming.MQTTWordCount local[2] tcp://localhost:1883 foo`
*/
// scalastyle:on
object MQTTWordCount {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
@ -32,7 +32,7 @@ import org.apache.spark.storage.StorageLevel
* To run this on your local machine, you need to first run a Netcat server
* `$ nc -lk 9999`
* and then run the example
* `$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999`
* `$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999`
*/
// scalastyle:on
object NetworkWordCount {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import scala.collection.mutable.SynchronizedQueue

View file

@ -15,11 +15,10 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.util.RawTextHelper
import org.apache.spark.util.IntParam
/**
@ -52,9 +51,6 @@ object RawNetworkGrep {
val ssc = new StreamingContext(master, "RawNetworkGrep", Milliseconds(batchMillis),
System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass).toSeq)
// Warm up the JVMs on master and slave for JIT compilation to kick in
RawTextHelper.warmUp(ssc.sparkContext)
val rawStreams = (1 to numStreams).map(_ =>
ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER_2)).toArray
val union = ssc.union(rawStreams)

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
@ -44,7 +44,7 @@ import java.nio.charset.Charset
*
* and run the example as
*
* `$ ./run-example org.apache.spark.streaming.examples.RecoverableNetworkWordCount \
* `$ ./run-example org.apache.spark.examples.streaming.RecoverableNetworkWordCount \
* local[2] localhost 9999 ~/checkpoint/ ~/out`
*
* If the directory ~/checkpoint/ does not exist (e.g. running for the first time), it will create
@ -56,7 +56,7 @@ import java.nio.charset.Charset
*
* `$ ./spark-class org.apache.spark.deploy.Client -s launch <cluster-url> \
* <path-to-examples-jar> \
* org.apache.spark.streaming.examples.RecoverableNetworkWordCount <cluster-url> \
* org.apache.spark.examples.streaming.RecoverableNetworkWordCount <cluster-url> \
* localhost 9999 ~/checkpoint ~/out`
*
* <path-to-examples-jar> would typically be

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
@ -31,7 +31,7 @@ import org.apache.spark.streaming.StreamingContext._
* To run this on your local machine, you need to first run a Netcat server
* `$ nc -lk 9999`
* and then run the example
* `$ ./bin/run-example org.apache.spark.streaming.examples.StatefulNetworkWordCount local[2] localhost 9999`
* `$ ./bin/run-example org.apache.spark.examples.streaming.StatefulNetworkWordCount local[2] localhost 9999`
*/
// scalastyle:on
object StatefulNetworkWordCount {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.Logging

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import com.twitter.algebird._

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import com.twitter.algebird.HyperLogLogMonoid
import com.twitter.algebird.HyperLogLog._

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import org.apache.spark.streaming.{Seconds, StreamingContext}
import StreamingContext._

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples
package org.apache.spark.examples.streaming
import akka.actor.ActorSystem
import akka.actor.actorRef2Scala
@ -68,9 +68,9 @@ object SimpleZeroMQPublisher {
* <zeroMQurl> and <topic> describe where zeroMq publisher is running.
*
* To run this example locally, you may run publisher as
* `$ ./bin/run-example org.apache.spark.streaming.examples.SimpleZeroMQPublisher tcp://127.0.1.1:1234 foo.bar`
* `$ ./bin/run-example org.apache.spark.examples.streaming.SimpleZeroMQPublisher tcp://127.0.1.1:1234 foo.bar`
* and run the example as
* `$ ./bin/run-example org.apache.spark.streaming.examples.ZeroMQWordCount local[2] tcp://127.0.1.1:1234 foo`
* `$ ./bin/run-example org.apache.spark.examples.streaming.ZeroMQWordCount local[2] tcp://127.0.1.1:1234 foo`
*/
// scalastyle:on
object ZeroMQWordCount {

View file

@ -15,7 +15,7 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples.clickstream
package org.apache.spark.examples.streaming.clickstream
import java.net.ServerSocket
import java.io.PrintWriter
@ -40,8 +40,8 @@ object PageView extends Serializable {
/** Generates streaming events to simulate page views on a website.
*
* This should be used in tandem with PageViewStream.scala. Example:
* $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewGenerator 44444 10
* $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewStream errorRatePerZipCode localhost 44444
* $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewGenerator 44444 10
* $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewStream errorRatePerZipCode localhost 44444
*
* When running this, you may want to set the root logging level to ERROR in
* conf/log4j.properties to reduce the verbosity of the output.

View file

@ -15,19 +15,19 @@
* limitations under the License.
*/
package org.apache.spark.streaming.examples.clickstream
package org.apache.spark.examples.streaming.clickstream
import org.apache.spark.SparkContext._
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.examples.StreamingExamples
import org.apache.spark.examples.streaming.StreamingExamples
// scalastyle:off
/** Analyses a streaming dataset of web page views. This class demonstrates several types of
* operators available in Spark streaming.
*
* This should be used in tandem with PageViewStream.scala. Example:
* $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewGenerator 44444 10
* $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewStream errorRatePerZipCode localhost 44444
* $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewGenerator 44444 10
* $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewStream errorRatePerZipCode localhost 44444
*/
// scalastyle:on
object PageViewStream {

View file

@ -25,7 +25,7 @@ import scala.collection.JavaConversions.mapAsScalaMap
private[streaming]
object RawTextHelper {
/**
/**
* Splits lines and counts the words.
*/
def splitAndCountPartitions(iter: Iterator[String]): Iterator[(String, Long)] = {
@ -114,4 +114,3 @@ object RawTextHelper {
def max(v1: Long, v2: Long) = math.max(v1, v2)
}