SPARK-1637: Clean up examples for 1.0
- [x] Move all of them into subpackages of org.apache.spark.examples (right now some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
- [x] Move Python examples into examples/src/main/python
- [x] Update docs to reflect these changes

Author: Sandeep <sandeep@techaddict.me>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #571 from techaddict/SPARK-1637 and squashes the following commits:

47ef86c [Sandeep] Changes based on Discussions on PR, removing use of RawTextHelper from examples
8ed2d3f [Sandeep] Docs Updated for changes, Change for java examples
5f96121 [Sandeep] Move Python examples into examples/src/main/python
0a8dd77 [Sandeep] Move all Scala Examples to org.apache.spark.examples (some are in org.apache.spark.streaming.examples, for instance, and others are in org.apache.spark.examples.mllib)
commit a000b5c3b0 (parent 39b8b1489f)
@@ -24,11 +24,11 @@ right version of Scala from [scala-lang.org](http://www.scala-lang.org/download/
 # Running the Examples and Shell
 
-Spark comes with several sample programs. Scala and Java examples are in the `examples` directory, and Python examples are in `python/examples`.
+Spark comes with several sample programs. Scala, Java and Python examples are in the `examples/src/main` directory.
 To run one of the Java or Scala sample programs, use `./bin/run-example <class> <params>` in the top-level Spark directory
 (the `bin/run-example` script sets up the appropriate paths and launches that program).
 For example, try `./bin/run-example org.apache.spark.examples.SparkPi local`.
-To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./python/examples/pi.py local`.
+To run a Python sample program, use `./bin/pyspark <sample-program> <params>`. For example, try `./bin/pyspark ./examples/src/main/python/pi.py local`.
 
 Each example prints usage help when run with no parameters.
@@ -161,9 +161,9 @@ some example applications.
 # Where to Go from Here
 
-PySpark also includes several sample programs in the [`python/examples` folder](https://github.com/apache/spark/tree/master/python/examples).
+PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python).
 You can run them by passing the files to `pyspark`; e.g.:
 
-    ./bin/spark-submit python/examples/wordcount.py
+    ./bin/spark-submit examples/src/main/python/wordcount.py
 
 Each program prints usage help when run without arguments.
@@ -129,7 +129,7 @@ ssc.awaitTermination() // Wait for the computation to terminate
 {% endhighlight %}
 
 The complete code can be found in the Spark Streaming example
-[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/NetworkWordCount.scala).
+[NetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala).
 <br>
 
 </div>
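
As context for the renamed example: a minimal Scala sketch of what NetworkWordCount does (condensed; `NetworkWordCountSketch` is an illustrative name, and the real example adds argument parsing and log setup):

```scala
package org.apache.spark.examples.streaming

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// Counts words in lines read from a local Netcat server (nc -lk 9999),
// printed once per 1-second batch.
object NetworkWordCountSketch {
  def main(args: Array[String]) {
    val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999)
    val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
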
@@ -215,7 +215,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
 {% endhighlight %}
 
 The complete code can be found in the Spark Streaming example
-[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java).
+[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java).
 <br>
 
 </div>

@@ -234,12 +234,12 @@ Then, in a different terminal, you can start the example by using
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 {% highlight bash %}
-$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
 {% endhighlight %}
 </div>
 <div data-lang="java" markdown="1">
 {% highlight bash %}
-$ ./bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999
 {% endhighlight %}
 </div>
 </div>
@@ -268,7 +268,7 @@ hello world
 {% highlight bash %}
 # TERMINAL 2: RUNNING NetworkWordCount or JavaNetworkWordCount
 
-$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999
+$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999
 ...
 -------------------------------------------
 Time: 1357008430000 ms

@@ -609,7 +609,7 @@ JavaPairDStream<String, Integer> runningCounts = pairs.updateStateByKey(updateFunction);
 The update function will be called for each word, with `newValues` having a sequence of 1's (from
 the `(word, 1)` pairs) and the `runningCount` having the previous count. For the complete
 Scala code, take a look at the example
-[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/streaming/examples/StatefulNetworkWordCount.scala).
+[StatefulNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala).
 
 <h4>Transform Operation</h4>
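
The hunk above documents the `updateStateByKey` pattern; a minimal Scala sketch of the update function it describes (assuming `Int` counts and a `pairs` stream of `(word, 1)` records; checkpointing must be enabled on the context at runtime):

```scala
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream

// newValues holds this batch's 1's for a word; runningCount is the
// previous total (None the first time the word is seen).
def updateFunction(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// pairs is assumed to be a DStream[(String, Int)] of (word, 1) records
def runningCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.updateStateByKey[Int](updateFunction _)
```
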
@@ -1135,7 +1135,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
 If the directory does not exist (i.e., running for the first time),
 then the function `functionToCreateContext` will be called to create a new
 context and set up the DStreams. See the Scala example
-[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/RecoverableNetworkWordCount.scala).
+[RecoverableNetworkWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala).
 This example appends the word counts of network data into a file.
 
 You can also explicitly create a `StreamingContext` from the checkpoint data and start the
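
The recovery behavior described here is `StreamingContext.getOrCreate`; a minimal sketch, assuming an illustrative checkpoint path and app name:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDirectory = "/tmp/checkpoint"  // illustrative path

// Called only when no checkpoint data exists yet (i.e., the first run)
def functionToCreateContext(): StreamingContext = {
  val ssc = new StreamingContext("local[2]", "RecoverableSketch", Seconds(1))
  // ... set up the input DStreams and transformations here ...
  ssc.checkpoint(checkpointDirectory)
  ssc
}

// Recreate the context from checkpoint data if present, else build it fresh
val context = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext _)
context.start()
context.awaitTermination()
```
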
@@ -1174,7 +1174,7 @@ If the `checkpointDirectory` exists, then the context will be recreated from the
 If the directory does not exist (i.e., running for the first time),
 then the function `contextFactory` will be called to create a new
 context and set up the DStreams. See the Scala example
-[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples/JavaRecoverableWordCount.scala)
+[JavaRecoverableWordCount]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming/JavaRecoverableWordCount.scala)
 (note that this example is missing in the 0.9 release, so you can test it using the master branch).
 This example appends the word counts of network data into a file.
 

@@ -1374,7 +1374,6 @@ package and renamed for better clarity.
 [ZeroMQUtils](api/java/org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
 [MQTTUtils](api/java/org/apache/spark/streaming/mqtt/MQTTUtils.html)
 
-* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/streaming/examples)
-  and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/streaming/examples)
-* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and
-  [video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
+* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
+  and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
+* [Paper](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf) and [video](http://youtu.be/g171ndOHgJ0) describing Spark Streaming.
@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;
 
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;
 
 import java.util.regex.Pattern;

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.mllib.examples;
+package org.apache.spark.examples.mllib;
 
 import java.util.regex.Pattern;

@@ -15,9 +15,10 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;
 
 import org.apache.spark.api.java.function.Function;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.*;
 import org.apache.spark.streaming.api.java.*;
 import org.apache.spark.streaming.flume.FlumeUtils;

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;
 
 import java.util.Map;
 import java.util.HashMap;

@@ -26,6 +26,7 @@ import org.apache.spark.api.java.function.FlatMapFunction;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;

@@ -44,7 +45,7 @@ import scala.Tuple2;
  * <numThreads> is the number of threads the kafka consumer should use
  *
  * Example:
- *    `./bin/run-example org.apache.spark.streaming.examples.JavaKafkaWordCount local[2] zoo01,zoo02,
+ *    `./bin/run-example org.apache.spark.examples.streaming.JavaKafkaWordCount local[2] zoo01,zoo02,
  *    zoo03 my-consumer-group topic1,topic2 1`
  */

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;
 
 import com.google.common.collect.Lists;
 import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;

@@ -23,6 +23,7 @@ import scala.Tuple2;
 import org.apache.spark.api.java.function.FlatMapFunction;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;

@@ -39,7 +40,7 @@ import java.util.regex.Pattern;
  * To run this on your local machine, you need to first run a Netcat server
  *    `$ nc -lk 9999`
  * and then run the example
- *    `$ ./run org.apache.spark.streaming.examples.JavaNetworkWordCount local[2] localhost 9999`
+ *    `$ ./run org.apache.spark.examples.streaming.JavaNetworkWordCount local[2] localhost 9999`
  */
 public final class JavaNetworkWordCount {
   private static final Pattern SPACE = Pattern.compile(" ");

@@ -15,13 +15,14 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples;
+package org.apache.spark.examples.streaming;
 
 import com.google.common.collect.Lists;
 import scala.Tuple2;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.function.Function2;
 import org.apache.spark.api.java.function.PairFunction;
+import org.apache.spark.examples.streaming.StreamingExamples;
 import org.apache.spark.streaming.Duration;
 import org.apache.spark.streaming.api.java.JavaDStream;
 import org.apache.spark.streaming.api.java.JavaPairDStream;
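
Several hunks above add an import of `StreamingExamples`, the examples' shared logging helper. A hedged usage sketch, assuming the helper's `setStreamingLogLevels()` method (present in the relocated `org.apache.spark.examples.streaming` package):

```scala
import org.apache.spark.examples.streaming.StreamingExamples

// Sets a reasonable console log level for the example unless the user
// has already configured log4j; the streaming examples typically call
// this at the top of main, before creating a context.
StreamingExamples.setStreamingLogLevels()
```
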
@@ -18,7 +18,7 @@
 """
 The K-means algorithm written from scratch against PySpark. In practice,
 one may prefer to use the KMeans algorithm in MLlib, as shown in
-python/examples/mllib/kmeans.py.
+examples/src/main/python/mllib/kmeans.py.
 
 This example requires NumPy (http://www.numpy.org/).
 """

@@ -20,7 +20,7 @@ A logistic regression implementation that uses NumPy (http://www.numpy.org)
 to act on batches of input data using efficient matrix operations.
 
 In practice, one may prefer to use the LogisticRegression algorithm in
-MLlib, as shown in python/examples/mllib/logistic_regression.py.
+MLlib, as shown in examples/src/main/python/mllib/logistic_regression.py.
 """
 
 from collections import namedtuple
@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.examples
+package org.apache.spark.examples.sql
 
 import org.apache.spark.SparkContext
 import org.apache.spark.sql.SQLContext

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.hive.examples
+package org.apache.spark.examples.sql.hive
 
 import org.apache.spark.SparkContext
 import org.apache.spark.sql._
@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import scala.collection.mutable.LinkedList
 import scala.reflect.ClassTag

@@ -78,7 +78,7 @@ class FeederActor extends Actor {
 * goes and subscribe to a typical publisher/feeder actor and receives
 * data.
 *
- * @see [[org.apache.spark.streaming.examples.FeederActor]]
+ * @see [[org.apache.spark.examples.streaming.FeederActor]]
 */
 class SampleActorReceiver[T: ClassTag](urlOfPublisher: String)
 extends Actor with ActorHelper {

@@ -131,9 +131,9 @@ object FeederActor {
 * <hostname> and <port> describe the AkkaSystem that Spark Sample feeder is running on.
 *
 * To run this example locally, you may run Feeder Actor as
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.FeederActor 127.0.1.1 9999`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.FeederActor 127.0.1.1 9999`
 * and then run the example
- *    `./bin/run-example org.apache.spark.streaming.examples.ActorWordCount local[2] 127.0.1.1 9999`
+ *    `./bin/run-example org.apache.spark.examples.streaming.ActorWordCount local[2] 127.0.1.1 9999`
 */
 object ActorWordCount {
   def main(args: Array[String]) {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming._

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.StreamingContext._

@@ -27,7 +27,7 @@ import org.apache.spark.streaming.StreamingContext._
 * <directory> is the directory that Spark Streaming will use to find and read new text files.
 *
 * To run this on your local machine on directory `localdir`, run this example
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.HdfsWordCount local[2] localdir`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.HdfsWordCount local[2] localdir`
 * Then create a text file in `localdir` and the words in the file will get counted.
 */
 object HdfsWordCount {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import java.util.Properties

@@ -24,7 +24,6 @@ import kafka.producer._
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._
 import org.apache.spark.streaming.kafka._
-import org.apache.spark.streaming.util.RawTextHelper._
 
 // scalastyle:off
 /**

@@ -37,7 +36,7 @@ import org.apache.spark.streaming.util.RawTextHelper._
 * <numThreads> is the number of threads the kafka consumer should use
 *
 * Example:
- *    `./bin/run-example org.apache.spark.streaming.examples.KafkaWordCount local[2] zoo01,zoo02,zoo03 my-consumer-group topic1,topic2 1`
+ *    `./bin/run-example org.apache.spark.examples.streaming.KafkaWordCount local[2] zoo01,zoo02,zoo03 my-consumer-group topic1,topic2 1`
 */
 // scalastyle:on
 object KafkaWordCount {
@@ -59,7 +58,7 @@ object KafkaWordCount {
     val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicpMap).map(_._2)
     val words = lines.flatMap(_.split(" "))
     val wordCounts = words.map(x => (x, 1L))
-      .reduceByKeyAndWindow(add _, subtract _, Minutes(10), Seconds(2), 2)
+      .reduceByKeyAndWindow(add _, subtract _, Minutes(10), Seconds(2), 2)
+      .reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
     wordCounts.print()
 
     ssc.start()
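
The hunk above swaps RawTextHelper's `add`/`subtract` for plain function literals; either way this is the incremental form of `reduceByKeyAndWindow`. A sketch of the pattern with types spelled out (`windowedCounts` is an illustrative wrapper, not part of the example):

```scala
import org.apache.spark.streaming.{Minutes, Seconds}
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream

// Incremental windowed count over a 10-minute window sliding every 2 seconds:
// `_ + _` folds batches entering the window and `_ - _` removes batches
// leaving it, so the window is updated rather than recomputed from scratch.
// Requires a checkpoint directory to be set on the StreamingContext.
def windowedCounts(words: DStream[String]): DStream[(String, Long)] =
  words.map(x => (x, 1L)).reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
```
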
@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.eclipse.paho.client.mqttv3.{MqttClient, MqttClientPersistence, MqttException, MqttMessage, MqttTopic}
 import org.eclipse.paho.client.mqttv3.persist.MqttDefaultFilePersistence

@@ -79,9 +79,9 @@ object MQTTPublisher {
 * <MqttbrokerUrl> and <topic> describe where Mqtt publisher is running.
 *
 * To run this example locally, you may run publisher as
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.MQTTPublisher tcp://localhost:1883 foo`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.MQTTPublisher tcp://localhost:1883 foo`
 * and run the example as
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.MQTTWordCount local[2] tcp://localhost:1883 foo`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.MQTTWordCount local[2] tcp://localhost:1883 foo`
 */
 // scalastyle:on
 object MQTTWordCount {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.StreamingContext._

@@ -32,7 +32,7 @@ import org.apache.spark.storage.StorageLevel
 * To run this on your local machine, you need to first run a Netcat server
 *    `$ nc -lk 9999`
 * and then run the example
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.NetworkWordCount local[2] localhost 9999`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.NetworkWordCount local[2] localhost 9999`
 */
 // scalastyle:on
 object NetworkWordCount {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import scala.collection.mutable.SynchronizedQueue

@@ -15,11 +15,10 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.storage.StorageLevel
 import org.apache.spark.streaming._
-import org.apache.spark.streaming.util.RawTextHelper
 import org.apache.spark.util.IntParam
 
 /**

@@ -52,9 +51,6 @@ object RawNetworkGrep {
     val ssc = new StreamingContext(master, "RawNetworkGrep", Milliseconds(batchMillis),
       System.getenv("SPARK_HOME"), StreamingContext.jarOfClass(this.getClass).toSeq)
 
-    // Warm up the JVMs on master and slave for JIT compilation to kick in
-    RawTextHelper.warmUp(ssc.sparkContext)
-
     val rawStreams = (1 to numStreams).map(_ =>
       ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER_2)).toArray
     val union = ssc.union(rawStreams)

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.streaming.{Time, Seconds, StreamingContext}
 import org.apache.spark.streaming.StreamingContext._

@@ -44,7 +44,7 @@ import java.nio.charset.Charset
 *
 * and run the example as
 *
- *    `$ ./run-example org.apache.spark.streaming.examples.RecoverableNetworkWordCount \
+ *    `$ ./run-example org.apache.spark.examples.streaming.RecoverableNetworkWordCount \
 *    local[2] localhost 9999 ~/checkpoint/ ~/out`
 *
 * If the directory ~/checkpoint/ does not exist (e.g. running for the first time), it will create

@@ -56,7 +56,7 @@ import java.nio.charset.Charset
 *
 *    `$ ./spark-class org.apache.spark.deploy.Client -s launch <cluster-url> \
 *    <path-to-examples-jar> \
- *    org.apache.spark.streaming.examples.RecoverableNetworkWordCount <cluster-url> \
+ *    org.apache.spark.examples.streaming.RecoverableNetworkWordCount <cluster-url> \
 *    localhost 9999 ~/checkpoint ~/out`
 *
 * <path-to-examples-jar> would typically be

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.streaming._
 import org.apache.spark.streaming.StreamingContext._

@@ -31,7 +31,7 @@ import org.apache.spark.streaming.StreamingContext._
 * To run this on your local machine, you need to first run a Netcat server
 *    `$ nc -lk 9999`
 * and then run the example
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.StatefulNetworkWordCount local[2] localhost 9999`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.StatefulNetworkWordCount local[2] localhost 9999`
 */
 // scalastyle:on
 object StatefulNetworkWordCount {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.Logging

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import com.twitter.algebird._

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import com.twitter.algebird.HyperLogLogMonoid
 import com.twitter.algebird.HyperLogLog._

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import StreamingContext._

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples
+package org.apache.spark.examples.streaming
 
 import akka.actor.ActorSystem
 import akka.actor.actorRef2Scala

@@ -68,9 +68,9 @@ object SimpleZeroMQPublisher {
 * <zeroMQurl> and <topic> describe where zeroMq publisher is running.
 *
 * To run this example locally, you may run publisher as
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.SimpleZeroMQPublisher tcp://127.0.1.1:1234 foo.bar`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.SimpleZeroMQPublisher tcp://127.0.1.1:1234 foo.bar`
 * and run the example as
- *    `$ ./bin/run-example org.apache.spark.streaming.examples.ZeroMQWordCount local[2] tcp://127.0.1.1:1234 foo`
+ *    `$ ./bin/run-example org.apache.spark.examples.streaming.ZeroMQWordCount local[2] tcp://127.0.1.1:1234 foo`
 */
 // scalastyle:on
 object ZeroMQWordCount {

@@ -15,7 +15,7 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples.clickstream
+package org.apache.spark.examples.streaming.clickstream
 
 import java.net.ServerSocket
 import java.io.PrintWriter

@@ -40,8 +40,8 @@ object PageView extends Serializable {
 /** Generates streaming events to simulate page views on a website.
   *
   * This should be used in tandem with PageViewStream.scala. Example:
-  * $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewGenerator 44444 10
-  * $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewStream errorRatePerZipCode localhost 44444
+  * $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewGenerator 44444 10
+  * $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewStream errorRatePerZipCode localhost 44444
   *
   * When running this, you may want to set the root logging level to ERROR in
   * conf/log4j.properties to reduce the verbosity of the output.

@@ -15,19 +15,19 @@
  * limitations under the License.
  */
 
-package org.apache.spark.streaming.examples.clickstream
+package org.apache.spark.examples.streaming.clickstream
 
 import org.apache.spark.SparkContext._
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.StreamingContext._
-import org.apache.spark.streaming.examples.StreamingExamples
+import org.apache.spark.examples.streaming.StreamingExamples
 // scalastyle:off
 /** Analyses a streaming dataset of web page views. This class demonstrates several types of
   * operators available in Spark streaming.
   *
   * This should be used in tandem with PageViewStream.scala. Example:
-  * $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewGenerator 44444 10
-  * $ ./bin/run-example org.apache.spark.streaming.examples.clickstream.PageViewStream errorRatePerZipCode localhost 44444
+  * $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewGenerator 44444 10
+  * $ ./bin/run-example org.apache.spark.examples.streaming.clickstream.PageViewStream errorRatePerZipCode localhost 44444
   */
 // scalastyle:on
 object PageViewStream {
@@ -25,7 +25,7 @@ import scala.collection.JavaConversions.mapAsScalaMap
 private[streaming]
 object RawTextHelper {
 
-  /** 
+  /**
    * Splits lines and counts the words.
    */
   def splitAndCountPartitions(iter: Iterator[String]): Iterator[(String, Long)] = {

@@ -114,4 +114,3 @@ object RawTextHelper {
 
   def max(v1: Long, v2: Long) = math.max(v1, v2)
 }
-