[SPARK-13874][DOC] Remove docs of streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter
## What changes were proposed in this pull request? This PR removes all docs about the old streaming-akka, streaming-zeromq, streaming-mqtt and streaming-twitter projects since I have already copied them to https://github.com/spark-packages Also remove mqtt_wordcount.py that I forgot to remove previously. ## How was this patch tested? Jenkins PR Build. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11824 from zsxwing/remove-doc.
This commit is contained in:
parent
13945dd83b
commit
d23ad7c1c9
1
NOTICE
1
NOTICE
|
@ -48,7 +48,6 @@ Eclipse Public License 1.0
|
|||
|
||||
The following components are provided under the Eclipse Public License 1.0. See project link for details.
|
||||
|
||||
(Eclipse Public License - Version 1.0) mqtt-client (org.eclipse.paho:mqtt-client:0.4.0 - http://www.eclipse.org/paho/mqtt-client)
|
||||
(Eclipse Public License v1.0) Eclipse JDT Core (org.eclipse.jdt:core:3.1.1 - http://www.eclipse.org/jdt/)
|
||||
|
||||
========================================================================
|
||||
|
|
|
@ -11,7 +11,7 @@ description: Spark Streaming programming guide and tutorial for Spark SPARK_VERS
|
|||
# Overview
|
||||
Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput,
|
||||
fault-tolerant stream processing of live data streams. Data can be ingested from many sources
|
||||
like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex
|
||||
like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex
|
||||
algorithms expressed with high-level functions like `map`, `reduce`, `join` and `window`.
|
||||
Finally, processed data can be pushed out to filesystems, databases,
|
||||
and live dashboards. In fact, you can apply Spark's
|
||||
|
@ -419,9 +419,6 @@ some of the common ones are as follows.
|
|||
<tr><td> Kafka </td><td> spark-streaming-kafka_{{site.SCALA_BINARY_VERSION}} </td></tr>
|
||||
<tr><td> Flume </td><td> spark-streaming-flume_{{site.SCALA_BINARY_VERSION}} </td></tr>
|
||||
<tr><td> Kinesis<br/></td><td>spark-streaming-kinesis-asl_{{site.SCALA_BINARY_VERSION}} [Amazon Software License] </td></tr>
|
||||
<tr><td> Twitter </td><td> spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}} </td></tr>
|
||||
<tr><td> ZeroMQ </td><td> spark-streaming-zeromq_{{site.SCALA_BINARY_VERSION}} </td></tr>
|
||||
<tr><td> MQTT </td><td> spark-streaming-mqtt_{{site.SCALA_BINARY_VERSION}} </td></tr>
|
||||
<tr><td></td><td></td></tr>
|
||||
</table>
|
||||
|
||||
|
@ -595,7 +592,7 @@ Spark Streaming provides two categories of built-in streaming sources.
|
|||
|
||||
- *Basic sources*: Sources directly available in the StreamingContext API.
|
||||
Examples: file systems, and socket connections.
|
||||
- *Advanced sources*: Sources like Kafka, Flume, Kinesis, Twitter, etc. are available through
|
||||
- *Advanced sources*: Sources like Kafka, Flume, Kinesis, etc. are available through
|
||||
extra utility classes. These require linking against extra dependencies as discussed in the
|
||||
[linking](#linking) section.
|
||||
|
||||
|
@ -672,38 +669,12 @@ for Java, and [StreamingContext](api/python/pyspark.streaming.html#pyspark.strea
|
|||
{:.no_toc}
|
||||
|
||||
<span class="badge" style="background-color: grey">Python API</span> As of Spark {{site.SPARK_VERSION_SHORT}},
|
||||
out of these sources, Kafka, Kinesis, Flume and MQTT are available in the Python API.
|
||||
out of these sources, Kafka, Kinesis and Flume are available in the Python API.
|
||||
|
||||
This category of sources require interfacing with external non-Spark libraries, some of them with
|
||||
complex dependencies (e.g., Kafka and Flume). Hence, to minimize issues related to version conflicts
|
||||
of dependencies, the functionality to create DStreams from these sources has been moved to separate
|
||||
libraries that can be [linked](#linking) to explicitly when necessary. For example, if you want to
|
||||
create a DStream using data from Twitter's stream of tweets, you have to do the following:
|
||||
|
||||
1. *Linking*: Add the artifact `spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}}` to the
|
||||
SBT/Maven project dependencies.
|
||||
1. *Programming*: Import the `TwitterUtils` class and create a DStream with
|
||||
`TwitterUtils.createStream` as shown below.
|
||||
1. *Deploying*: Generate an uber JAR with all the dependencies (including the dependency
|
||||
`spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}}` and its transitive dependencies) and
|
||||
then deploy the application. This is further explained in the [Deploying section](#deploying-applications).
|
||||
|
||||
<div class="codetabs">
|
||||
<div data-lang="scala">
|
||||
{% highlight scala %}
|
||||
import org.apache.spark.streaming.twitter._
|
||||
|
||||
TwitterUtils.createStream(ssc, None)
|
||||
{% endhighlight %}
|
||||
</div>
|
||||
<div data-lang="java">
|
||||
{% highlight java %}
|
||||
import org.apache.spark.streaming.twitter.*;
|
||||
|
||||
TwitterUtils.createStream(jssc);
|
||||
{% endhighlight %}
|
||||
</div>
|
||||
</div>
|
||||
libraries that can be [linked](#linking) to explicitly when necessary.
|
||||
|
||||
Note that these advanced sources are not available in the Spark shell, hence applications based on
|
||||
these advanced sources cannot be tested in the shell. If you really want to use them in the Spark
|
||||
|
@ -718,15 +689,6 @@ Some of these advanced sources are as follows.
|
|||
|
||||
- **Kinesis:** Spark Streaming {{site.SPARK_VERSION_SHORT}} is compatible with Kinesis Client Library 1.2.1. See the [Kinesis Integration Guide](streaming-kinesis-integration.html) for more details.
|
||||
|
||||
- **Twitter:** Spark Streaming's TwitterUtils uses Twitter4j to get the public stream of tweets using
|
||||
[Twitter's Streaming API](https://dev.twitter.com/docs/streaming-apis). Authentication information
|
||||
can be provided by any of the [methods](http://twitter4j.org/en/configuration.html) supported by
|
||||
Twitter4J library. You can either get the public stream, or get the filtered stream based on a
|
||||
keywords. See the API documentation ([Scala](api/scala/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
|
||||
[Java](api/java/index.html?org/apache/spark/streaming/twitter/TwitterUtils.html)) and examples
|
||||
([TwitterPopularTags]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterPopularTags.scala)
|
||||
and [TwitterAlgebirdCMS]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/TwitterAlgebirdCMS.scala)).
|
||||
|
||||
### Custom Sources
|
||||
{:.no_toc}
|
||||
|
||||
|
@ -1927,10 +1889,10 @@ To run a Spark Streaming applications, you need to have the following.
|
|||
- *Package the application JAR* - You have to compile your streaming application into a JAR.
|
||||
If you are using [`spark-submit`](submitting-applications.html) to start the
|
||||
application, then you will not need to provide Spark and Spark Streaming in the JAR. However,
|
||||
if your application uses [advanced sources](#advanced-sources) (e.g. Kafka, Flume, Twitter),
|
||||
if your application uses [advanced sources](#advanced-sources) (e.g. Kafka, Flume),
|
||||
then you will have to package the extra artifact they link to, along with their dependencies,
|
||||
in the JAR that is used to deploy the application. For example, an application using `TwitterUtils`
|
||||
will have to include `spark-streaming-twitter_{{site.SCALA_BINARY_VERSION}}` and all its
|
||||
in the JAR that is used to deploy the application. For example, an application using `KafkaUtils`
|
||||
will have to include `spark-streaming-kafka_{{site.SCALA_BINARY_VERSION}}` and all its
|
||||
transitive dependencies in the application JAR.
|
||||
|
||||
- *Configuring sufficient memory for the executors* - Since the received data must be stored in
|
||||
|
@ -2398,8 +2360,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are
|
|||
Between Spark 0.9.1 and Spark 1.0, there were a few API changes made to ensure future API stability.
|
||||
This section elaborates the steps required to migrate your existing code to 1.0.
|
||||
|
||||
**Input DStreams**: All operations that create an input stream (e.g., `StreamingContext.socketStream`,
|
||||
`FlumeUtils.createStream`, etc.) now returns
|
||||
**Input DStreams**: All operations that create an input stream (e.g., `StreamingContext.socketStream`, `FlumeUtils.createStream`, etc.) now returns
|
||||
[InputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream) /
|
||||
[ReceiverInputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream)
|
||||
(instead of DStream) for Scala, and [JavaInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaInputDStream.html) /
|
||||
|
@ -2443,9 +2404,13 @@ Please refer to the project for more details.
|
|||
# Where to Go from Here
|
||||
* Additional guides
|
||||
- [Kafka Integration Guide](streaming-kafka-integration.html)
|
||||
- [Flume Integration Guide](streaming-flume-integration.html)
|
||||
- [Kinesis Integration Guide](streaming-kinesis-integration.html)
|
||||
- [Custom Receiver Guide](streaming-custom-receivers.html)
|
||||
* External DStream data sources:
|
||||
- [DStream MQTT](https://github.com/spark-packages/dstream-mqtt)
|
||||
- [DStream Twitter](https://github.com/spark-packages/dstream-twitter)
|
||||
- [DStream Akka](https://github.com/spark-packages/dstream-akka)
|
||||
- [DStream ZeroMQ](https://github.com/spark-packages/dstream-zeromq)
|
||||
* API documentation
|
||||
- Scala docs
|
||||
* [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and
|
||||
|
@ -2453,9 +2418,6 @@ Please refer to the project for more details.
|
|||
* [KafkaUtils](api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$),
|
||||
[FlumeUtils](api/scala/index.html#org.apache.spark.streaming.flume.FlumeUtils$),
|
||||
[KinesisUtils](api/scala/index.html#org.apache.spark.streaming.kinesis.KinesisUtils$),
|
||||
[TwitterUtils](api/scala/index.html#org.apache.spark.streaming.twitter.TwitterUtils$),
|
||||
[ZeroMQUtils](api/scala/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$), and
|
||||
[MQTTUtils](api/scala/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
|
||||
- Java docs
|
||||
* [JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html),
|
||||
[JavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaDStream.html) and
|
||||
|
@ -2463,9 +2425,6 @@ Please refer to the project for more details.
|
|||
* [KafkaUtils](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html),
|
||||
[FlumeUtils](api/java/index.html?org/apache/spark/streaming/flume/FlumeUtils.html),
|
||||
[KinesisUtils](api/java/index.html?org/apache/spark/streaming/kinesis/KinesisUtils.html)
|
||||
[TwitterUtils](api/java/index.html?org/apache/spark/streaming/twitter/TwitterUtils.html),
|
||||
[ZeroMQUtils](api/java/index.html?org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
|
||||
[MQTTUtils](api/java/index.html?org/apache/spark/streaming/mqtt/MQTTUtils.html)
|
||||
- Python docs
|
||||
* [StreamingContext](api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext) and [DStream](api/python/pyspark.streaming.html#pyspark.streaming.DStream)
|
||||
* [KafkaUtils](api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils)
|
||||
|
|
|
@ -1,59 +0,0 @@
|
|||
#
|
||||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
#
|
||||
|
||||
"""
|
||||
A sample wordcount with MqttStream stream
|
||||
Usage: mqtt_wordcount.py <broker url> <topic>
|
||||
|
||||
To run this in your local machine, you need to setup a MQTT broker and publisher first,
|
||||
Mosquitto is one of the open source MQTT Brokers, see
|
||||
http://mosquitto.org/
|
||||
Eclipse paho project provides number of clients and utilities for working with MQTT, see
|
||||
http://www.eclipse.org/paho/#getting-started
|
||||
|
||||
and then run the example
|
||||
`$ bin/spark-submit --jars \
|
||||
external/mqtt-assembly/target/scala-*/spark-streaming-mqtt-assembly-*.jar \
|
||||
examples/src/main/python/streaming/mqtt_wordcount.py \
|
||||
tcp://localhost:1883 foo`
|
||||
"""
|
||||
|
||||
import sys
|
||||
|
||||
from pyspark import SparkContext
|
||||
from pyspark.streaming import StreamingContext
|
||||
from pyspark.streaming.mqtt import MQTTUtils
|
||||
|
||||
if __name__ == "__main__":
|
||||
if len(sys.argv) != 3:
|
||||
print >> sys.stderr, "Usage: mqtt_wordcount.py <broker url> <topic>"
|
||||
exit(-1)
|
||||
|
||||
sc = SparkContext(appName="PythonStreamingMQTTWordCount")
|
||||
ssc = StreamingContext(sc, 1)
|
||||
|
||||
brokerUrl = sys.argv[1]
|
||||
topic = sys.argv[2]
|
||||
|
||||
lines = MQTTUtils.createStream(ssc, brokerUrl, topic)
|
||||
counts = lines.flatMap(lambda line: line.split(" ")) \
|
||||
.map(lambda word: (word, 1)) \
|
||||
.reduceByKey(lambda a, b: a+b)
|
||||
counts.pprint()
|
||||
|
||||
ssc.start()
|
||||
ssc.awaitTermination()
|
|
@ -1,31 +0,0 @@
|
|||
Copyright (c) 2009, Mikio L. Braun and contributors
|
||||
All rights reserved.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above
|
||||
copyright notice, this list of conditions and the following
|
||||
disclaimer in the documentation and/or other materials provided
|
||||
with the distribution.
|
||||
|
||||
* Neither the name of the Technische Universität Berlin nor the
|
||||
names of its contributors may be used to endorse or promote
|
||||
products derived from this software without specific prior
|
||||
written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
@ -88,16 +88,11 @@ object MimaBuild {
|
|||
|
||||
def mimaSettings(sparkHome: File, projectRef: ProjectRef) = {
|
||||
val organization = "org.apache.spark"
|
||||
// The resolvers setting for MQTT Repository is needed for mqttv3(1.0.1)
|
||||
// because spark-streaming-mqtt(1.6.0) depends on it.
|
||||
// Remove the setting on updating previousSparkVersion.
|
||||
val previousSparkVersion = "1.6.0"
|
||||
val fullId = "spark-" + projectRef.project + "_2.11"
|
||||
mimaDefaultSettings ++
|
||||
Seq(previousArtifact := Some(organization % fullId % previousSparkVersion),
|
||||
binaryIssueFilters ++= ignoredABIProblems(sparkHome, version.value),
|
||||
sbt.Keys.resolvers +=
|
||||
"MQTT Repository" at "https://repo.eclipse.org/content/repositories/paho-releases")
|
||||
binaryIssueFilters ++= ignoredABIProblems(sparkHome, version.value))
|
||||
}
|
||||
|
||||
}
|
||||
|
|
|
@ -29,10 +29,3 @@ pyspark.streaming.flume.module
|
|||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
pyspark.streaming.mqtt module
|
||||
-----------------------------
|
||||
.. automodule:: pyspark.streaming.mqtt
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
|
Loading…
Reference in a new issue