Commit graph

20 commits

Author SHA1 Message Date
Tathagata Das 04c37b6f74 [SPARK-1332] Improve Spark Streaming's Network Receiver and InputDStream API [WIP]
The current Network Receiver API makes it slightly complicated to right a new receiver as one needs to create an instance of BlockGenerator as shown in SocketReceiver
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L51

Exposing the BlockGenerator interface has made it harder to improve the receiving process. The API of NetworkReceiver (which was not a very stable API anyways) needs to be change if we are to ensure future stability.

Additionally, the functions like streamingContext.socketStream that create input streams, return DStream objects. That makes it hard to expose functionality (say, rate limits) unique to input dstreams. They should return InputDStream or NetworkInputDStream. This is still not yet implemented.

This PR is blocked on the graceful shutdown PR #247

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #300 from tdas/network-receiver-api and squashes the following commits:

ea27b38 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
3a4777c [Tathagata Das] Renamed NetworkInputDStream to ReceiverInputDStream, and ActorReceiver related stuff.
838dd39 [Tathagata Das] Added more events to the StreamingListener to report errors and stopped receivers.
a75c7a6 [Tathagata Das] Address some PR comments and fixed other issues.
91bfa72 [Tathagata Das] Fixed bugs.
8533094 [Tathagata Das] Scala style fixes.
028bde6 [Tathagata Das] Further refactored receiver to allow restarting of a receiver.
43f5290 [Tathagata Das] Made functions that create input streams return InputDStream and NetworkInputDStream, for both Scala and Java.
2c94579 [Tathagata Das] Fixed graceful shutdown by removing interrupts on receiving thread.
9e37a0b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
3223e95 [Tathagata Das] Refactored the code that runs the NetworkReceiver into further classes and traits to make them more testable.
a36cc48 [Tathagata Das] Refactored the NetworkReceiver API for future stability.
2014-04-21 19:04:49 -07:00
Sandeep 930b70f052 Remove Unnecessary Whitespace's
stack these together in a commit else they show up chunk by chunk in different commits.

Author: Sandeep <sandeep@techaddict.me>

Closes #380 from techaddict/white_space and squashes the following commits:

b58f294 [Sandeep] Remove Unnecessary Whitespace's
2014-04-10 15:04:13 -07:00
Sandy Ryza a99fb3747a SPARK-1193. Fix indentation in pom.xmls
Author: Sandy Ryza <sandy@cloudera.com>

Closes #91 from sryza/sandy-spark-1193 and squashes the following commits:

a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls
2014-03-07 23:10:35 -08:00
Patrick Wendell c3f5e07533 SPARK-1121: Include avro for yarn-alpha builds
This lets us explicitly include Avro based on a profile for 0.23.X
builds. It makes me sad how convoluted it is to express this logic
in Maven. @tgraves and @sryza curious if this works for you.

I'm also considering just reverting to how it was before. The only
real problem was that Spark advertised a dependency on Avro
even though it only really depends transitively on Avro through
other deps.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #49 from pwendell/avro-build-fix and squashes the following commits:

8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
2014-03-02 15:18:19 -08:00
Patrick Wendell 1fd2bfd3dd Remove remaining references to incubation
This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier.

Author: Patrick Wendell <pwendell@gmail.com>

Closes #51 from pwendell/tlp and squashes the following commits:

d553b1b [Patrick Wendell] Remove remaining references to incubation
2014-03-02 01:00:16 -08:00
Sean Owen c0ef3afa82 SPARK-1071: Tidy logging strategy and use of log4j
Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger.

Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j.

This leaves the same behavior in Spark. It means that downstream users who want to use something except log4j should:

- Exclude dependencies on log4j, slf4j-log4j12 from Spark
- Include dependency on log4j-over-slf4j
- Include dependency on another logger X, and another slf4j-X
- Recreate any log config that Spark does, that is needed, in the other logger's config

That sounds about right.

Here are the key changes:

- Include the jcl-over-slf4j shim everywhere by depending on it in core.
- Exclude dependencies on commons-logging from third-party libraries.
- Include the jul-to-slf4j shim everywhere by depending on it in core.
- Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings
- Added missing slf4j-log4j12 binding to GraphX, Bagel module tests

And minor/incidental changes:

- Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2
- (Remove a duplicate HBase dependency declaration in SparkBuild.scala)
- (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me)

Author: Sean Owen <sowen@cloudera.com>

Closes #570 from srowen/SPARK-1071 and squashes the following commits:

52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core.
77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j
2014-02-23 11:40:55 -08:00
Mark Hamstra c2341c92bb Merge pull request #542 from markhamstra/versionBump. Closes #542.
Version number to 1.0.0-SNAPSHOT

Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore.

@pwendell

Author: Mark Hamstra <markhamstra@gmail.com>

== Merge branch commits ==

commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71
Author: Mark Hamstra <markhamstra@gmail.com>
Date:   Wed Feb 5 09:30:32 2014 -0800

    Version number to 1.0.0-SNAPSHOT
2014-02-08 16:00:43 -08:00
Henry Saputra 0386f42e38 Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala
Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency

Looks like there are some ⇒ Unicode character (maybe from scalariform) in Scala code.
This PR is to change it to => to get some consistency on the Scala code.

If we want to use ⇒ as default we could use sbt plugin scalariform to make sure all Scala code has ⇒ instead of =>

And remove unused imports found in TwitterInputDStream.scala while I was there =)

Author: Henry Saputra <hsaputra@apache.org>

== Merge branch commits ==

commit 29c1771d346dff901b0b778f764e6b4409900234
Author: Henry Saputra <hsaputra@apache.org>
Date:   Sat Feb 1 22:05:16 2014 -0800

    Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.
2014-02-02 21:51:17 -08:00
Tathagata Das 1f4718c480 Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules. 2014-01-14 22:20:14 -08:00
Tathagata Das 4e497db8f3 Removed StreamingContext.registerInputStream and registerOutputStream - they were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation. 2014-01-13 23:23:46 -08:00
Tathagata Das ffa1d38ef1 Fixed import formatting. 2014-01-12 22:27:07 -08:00
Tathagata Das 034f89aaab Fixed persistence logic of WindowedDStream, and fixed default persistence level of input streams. 2014-01-12 19:02:27 -08:00
Tathagata Das 448aef6790 Moved DStream, DStreamCheckpointData and PairDStream from org.apache.spark.streaming to org.apache.spark.streaming.dstream. 2014-01-12 11:31:54 -08:00
Tathagata Das c5921e5c61 Fixed bugs. 2014-01-12 01:12:08 -08:00
Tathagata Das aa99f226a6 Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams. 2014-01-07 01:56:15 -08:00
Tathagata Das d0fd3b9ad2 Changed JavaStreamingContextWith*** to ***Function in streaming.api.java.*** package. Also fixed packages of Flume and MQTT tests. 2014-01-06 01:47:53 -08:00
Tathagata Das 87b915f221 Removed extra empty lines. 2013-12-31 00:42:10 -08:00
Tathagata Das 97630849ff Added pom.xml for external projects and removed unnecessary dependencies and repositoris from other poms and sbt. 2013-12-31 00:28:57 -08:00
Tathagata Das f4e4066191 Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects. 2013-12-30 11:13:24 -08:00
Tathagata Das 6e43039614 Refactored streaming project to separate out the twitter functionality. 2013-12-26 18:02:49 -08:00