Commit graph

30 commits

Author SHA1 Message Date
Tathagata Das e8f847a41f [HOTFIX] [TEST] Ignoring flaky tests
org.apache.spark.DriverSuite.driver should exit after finishing without cleanup (SPARK-530)
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2267/

org.apache.spark.deploy.SparkSubmitSuite.includes jars passed in through --jars
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2271/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/

org.apache.spark.streaming.flume.FlumePollingStreamSuite.flume polling test
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/2269/

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #5901 from tdas/ignore-flaky-tests and squashes the following commits:

9cd8667 [Tathagata Das] Ignoring tests.

(cherry picked from commit 8776fe0b93)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
2015-05-05 01:59:03 -07:00
WangTaoTheTonic 7d92db342e [SPARK-6758]block the right jetty package in log
https://issues.apache.org/jira/browse/SPARK-6758

I am not sure if it is ok to block them in test resources too (as we shade jetty in assembly?).

Author: WangTaoTheTonic <wangtao111@huawei.com>

Closes #5406 from WangTaoTheTonic/SPARK-6758 and squashes the following commits:

e09605b [WangTaoTheTonic] block the right jetty package
2015-04-09 17:44:08 -04:00
Reynold Xin 15e0d2bd13 [SPARK-6765] Fix test code style for streaming.
So we can turn style checker on for test code.

Author: Reynold Xin <rxin@databricks.com>

Closes #5409 from rxin/test-style-streaming and squashes the following commits:

7aea69b [Reynold Xin] [SPARK-6765] Fix test code style for streaming.
2015-04-08 00:24:59 -07:00
Kousuke Saruta 85cf063682 [SPARK-5559] [Streaming] [Test] Remove oppotunity we met flakiness when running FlumeStreamSuite
When we run FlumeStreamSuite on Jenkins, sometimes we get error like as follows.

    sbt.ForkMain$ForkError: The code passed to eventually never returned normally. Attempted 52 times over 10.094849836 seconds. Last failure message: Error connecting to localhost/127.0.0.1:23456.
	    at org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:420)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:438)
	    at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	    at org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:307)
	   at org.scalatest.concurrent.Eventually$.eventually(Eventually.scala:478)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite.writeAndVerify(FlumeStreamSuite.scala:116)
           at org.apache.spark.streaming.flume.FlumeStreamSuite.org$apache$spark$streaming$flume$FlumeStreamSuite$$testFlumeStream(FlumeStreamSuite.scala:74)
	   at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply$mcV$sp(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.apache.spark.streaming.flume.FlumeStreamSuite$$anonfun$3.apply(FlumeStreamSuite.scala:66)
	    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	    at org.scalatest.Transformer.apply(Transformer.scala:22)
	    at org.scalatest.Transformer.apply(Transformer.scala:20)
    	    at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
	    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
	    at org.scalatest.FunSuite.withFixture(FunSuite.scala:1555)
	    at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
	   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
	    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	    at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)

This error is caused by check-then-act logic  when it find free-port .

      /** Find a free port */
      private def findFreePort(): Int = {
        Utils.startServiceOnPort(23456, (trialPort: Int) => {
          val socket = new ServerSocket(trialPort)
          socket.close()
          (null, trialPort)
        }, conf)._2
      }

Removing the check-then-act is not easy but we can reduce the chance of having the error by choosing random value for initial port instead of 23456.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #4337 from sarutak/SPARK-5559 and squashes the following commits:

16f109f [Kousuke Saruta] Added `require` to Utils#startServiceOnPort
c39d8b6 [Kousuke Saruta] Merge branch 'SPARK-5559' of github.com:sarutak/spark into SPARK-5559
1610ba2 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
33357e3 [Kousuke Saruta] Changed "findFreePort" method in MQTTStreamSuite and FlumeStreamSuite so that it can choose valid random port
a9029fe [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
9489ef9 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-5559
8212e42 [Kousuke Saruta] Modified default port used in FlumeStreamSuite from 23456 to random value
2015-03-24 16:20:52 +00:00
Sean Owen 34b7c35380 SPARK-4682 [CORE] Consolidate various 'Clock' classes
Another one from JoshRosen 's wish list. The first commit is much smaller and removes 2 of the 4 Clock classes. The second is much larger, necessary for consolidating the streaming one. I put together implementations in the way that seemed simplest. Almost all the change is standardizing class and method names.

Author: Sean Owen <sowen@cloudera.com>

Closes #4514 from srowen/SPARK-4682 and squashes the following commits:

5ed3a03 [Sean Owen] Javadoc Clock classes; make ManualClock private[spark]
169dd13 [Sean Owen] Add support for legacy org.apache.spark.streaming clock class names
277785a [Sean Owen] Reduce the net change in this patch by reversing some unnecessary syntax changes along the way
b5e53df [Sean Owen] FakeClock -> ManualClock; getTime() -> getTimeMillis()
160863a [Sean Owen] Consolidate Streaming Clock class into common util Clock
7c956b2 [Sean Owen] Consolidate Clocks except for Streaming Clock
2015-02-19 15:35:23 -08:00
Hari Shreedharan 0765af9b21 [SPARK-4905][STREAMING] FlumeStreamSuite fix.
Using String constructor instead of CharsetDecoder to see if it fixes the issue of empty strings in Flume test output.

Author: Hari Shreedharan <hshreedharan@apache.org>

Closes #4371 from harishreedharan/Flume-stream-attempted-fix and squashes the following commits:

550d363 [Hari Shreedharan] Fix imports.
8695950 [Hari Shreedharan] Use Charsets.UTF_8 instead of "UTF-8" in String constructors.
af3ba14 [Hari Shreedharan] [SPARK-4905][STREAMING] FlumeStreamSuite fix.
2015-02-09 14:17:14 -08:00
WangTaoTheTonic f7741a9a72 [SPARK-5006][Deploy]spark.port.maxRetries doesn't work
https://issues.apache.org/jira/browse/SPARK-5006

I think the issue is produced in https://github.com/apache/spark/pull/1777.

Not digging mesos's backend yet. Maybe should add same logic either.

Author: WangTaoTheTonic <barneystinson@aliyun.com>
Author: WangTao <barneystinson@aliyun.com>

Closes #3841 from WangTaoTheTonic/SPARK-5006 and squashes the following commits:

8cdf96d [WangTao] indent thing
2d86d65 [WangTaoTheTonic] fix line length
7cdfd98 [WangTaoTheTonic] fit for new HttpServer constructor
61a370d [WangTaoTheTonic] some minor fixes
bc6e1ec [WangTaoTheTonic] rebase
67bcb46 [WangTaoTheTonic] put conf at 3rd position, modify suite class, add comments
f450cd1 [WangTaoTheTonic] startServiceOnPort will use a SparkConf arg
29b751b [WangTaoTheTonic] rebase as ExecutorRunnableUtil changed to ExecutorRunnable
396c226 [WangTaoTheTonic] make the grammar more like scala
191face [WangTaoTheTonic] invalid value name
62ec336 [WangTaoTheTonic] spark.port.maxRetries doesn't work
2015-01-13 09:29:25 -08:00
Sean Owen 4cba6eb420 SPARK-4159 [CORE] Maven build doesn't run JUnit test suites
This PR:

- Reenables `surefire`, and copies config from `scalatest` (which is itself an old fork of `surefire`, so similar)
- Tells `surefire` to test only Java tests
- Enables `surefire` and `scalatest` for all children, and in turn eliminates some duplication.

For me this causes the Scala and Java tests to be run once each, it seems, as desired. It doesn't affect the SBT build but works for Maven. I still need to verify that all of the Scala tests and Java tests are being run.

Author: Sean Owen <sowen@cloudera.com>

Closes #3651 from srowen/SPARK-4159 and squashes the following commits:

2e8a0af [Sean Owen] Remove specialized SPARK_HOME setting for REPL, YARN tests as it appears to be obsolete
12e4558 [Sean Owen] Append to unit-test.log instead of overwriting, so that both surefire and scalatest output is preserved. Also standardize/correct comments a bit.
e6f8601 [Sean Owen] Reenable Java tests by reenabling surefire with config cloned from scalatest; centralize test config in the parent
2015-01-06 12:02:08 -08:00
Josh Rosen 352ed6bbe3 [SPARK-1010] Clean up uses of System.setProperty in unit tests
Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.

This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).

For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and to automatically restores them on test completion / failure.  See the block comment at the top of the ResetSystemProperties class for more details.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:

0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
7a3d224 [Josh Rosen] Fix trait ordering
3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
2014-12-30 18:12:20 -08:00
Prashant Sharma 1c938413ba SPARK-3962 Marked scope as provided for external projects.
Somehow maven shade plugin is set in infinite loop of creating effective pom.

Author: Prashant Sharma <prashant.s@imaginea.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

Closes #2959 from ScrapCodes/SPARK-3962/scope-provided and squashes the following commits:

994d1d3 [Prashant Sharma] Fixed failing flume tests
270b4fb [Prashant Sharma] Removed most of the unused code.
bb3bbfd [Prashant Sharma] SPARK-3962 Marked scope as provided for external.
2014-11-19 14:18:10 -08:00
Aaron Davidson 2ebd1df3f1 [SPARK-4183] Close transport-related resources between SparkContexts
A leak of event loops may be causing test failures.

Author: Aaron Davidson <aaron@databricks.com>

Closes #3053 from aarondav/leak and squashes the following commits:

e676d18 [Aaron Davidson] Typo!
8f96475 [Aaron Davidson] Keep original ssc semantics
7e49f10 [Aaron Davidson] A leak of event loops may be causing test failures.
2014-11-02 16:26:24 -08:00
Tathagata Das 4d26aca770 [SPARK-3912][Streaming] Fixed flakyFlumeStreamSuite
@harishreedharan @pwendell
See JIRA for diagnosis of the problem
https://issues.apache.org/jira/browse/SPARK-3912

The solution was to reimplement it.
1. Find a free port (by binding and releasing a server-scoket), and then use that port
2. Remove thread.sleep()s, instead repeatedly try to create a sender and send data and check whether data was sent. Use eventually() to minimize waiting time.
3. Check whether all the data was received, without caring about batches.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #2773 from tdas/flume-test-fix and squashes the following commits:

93cd7f6 [Tathagata Das] Reimplimented FlumeStreamSuite to be more robust.
2014-10-13 22:46:49 -07:00
Reynold Xin 3888ee2f38 [SPARK-3748] Log thread name in unit test logs
Thread names are useful for correlating failures.

Author: Reynold Xin <rxin@apache.org>

Closes #2600 from rxin/log4j and squashes the following commits:

83ffe88 [Reynold Xin] [SPARK-3748] Log thread name in unit test logs
2014-10-01 01:03:49 -07:00
Sean Owen 8764fe368b SPARK-3744 [STREAMING] FlumeStreamSuite will fail during port contention
Since it looked quite easy, I took the liberty of making a quick PR that just uses `Utils.startServiceOnPort` to fix this. It works locally for me.

Author: Sean Owen <sowen@cloudera.com>

Closes #2601 from srowen/SPARK-3744 and squashes the following commits:

ddc9319 [Sean Owen] Avoid port contention in tests by retrying several ports for Flume stream
2014-09-30 15:18:51 -07:00
GuoQiang Li 905861906e [Minor]Remove extra semicolon in FlumeStreamSuite.scala
Author: GuoQiang Li <witgo@qq.com>

Closes #2265 from witgo/FlumeStreamSuite and squashes the following commits:

6c99e6e [GuoQiang Li] Remove extra semicolon in FlumeStreamSuite.scala
2014-09-04 10:28:23 -07:00
Hari Shreedharan 8c5a222693 [SPARK-3054][STREAMING] Add unit tests for Spark Sink.
This patch adds unit tests for Spark Sink.

It also removes the private[flume] for Spark Sink,
since the sink is instantiated from Flume configuration (looks like this is ignored by reflection which is used by
Flume, but we should still remove it anyway).

Author: Hari Shreedharan <hshreedharan@apache.org>
Author: Hari Shreedharan <hshreedharan@cloudera.com>

Closes #1958 from harishreedharan/spark-sink-test and squashes the following commits:

e3110b9 [Hari Shreedharan] Add a sleep to allow sink to commit the transactions
120b81e [Hari Shreedharan] Fix complexity in threading model in test
4df5be6 [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
c9190d1 [Hari Shreedharan] Indentation and spaces changes
7fedc5a [Hari Shreedharan] Merge remote-tracking branch 'asf/master' into spark-sink-test
abc20cb [Hari Shreedharan] Minor test changes
7b9b649 [Hari Shreedharan] Merge branch 'master' into spark-sink-test
f2c56c9 [Hari Shreedharan] Update SparkSinkSuite.scala
a24aac8 [Hari Shreedharan] Remove unused var
c86d615 [Hari Shreedharan] [SPARK-3054][STREAMING] Add unit tests for Spark Sink.
2014-08-20 04:09:54 -07:00
Hari Shreedharan 95470a03ae [HOTFIX][STREAMING] Allow the JVM/Netty to decide which port to bind to in Flume Polling Tests.
Author: Hari Shreedharan <harishreedharan@gmail.com>

Closes #1820 from harishreedharan/use-free-ports and squashes the following commits:

b939067 [Hari Shreedharan] Remove unused import.
67856a8 [Hari Shreedharan] Remove findFreePort.
0ea51d1 [Hari Shreedharan] Make some changes to getPort to use map on the serverOpt.
1fb0283 [Hari Shreedharan] Merge branch 'master' of https://github.com/apache/spark into use-free-ports
b351651 [Hari Shreedharan] Allow Netty to choose port, and query it to decide the port to bind to. Leaving findFreePort as is, if other tests want to use it at some point.
e6c9620 [Hari Shreedharan] Making sure the second sink uses the correct port.
11c340d [Hari Shreedharan] Add info about race condition to scaladoc.
e89d135 [Hari Shreedharan] Adding Scaladoc.
6013bb0 [Hari Shreedharan] [STREAMING] Find free ports to use before attempting to create Flume Sink in Flume Polling Suite
2014-08-17 19:50:31 -07:00
Andrew Or c6889d2cb9 [HOTFIX][Streaming] Handle port collisions in flume polling test
This is failing my tests in #1777. @tdas

Author: Andrew Or <andrewor14@gmail.com>

Closes #1803 from andrewor14/fix-flaky-streaming-test and squashes the following commits:

ea11a03 [Andrew Or] Catch all exceptions caused by BindExceptions
54a0ca0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-flaky-streaming-test
664095c [Andrew Or] Tone down bind exception message
af3ddc9 [Andrew Or] Handle port collisions in flume polling test
2014-08-06 16:34:53 -07:00
Patrick Wendell 44460ba594 HOTFIX: Fix concurrency issue in FlumePollingStreamSuite.
This has been failing on master. One possible cause is that the port
gets contended if multiple test runs happen concurrently and they
hit this test at the same time. Since this test takes a long time
(60 seconds) that's very plausible. This patch randomizes the port
used in this test to avoid contention.
2014-08-02 01:16:13 -07:00
Hari Shreedharan 800ecff4b1 [STREAMING] SPARK-1729. Make Flume pull data from source, rather than the current pu...
...sh model

Currently Spark uses Flume's internal Avro Protocol to ingest data from Flume. If the executor running the
receiver fails, it currently has to be restarted on the same node to be able to receive data.

This commit adds a new Sink which can be deployed to a Flume agent. This sink can be polled by a new
DStream that is also included in this commit. This model ensures that data can be pulled into Spark from
Flume even if the receiver is restarted on a new node. This also allows the receiver to receive data on
multiple threads for better performance.

Author: Hari Shreedharan <harishreedharan@gmail.com>
Author: Hari Shreedharan <hshreedharan@apache.org>
Author: Tathagata Das <tathagata.das1565@gmail.com>
Author: harishreedharan <hshreedharan@cloudera.com>

Closes #807 from harishreedharan/master and squashes the following commits:

e7f70a3 [Hari Shreedharan] Merge remote-tracking branch 'asf-git/master'
96cfb6f [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
e48d785 [Hari Shreedharan] Documenting flume-sink being ignored for Mima checks.
5f212ce [Hari Shreedharan] Ignore Spark Sink from mima.
981bf62 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
7a1bc6e [Hari Shreedharan] Fix SparkBuild.scala
a082eb3 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
1f47364 [Hari Shreedharan] Minor fixes.
73d6f6d [Hari Shreedharan] Cleaned up tests a bit. Added some docs in multiple places.
65b76b4 [Hari Shreedharan] Fixing the unit test.
e59cc20 [Hari Shreedharan] Use SparkFlumeEvent instead of the new type. Also, Flume Polling Receiver now uses the store(ArrayBuffer) method.
f3c99d1 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
3572180 [Hari Shreedharan] Adding a license header, making Jenkins happy.
799509f [Hari Shreedharan] Fix a compile issue.
3c5194c [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
d248d22 [harishreedharan] Merge pull request #1 from tdas/flume-polling
10b6214 [Tathagata Das] Changed public API, changed sink package, and added java unit test to make sure Java API is callable from Java.
1edc806 [Hari Shreedharan] SPARK-1729. Update logging in Spark Sink.
8c00289 [Hari Shreedharan] More debug messages
393bd94 [Hari Shreedharan] SPARK-1729. Use LinkedBlockingQueue instead of ArrayBuffer to keep track of connections.
120e2a1 [Hari Shreedharan] SPARK-1729. Some test changes and changes to utils classes.
9fd0da7 [Hari Shreedharan] SPARK-1729. Use foreach instead of map for all Options.
8136aa6 [Hari Shreedharan] Adding TransactionProcessor to map on returning batch of data
86aa274 [Hari Shreedharan] Merge remote-tracking branch 'asf/master'
205034d [Hari Shreedharan] Merging master in
4b0c7fc [Hari Shreedharan] FLUME-1729. New Flume-Spark integration.
bda01fc [Hari Shreedharan] FLUME-1729. Flume-Spark integration.
0d69604 [Hari Shreedharan] FLUME-1729. Better Flume-Spark integration.
3c23c18 [Hari Shreedharan] SPARK-1729. New Spark-Flume integration.
70bcc2a [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
d6fa3aa [Hari Shreedharan] SPARK-1729. New Flume-Spark integration.
e7da512 [Hari Shreedharan] SPARK-1729. Fixing import order
9741683 [Hari Shreedharan] SPARK-1729. Fixes based on review.
c604a3c [Hari Shreedharan] SPARK-1729. Optimize imports.
0f10788 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
87775aa [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
8df37e4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
03d6c1c [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
08176ad [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
d24d9d4 [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
6d6776a [Hari Shreedharan] SPARK-1729. Make Flume pull data from source, rather than the current push model
2014-07-29 11:11:29 -07:00
tmalaska 40a8fef4e6 [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915
This is a modified version of this PR https://github.com/apache/spark/pull/1168 done by @tmalaska
Adds MIMA binary check exclusions.

Author: tmalaska <ted.malaska@cloudera.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #1347 from tdas/FLUME-1915 and squashes the following commits:

96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
41d5338 [tmalaska] Address line 57 that was too long
12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
2014-07-10 13:15:02 -07:00
Sean Owen 7120a2979d SPARK-1798. Tests should clean up temp files
Three issues related to temp files that tests generate – these should be touched up for hygiene but are not urgent.

Modules have a log4j.properties which directs the unit-test.log output file to a directory like `[module]/target/unit-test.log`. But this ends up creating `[module]/[module]/target/unit-test.log` instead of former.

The `work/` directory is not deleted by "mvn clean", in the parent and in modules. Neither is the `checkpoint/` directory created under the various external modules.

Many tests create a temp directory, which is not usually deleted. This can be largely resolved by calling `deleteOnExit()` at creation and trying to call `Utils.deleteRecursively` consistently to clean up, sometimes in an `@After` method.

_If anyone seconds the motion, I can create a more significant change that introduces a new test trait along the lines of `LocalSparkContext`, which provides management of temp directories for subclasses to take advantage of._

Author: Sean Owen <sowen@cloudera.com>

Closes #732 from srowen/SPARK-1798 and squashes the following commits:

5af578e [Sean Owen] Try to consistently delete test temp dirs and files, and set deleteOnExit() for each
b21b356 [Sean Owen] Remove work/ and checkpoint/ dirs with mvn clean
bdd0f41 [Sean Owen] Remove duplicate module dir in log4j.properties output path for tests
2014-05-12 14:16:19 -07:00
Mridul Muralidharan 968c0187a1 SPARK-1586 Windows build fixes
Unfortunately, this is not exhaustive - particularly hive tests still fail due to path issues.

Author: Mridul Muralidharan <mridulm80@apache.org>

This patch had conflicts when merged, resolved by
Committer: Matei Zaharia <matei@databricks.com>

Closes #505 from mridulm/windows_fixes and squashes the following commits:

ef12283 [Mridul Muralidharan] Move to org.apache.commons.lang3 for StringEscapeUtils. Earlier version was buggy appparently
cdae406 [Mridul Muralidharan] Remove leaked changes from > 2G fix branch
3267f4b [Mridul Muralidharan] Fix build failures
35b277a [Mridul Muralidharan] Fix Scalastyle failures
bc69d14 [Mridul Muralidharan] Change from hardcoded path separator
10c4d78 [Mridul Muralidharan] Use explicit encoding while using getBytes
1337abd [Mridul Muralidharan] fix classpath while running in windows
2014-04-24 20:48:33 -07:00
Tathagata Das 04c37b6f74 [SPARK-1332] Improve Spark Streaming's Network Receiver and InputDStream API [WIP]
The current Network Receiver API makes it slightly complicated to right a new receiver as one needs to create an instance of BlockGenerator as shown in SocketReceiver
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L51

Exposing the BlockGenerator interface has made it harder to improve the receiving process. The API of NetworkReceiver (which was not a very stable API anyways) needs to be change if we are to ensure future stability.

Additionally, the functions like streamingContext.socketStream that create input streams, return DStream objects. That makes it hard to expose functionality (say, rate limits) unique to input dstreams. They should return InputDStream or NetworkInputDStream. This is still not yet implemented.

This PR is blocked on the graceful shutdown PR #247

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #300 from tdas/network-receiver-api and squashes the following commits:

ea27b38 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
3a4777c [Tathagata Das] Renamed NetworkInputDStream to ReceiverInputDStream, and ActorReceiver related stuff.
838dd39 [Tathagata Das] Added more events to the StreamingListener to report errors and stopped receivers.
a75c7a6 [Tathagata Das] Address some PR comments and fixed other issues.
91bfa72 [Tathagata Das] Fixed bugs.
8533094 [Tathagata Das] Scala style fixes.
028bde6 [Tathagata Das] Further refactored receiver to allow restarting of a receiver.
43f5290 [Tathagata Das] Made functions that create input streams return InputDStream and NetworkInputDStream, for both Scala and Java.
2c94579 [Tathagata Das] Fixed graceful shutdown by removing interrupts on receiving thread.
9e37a0b [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into network-receiver-api
3223e95 [Tathagata Das] Refactored the code that runs the NetworkReceiver into further classes and traits to make them more testable.
a36cc48 [Tathagata Das] Refactored the NetworkReceiver API for future stability.
2014-04-21 19:04:49 -07:00
Tathagata Das 1f4718c480 Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules. 2014-01-14 22:20:14 -08:00
Tathagata Das 4e497db8f3 Removed StreamingContext.registerInputStream and registerOutputStream - they were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation. 2014-01-13 23:23:46 -08:00
Tathagata Das aa99f226a6 Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams. 2014-01-07 01:56:15 -08:00
Tathagata Das 3b4c4c7f4d Merge remote-tracking branch 'apache/master' into project-refactor
Conflicts:
	examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
	streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
	streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
	streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
2014-01-06 03:05:52 -08:00
Tathagata Das d0fd3b9ad2 Changed JavaStreamingContextWith*** to ***Function in streaming.api.java.*** package. Also fixed packages of Flume and MQTT tests. 2014-01-06 01:47:53 -08:00
Tathagata Das f4e4066191 Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects. 2013-12-30 11:13:24 -08:00