Patrick Wendell
0bb33076e2
Removing mentions in tests
2014-01-12 16:53:58 -08:00
Tathagata Das
4f39e79c23
Merge remote-tracking branch 'apache/master' into driver-test
...
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala
2014-01-10 15:47:01 -08:00
Tathagata Das
82f07deeda
Modified streaming.FailureSuite tests to test StreamingContext.getOrCreate.
2014-01-10 15:37:05 -08:00
Tathagata Das
e4bb845238
Updated docs based on Patrick's comments in PR 383.
2014-01-10 12:17:09 -08:00
Tathagata Das
2213a5a47f
Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test
2014-01-10 05:06:22 -08:00
Tathagata Das
740730a179
Fixed conf/slaves and updated docs.
2014-01-10 05:06:15 -08:00
Tathagata Das
4f609f7901
Removed spark.hostPort and other settings from SparkConf before saving to checkpoint.
2014-01-10 12:58:07 +00:00
Tathagata Das
d7ec73ac76
Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test
2014-01-10 11:44:17 +00:00
Tathagata Das
9d3d9c8251
Refactored graph checkpoint file reading and writing code to make it cleaner and easier to debug.
2014-01-10 11:44:02 +00:00
Patrick Wendell
997c830e0b
Merge pull request #363 from pwendell/streaming-logs
...
Set default logging to WARN for Spark streaming examples.
This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
2014-01-09 22:22:20 -08:00
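The precedence rule described in the message above (a default level applies only when the user has not configured logging) can be sketched as follows. This is a Python analogy built on the standard logging module, not the log4j code from the commit; the function name apply_default_level and the logger names are hypothetical, and "already configured" is assumed here to mean "has a handler or an explicit level".

```python
import logging


def apply_default_level(logger, default=logging.WARNING):
    """Set `default` on `logger` only if nothing has configured it yet.

    Assumption for this sketch: a logger counts as user-configured when it
    has at least one handler attached or an explicit level set.
    Returns True when the default was applied, False otherwise.
    """
    if not logger.handlers and logger.level == logging.NOTSET:
        logger.setLevel(default)
        return True
    return False


# A logger nobody has touched picks up the WARN default.
fresh = logging.getLogger("sketch.streaming.fresh")
applied_to_fresh = apply_default_level(fresh)

# A logger the user already set to DEBUG keeps the user's choice.
custom = logging.getLogger("sketch.streaming.custom")
custom.setLevel(logging.DEBUG)
applied_to_custom = apply_default_level(custom)
```

Checking for existing configuration before applying the default is what lets the user's own settings take precedence.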
Tathagata Das
38d75e18fa
Merge remote-tracking branch 'apache/master' into driver-test
2014-01-09 19:31:36 -08:00
Tathagata Das
4a5558ca99
Fixed bugs in reading of checkpoints.
2014-01-10 03:28:39 +00:00
Tathagata Das
f1d206c6b4
Merge branch 'standalone-driver' into driver-test
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2014-01-09 15:06:24 -08:00
Tathagata Das
6f713e2a3e
Changed the way StreamingContext finds and reads checkpoint files, and added JavaStreamingContext.getOrCreate.
2014-01-09 13:42:04 -08:00
Patrick Wendell
35f80da21a
Set default logging to WARN for Spark streaming examples.
...
This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
2014-01-09 10:42:58 -08:00
Matei Zaharia
a01f3401e3
Use typed getters for configuration settings
2014-01-09 00:07:29 -08:00
Tathagata Das
a17cc602ac
More bug fixes.
2014-01-08 04:12:05 -08:00
Tathagata Das
0b7a132d03
Modified checkpoint file clearing policy.
2014-01-08 03:22:06 -08:00
Tathagata Das
3b4c4c7f4d
Merge remote-tracking branch 'apache/master' into project-refactor
...
Conflicts:
examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
2014-01-06 03:05:52 -08:00
Tathagata Das
ac1f4b06c1
Added a hashmap to cache file mod times.
2014-01-05 23:42:53 -08:00
Tathagata Das
2394794591
Merge branch 'filestream-fix' into driver-test
...
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
2014-01-06 02:23:53 +00:00
Tathagata Das
8e88db3ca5
Bug fixes to the DriverRunner and minor changes here and there.
2014-01-06 02:21:56 +00:00
Patrick Wendell
79f52809c8
Removing SPARK_EXAMPLES_JAR in the code
2014-01-05 11:49:42 -08:00
Tathagata Das
a1b8dd53e3
Added StreamingContext.getOrCreate for automatic recovery, and added RecoverableNetworkWordCount example to use it.
2014-01-02 19:07:22 -08:00
Patrick Wendell
588a1695f4
Merge pull request #297 from tdas/window-improvement
...
Improvements to DStream window ops and refactoring of Spark's CheckpointSuite
- Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located.
- Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads.
- Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary.
- Added mapSideCombine option to combineByKeyAndWindow.
2014-01-02 13:20:54 -08:00
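The partitioner-preserving union described in the pull request above can be illustrated with a small model. This is a plain-Python sketch under stated assumptions, not Spark's PartitionerAwareUnionRDD: each "RDD" is modeled as a list of p partitions, each partition is a dict with a hypothetical "location" and "records" field, and all parents are assumed to share the same partitioner, so partition i holds the same keys in every parent.

```python
from collections import Counter


def partitioner_aware_union(parents):
    """Unify m datasets of p partitions each into one dataset of p partitions.

    Partition i of the result concatenates partition i of every parent, so
    records never move between partition indices (no shuffle). Its preferred
    location is the most common location among the parents' partition i.
    """
    num_partitions = len(parents[0])
    if any(len(parent) != num_partitions for parent in parents):
        raise ValueError("all parents must have the same number of partitions")

    unified = []
    for i in range(num_partitions):
        parts = [parent[i] for parent in parents]
        records = [record for part in parts for record in part["records"]]
        locations = Counter(part["location"] for part in parts)
        unified.append({
            "location": locations.most_common(1)[0][0],
            "records": records,
        })
    return unified


# Three per-batch datasets (m = 3) with two partitions each (p = 2).
parents = [
    [{"location": "hostA", "records": [("a", 1)]},
     {"location": "hostB", "records": [("b", 2)]}],
    [{"location": "hostA", "records": [("a", 3)]},
     {"location": "hostC", "records": [("b", 4)]}],
    [{"location": "hostB", "records": [("a", 5)]},
     {"location": "hostC", "records": [("b", 6)]}],
]
unified = partitioner_aware_union(parents)
```

Here the result still has two partitions: partition 0 prefers hostA and partition 1 prefers hostC, because those are where most of the corresponding parent partitions live. Because equal keys already sit at the same partition index, a windowed reduce over the unified data needs no further shuffle.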
Matei Zaharia
e2c68642c6
Miscellaneous fixes from code review.
...
Also replaced SparkConf.getOrElse with just a "get" that takes a default
value, and added getInt, getLong, etc to make code that uses this
simpler later on.
2014-01-01 22:03:39 -05:00
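The API shape described in the message above, a single "get" that takes a default value plus typed getters layered on top of it, can be sketched as follows. This is a Python analogy, not the SparkConf source; the class SimpleConf, its method names, and the configuration keys are all hypothetical.

```python
class SimpleConf:
    """Minimal string-valued settings store with typed accessors."""

    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        # Values are stored as strings, as they would be in a properties file.
        self._settings[key] = str(value)
        return self  # allow chained set() calls

    def get(self, key, default):
        """Return the stored string, or `default` if the key is absent."""
        return self._settings.get(key, default)

    def get_int(self, key, default):
        """Typed getter: parse the stored value (or the default) as an int."""
        return int(self.get(key, default))

    def get_boolean(self, key, default):
        """Typed getter: treat the case-insensitive string 'true' as True."""
        return str(self.get(key, default)).lower() == "true"


conf = SimpleConf().set("example.batch.size", 4).set("example.verbose", "true")
batch_size = conf.get_int("example.batch.size", 1)  # stored value is parsed
retries = conf.get_int("example.retries", 8)        # absent key: default used
verbose = conf.get_boolean("example.verbose", False)
```

Callers no longer repeat the parse-and-fallback logic at each call site, which is the simplification the commit message refers to.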
Matei Zaharia
45ff8f413d
Merge remote-tracking branch 'apache/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
2014-01-01 21:25:00 -05:00
Patrick Wendell
f8d245bdfc
Merge remote-tracking branch 'apache-github/master' into log4j-fix-2
...
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2014-01-01 16:10:51 -08:00
Matei Zaharia
42bcfb2bb2
Fix two compile errors introduced in merge
2013-12-31 18:26:23 -05:00
Matei Zaharia
ba9338f104
Merge remote-tracking branch 'apache/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
2013-12-31 18:23:14 -05:00
Tathagata Das
fcd17a1e8e
Fixed comments and long lines based on comments on PR 289.
2013-12-31 02:01:45 -08:00
Tathagata Das
87b915f221
Removed extra empty lines.
2013-12-31 00:42:10 -08:00
Tathagata Das
3ab297adaa
Removed unnecessary comments.
2013-12-31 00:38:19 -08:00
Patrick Wendell
18181e6c41
Removing initLogging entirely
2013-12-30 23:39:47 -08:00
Tathagata Das
f4e4066191
Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
2013-12-30 11:13:24 -08:00
Matei Zaharia
0bd1900cbc
Fix a few settings that were being read as system properties after merge
2013-12-29 15:38:46 -05:00
Matei Zaharia
b4ceed40d6
Merge remote-tracking branch 'origin/master' into conf2
...
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
2013-12-29 15:08:08 -05:00
Matei Zaharia
20631348d1
Fix other failing tests
2013-12-28 23:17:58 -05:00
Matei Zaharia
0900d5c72a
Add a StreamingContext constructor that takes a conf object
2013-12-28 21:38:07 -05:00
Matei Zaharia
642029e7f4
Various fixes to configuration code
...
- Got rid of global SparkContext.globalConf
- Pass SparkConf to serializers and compression codecs
- Made SparkConf public instead of private[spark]
- Improved API of SparkContext and SparkConf
- Switched executor environment vars to be passed through SparkConf
- Fixed some places that were still using system properties
- Fixed some tests, though others are still failing
This still fails several tests in core, repl and streaming, likely due
to properties not being set or cleared correctly (some of the tests run
fine in isolation).
2013-12-28 17:13:15 -05:00
Tathagata Das
271e3237f3
Minor changes in comments and strings to address comments in PR 289.
2013-12-27 12:26:57 -08:00
Tathagata Das
6e43039614
Refactored streaming project to separate out the twitter functionality.
2013-12-26 18:02:49 -08:00
Tathagata Das
577c8cc834
Removed unnecessary options from WindowedDStream.
2013-12-26 14:17:16 -08:00
Tathagata Das
3618d70b2a
Added warning if filestream adds files with no data in them (file RDDs have 0 partitions).
2013-12-26 12:45:40 -08:00
Tathagata Das
be64719138
Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).
2013-12-26 12:33:12 -08:00
Tathagata Das
069cb14bdc
Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow.
2013-12-26 02:58:29 -08:00
Tathagata Das
bacc65cf28
Removed slack time in file stream and added better handling of failures due to FileNotFound exceptions.
2013-12-26 10:18:46 +00:00
Tathagata Das
d4dfab503a
Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.
2013-12-24 14:01:13 -08:00
Prashant Sharma
2573add94c
spark-544, introducing SparkConf and related configuration overhaul.
2013-12-25 00:09:36 +05:30
Tathagata Das
e9165d2a39
Merge branch 'scheduler-update' into window-improvement
2013-12-23 17:49:41 -08:00