ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	f5ae38af87	SPARK-1158: Fix flaky RateLimitedOutputStreamSuite. There was actually a problem with the RateLimitedOutputStream implementation where the first second doesn't write anything because of integer rounding. So RateLimitedOutputStream was overly aggressive in throttling. Author: Reynold Xin <rxin@apache.org> Closes #55 from rxin/ratelimitest and squashes the following commits: 52ce1b7 [Reynold Xin] SPARK-1158: Fix flaky RateLimitedOutputStreamSuite.	2014-03-03 21:24:19 -08:00
Sean Owen	12bbca2065	SPARK 1084.1 (resubmitted) (Ported from https://github.com/apache/incubator-spark/pull/637 ) Author: Sean Owen <sowen@cloudera.com> Closes #31 from srowen/SPARK-1084.1 and squashes the following commits: 6c4a32c [Sean Owen] Suppress warnings about legitimate unchecked array creations, or change code to avoid it f35b833 [Sean Owen] Fix two misc javadoc problems 254e8ef [Sean Owen] Fix one new style error introduced in scaladoc warning commit 5b2fce2 [Sean Owen] Fix scaladoc invocation warning, and enable javac warnings properly, with plugin config updates 007762b [Sean Owen] Remove dead scaladoc links b8ff8cb [Sean Owen] Replace deprecated Ant <tasks> with <target>	2014-02-27 11:12:21 -08:00
Prashant Sharma	919bd7f669	Merge pull request #567 from ScrapCodes/style2. SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2 Continuation of PR #557 With this all scala style errors are fixed across the code base !! The reason for creating a separate PR was to not interrupt an already reviewed and ready to merge PR. Hope this gets reviewed soon and merged too. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #567 and squashes the following commits: 3b1ec30 [Prashant Sharma] scala style fixes	2014-02-09 22:17:52 -08:00
Patrick Wendell	b69f8b2a01	Merge pull request #557 from ScrapCodes/style. Closes #557 . SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Author: Patrick Wendell <pwendell@gmail.com> Author: Prashant Sharma <scrapcodes@gmail.com> == Merge branch commits == commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4 Author: Prashant Sharma <scrapcodes@gmail.com> Date: Sun Feb 9 17:39:07 2014 +0530 scala style fixes commit f91709887a8e0b608c5c2b282db19b8a44d53a43 Author: Patrick Wendell <pwendell@gmail.com> Date: Fri Jan 24 11:22:53 2014 -0800 Adding scalastyle snapshot	2014-02-09 10:09:19 -08:00
Henry Saputra	0386f42e38	Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency Looks like there are some ⇒ Unicode character (maybe from scalariform) in Scala code. This PR is to change it to => to get some consistency on the Scala code. If we want to use ⇒ as default we could use sbt plugin scalariform to make sure all Scala code has ⇒ instead of => And remove unused imports found in TwitterInputDStream.scala while I was there =) Author: Henry Saputra <hsaputra@apache.org> == Merge branch commits == commit 29c1771d346dff901b0b778f764e6b4409900234 Author: Henry Saputra <hsaputra@apache.org> Date: Sat Feb 1 22:05:16 2014 -0800 Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.	2014-02-02 21:51:17 -08:00
Josh Rosen	740e865f40	Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040) This fixes an issue where collectAsMap() could fail when called on a JavaPairRDD that was derived by transforming a non-JavaPairRDD. The root problem was that we were creating the JavaPairRDD's ClassTag by casting a ClassTag[AnyRef] to a ClassTag[Tuple2[K2, V2]]. To fix this, I cast a ClassTag[Tuple2[_, _]] instead, since this actually produces a ClassTag of the appropriate type because ClassTags don't capture type parameters: scala> implicitly[ClassTag[Tuple2[_, _]]] == implicitly[ClassTag[Tuple2[Int, Int]]] res8: Boolean = true scala> implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[Tuple2[Int, Int]]] == implicitly[ClassTag[Tuple2[Int, Int]]] res9: Boolean = false	2014-01-25 16:41:12 -08:00
Tathagata Das	11e6534d92	Updated java API docs for streaming, along with very minor changes in the code examples.	2014-01-16 14:44:02 -08:00
Tathagata Das	9e6375349e	Made some classes private[stremaing] and deprecated a method in JavaStreamingContext.	2014-01-15 12:15:46 -08:00
Tathagata Das	0e15bd7827	Merge remote-tracking branch 'apache/master' into filestream-fix	2014-01-14 22:21:20 -08:00
Tathagata Das	1f4718c480	Changed SparkConf to not be serializable. And also fixed unit-test log paths in log4j.properties of external modules.	2014-01-14 22:20:14 -08:00
Patrick Wendell	23034798d7	Add missing header files	2014-01-14 01:17:13 -08:00
Tathagata Das	f8bd828c7c	Fixed loose ends in docs.	2014-01-14 00:03:46 -08:00
Tathagata Das	f8e239e058	Merge remote-tracking branch 'apache/master' into filestream-fix Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala	2014-01-13 23:57:27 -08:00
Tathagata Das	4e497db8f3	Removed StreamingContext.registerInputStream and registerOutputStream - they were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation.	2014-01-13 23:23:46 -08:00
Patrick Wendell	68641bce61	Merge pull request #413 from rxin/scaladoc Adjusted visibility of various components and documentation for 0.9.0 release.	2014-01-13 22:54:13 -08:00
Patrick Wendell	08b9fec93d	Merge pull request #409 from tdas/unpersist Automatically unpersisting RDDs that have been cleaned up from DStreams Earlier RDDs generated by DStreams were forgotten but not unpersisted. The system relied on the natural BlockManager LRU to drop the data. The cleaner.ttl was a hammer to clean up RDDs but it is something that needs to be set separately and need to be set very conservatively (at best, few minutes). This automatic unpersisting allows the system to handle this automatically, which reduces memory usage. As a side effect it will also improve GC performance as there are less number of objects stored in memory. In fact, for some workloads, it may allow RDDs to be cached as deserialized, which speeds up processing without too much GC overheads. This is disabled by default. To enable it set configuration spark.streaming.unpersist to true. In future release, this will be set to true by default. Also, reduced sleep time in TaskSchedulerImpl.stop() from 5 second to 1 second. From my conversation with Matei, there does not seem to be any good reason for the sleep for letting messages be sent out be so long.	2014-01-13 22:29:03 -08:00
Reynold Xin	33022d6656	Adjusted visibility of various components.	2014-01-13 19:58:53 -08:00
Patrick Wendell	a2fee38ee0	Merge pull request #411 from tdas/filestream-fix Improved logic of finding new files in FileInputDStream Earlier, if HDFS has a hiccup and reports a existence of a new file (mod time T sec) at time T + 1 sec, then fileStream could have missed that file. With this change, it should be able to find files that are delayed by up to <batch size> seconds. That is, even if file is reported at T + <batch time> sec, file stream should be able to catch it. The new logic, at a high level, is as follows. It keeps track of the new files it found in the previous interval and mod time of the oldest of those files (lets call it X). Then in the current interval, it will ignore those files that were seen in the previous interval and those which have mod time older than X. So if a new file gets reported by HDFS that in the current interval, but has mod time in the previous interval, it will be considered. However, if the mod time earlier than the previous interval (that is, earlier than X), they will be ignored. This is the current limitation, and future version would improve this behavior further. Also reduced line lengths in DStream to <=100 chars.	2014-01-13 19:45:26 -08:00
Tathagata Das	1233b3de01	Merge remote-tracking branch 'apache/master' into filestream-fix	2014-01-13 17:29:19 -08:00
Tathagata Das	c0bb38e8aa	Improved file input stream further.	2014-01-13 16:54:52 -08:00
Reynold Xin	30328c347b	Updated JavaStreamingContext to make scaladoc compile. `sbt/sbt doc` used to fail. This fixed it.	2014-01-13 15:58:39 -08:00
Tathagata Das	27311b1332	Added unpersisting and modified testsuite to better test out metadata cleaning.	2014-01-13 14:57:07 -08:00
Patrick Wendell	b93f9d42f2	Merge pull request #400 from tdas/dstream-move Moved DStream and PairDSream to org.apache.spark.streaming.dstream Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think its better to keep it consistent with Spark's structure. Also fixed persistence of windowed DStream. The RDDs generated generated by windowed DStream are essentially unions of underlying RDDs, and persistent these union RDDs would store numerous copies of the underlying data. Instead setting the persistence level on the windowed DStream is made to set the persistence level of the underlying DStream.	2014-01-13 12:18:05 -08:00
Reynold Xin	e6ed13f255	Merge pull request #397 from pwendell/host-port Remove now un-needed hostPort option I noticed this was logging some scary error messages in various places. After I looked into it, this is no longer really used. I removed the option and re-wrote the one remaining use case (it was unnecessary there anyways).	2014-01-12 22:35:14 -08:00
Tathagata Das	777c181d2f	Merge remote-tracking branch 'apache/master' into dstream-move Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala	2014-01-12 21:59:51 -08:00
Patrick Wendell	0ab505a29e	Merge pull request #395 from hsaputra/remove_simpleredundantreturn_scala Remove simple redundant return statements for Scala methods/functions Remove simple redundant return statements for Scala methods/functions: -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized -) Add small changes to making var to val if possible and remove () for simple get This hopefully makes the review simpler =) Pass compile and tests.	2014-01-12 21:31:04 -08:00
Patrick Wendell	405bfe86ef	Merge pull request #394 from tdas/error-handling Better error handling in Spark Streaming and more API cleanup Earlier errors in jobs generated by Spark Streaming (or in the generation of jobs) could not be caught from the main driver thread (i.e. the thread that called StreamingContext.start()) as it would be thrown in different threads. With this change, after `ssc.start`, one can call `ssc.awaitTermination()` which will be block until the ssc is closed, or there is an exception. This makes it easier to debug. This change also adds ssc.stop(<stop-spark-context>) where you can stop StreamingContext without stopping the SparkContext. Also fixes the bug that came up with PRs #393 and #381. MetadataCleaner default value has been changed from 3500 to -1 for normal SparkContext and 3600 when creating a StreamingContext. Also, updated StreamingListenerBus with changes similar to SparkListenerBus in #392. And changed a lot of protected[streaming] to private[streaming].	2014-01-12 20:04:21 -08:00
Tathagata Das	034f89aaab	Fixed persistence logic of WindowedDStream, and fixed default persistence level of input streams.	2014-01-12 19:02:27 -08:00
Patrick Wendell	e6e20ceee0	Adding deprecated versions of old code	2014-01-12 18:54:03 -08:00
Tathagata Das	d1820fef57	Merge branch 'error-handling' into dstream-move	2014-01-12 17:36:49 -08:00
Tathagata Das	c7fabb745b	Changed StreamingContext.stopForWait to awaitTermination.	2014-01-12 17:21:13 -08:00
Patrick Wendell	f4d77f8cb8	Rename DStream.foreach to DStream.foreachRDD `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.	2014-01-12 17:21:00 -08:00
Patrick Wendell	0bb33076e2	Removing mentions in tests	2014-01-12 16:53:58 -08:00
Tathagata Das	7883b8f579	Fixed bugs to ensure better cleanup of JobScheduler, JobGenerator and NetworkInputTracker upon close.	2014-01-12 16:44:07 -08:00
Tathagata Das	448aef6790	Moved DStream, DStreamCheckpointData and PairDStream from org.apache.spark.streaming to org.apache.spark.streaming.dstream.	2014-01-12 11:31:54 -08:00
Henry Saputra	91a563608e	Merge branch 'master' into remove_simpleredundantreturn_scala	2014-01-12 10:34:13 -08:00
Henry Saputra	93a65e5fde	Remove simple redundant return statement for Scala methods/functions: -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized	2014-01-12 10:30:04 -08:00
Tathagata Das	c5921e5c61	Fixed bugs.	2014-01-12 01:12:08 -08:00
Tathagata Das	4d9b0ab420	Added waitForStop and stop to JavaStreamingContext.	2014-01-11 23:35:51 -08:00
Tathagata Das	f5108ffc24	Converted JobScheduler to use actors for event handling. Changed protected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and StreamingContextSuite.	2014-01-11 23:15:09 -08:00
Tathagata Das	4f39e79c23	Merge remote-tracking branch 'apache/master' into driver-test Conflicts: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala	2014-01-10 15:47:01 -08:00
Tathagata Das	82f07deeda	Modified streaming.FailureSuite tests to test StreamingContext.getOrCreate.	2014-01-10 15:37:05 -08:00
Tathagata Das	e4bb845238	Updated docs based on Patrick's comments in PR 383.	2014-01-10 12:17:09 -08:00
Tathagata Das	2213a5a47f	Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test	2014-01-10 05:06:22 -08:00
Tathagata Das	740730a179	Fixed conf/slaves and updated docs.	2014-01-10 05:06:15 -08:00
Tathagata Das	4f609f7901	Removed spark.hostPort and other setting from SparkConf before saving to checkpoint.	2014-01-10 12:58:07 +00:00
Tathagata Das	d7ec73ac76	Merge branch 'driver-test' of github.com:tdas/incubator-spark into driver-test	2014-01-10 11:44:17 +00:00
Tathagata Das	9d3d9c8251	Refactored graph checkpoint file reading and writing code to make it cleaner and easily debuggable.	2014-01-10 11:44:02 +00:00
Patrick Wendell	997c830e0b	Merge pull request #363 from pwendell/streaming-logs Set default logging to WARN for Spark streaming examples. This programatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.	2014-01-09 22:22:20 -08:00
Tathagata Das	38d75e18fa	Merge remote-tracking branch 'apache/master' into driver-test	2014-01-09 19:31:36 -08:00
Tathagata Das	4a5558ca99	Fixed bugs in reading of checkpoints.	2014-01-10 03:28:39 +00:00
Tathagata Das	f1d206c6b4	Merge branch 'standalone-driver' into driver-test Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2014-01-09 15:06:24 -08:00
Tathagata Das	6f713e2a3e	Changed the way StreamingContext finds and reads checkpoint files, and added JavaStreamingContext.getOrCreate.	2014-01-09 13:42:04 -08:00
Patrick Wendell	35f80da21a	Set default logging to WARN for Spark streaming examples. This programatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.	2014-01-09 10:42:58 -08:00
Matei Zaharia	a01f3401e3	Use typed getters for configuration settings	2014-01-09 00:07:29 -08:00
Tathagata Das	a17cc602ac	More bug fixes.	2014-01-08 04:12:05 -08:00
Tathagata Das	0b7a132d03	Modified checkpoing file clearing policy.	2014-01-08 03:22:06 -08:00
Tathagata Das	3b4c4c7f4d	Merge remote-tracking branch 'apache/master' into project-refactor Conflicts: examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala	2014-01-06 03:05:52 -08:00
Tathagata Das	ac1f4b06c1	Added a hashmap to cache file mod times.	2014-01-05 23:42:53 -08:00
Tathagata Das	2394794591	Merge branch 'filestream-fix' into driver-test Conflicts: streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala	2014-01-06 02:23:53 +00:00
Tathagata Das	8e88db3ca5	Bug fixes to the DriverRunner and minor changes here and there.	2014-01-06 02:21:56 +00:00
Patrick Wendell	79f52809c8	Removing SPARK_EXAMPLES_JAR in the code	2014-01-05 11:49:42 -08:00
Tathagata Das	a1b8dd53e3	Added StreamingContext.getOrCreate to for automatic recovery, and added RecoverableNetworkWordCount example to use it.	2014-01-02 19:07:22 -08:00
Patrick Wendell	588a1695f4	Merge pull request #297 from tdas/window-improvement Improvements to DStream window ops and refactoring of Spark's CheckpointSuite - Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located. - Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads. - Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary. - Added mapSideCombine option to combineByKeyAndWindow.	2014-01-02 13:20:54 -08:00
Matei Zaharia	e2c68642c6	Miscellaneous fixes from code review. Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.	2014-01-01 22:03:39 -05:00
Matei Zaharia	45ff8f413d	Merge remote-tracking branch 'apache/master' into conf2 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala	2014-01-01 21:25:00 -05:00
Patrick Wendell	f8d245bdfc	Merge remote-tracking branch 'apache-github/master' into log4j-fix-2 Conflicts: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2014-01-01 16:10:51 -08:00
Matei Zaharia	42bcfb2bb2	Fix two compile errors introduced in merge	2013-12-31 18:26:23 -05:00
Matei Zaharia	ba9338f104	Merge remote-tracking branch 'apache/master' into conf2 Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala	2013-12-31 18:23:14 -05:00
Tathagata Das	fcd17a1e8e	Fixed comments and long lines based on comments on PR 289.	2013-12-31 02:01:45 -08:00
Tathagata Das	87b915f221	Removed extra empty lines.	2013-12-31 00:42:10 -08:00
Tathagata Das	3ab297adaa	Removed unnecessary comments.	2013-12-31 00:38:19 -08:00
Patrick Wendell	18181e6c41	Removing initLogging entirely	2013-12-30 23:39:47 -08:00
Tathagata Das	f4e4066191	Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.	2013-12-30 11:13:24 -08:00
Matei Zaharia	0bd1900cbc	Fix a few settings that were being read as system properties after merge	2013-12-29 15:38:46 -05:00
Matei Zaharia	b4ceed40d6	Merge remote-tracking branch 'origin/master' into conf2 Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala	2013-12-29 15:08:08 -05:00
Matei Zaharia	20631348d1	Fix other failing tests	2013-12-28 23:17:58 -05:00
Matei Zaharia	0900d5c72a	Add a StreamingContext constructor that takes a conf object	2013-12-28 21:38:07 -05:00
Matei Zaharia	642029e7f4	Various fixes to configuration code - Got rid of global SparkContext.globalConf - Pass SparkConf to serializers and compression codecs - Made SparkConf public instead of private[spark] - Improved API of SparkContext and SparkConf - Switched executor environment vars to be passed through SparkConf - Fixed some places that were still using system properties - Fixed some tests, though others are still failing This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation).	2013-12-28 17:13:15 -05:00
Tathagata Das	271e3237f3	Minor changes in comments and strings to address comments in PR 289.	2013-12-27 12:26:57 -08:00
Tathagata Das	6e43039614	Refactored streaming project to separate out the twitter functionality.	2013-12-26 18:02:49 -08:00
Tathagata Das	577c8cc834	Removed unncessary options from WindowedDStream.	2013-12-26 14:17:16 -08:00
Tathagata Das	3618d70b2a	Added warning if filestream adds files with no data in them (file RDDs have 0 partitions).	2013-12-26 12:45:40 -08:00
Tathagata Das	be64719138	Changed file stream to not catch any exceptions related to finding new files (FileNotFound exception is still caught and ignored).	2013-12-26 12:33:12 -08:00
Tathagata Das	069cb14bdc	Updated groupByKeyAndWindow to be computed incrementally, and added mapSideCombine to combineByKeyAndWindow.	2013-12-26 02:58:29 -08:00
Tathagata Das	bacc65cf28	Removed slack time in file stream and added better handling of exceptions due to failures due FileNotFound exceptions.	2013-12-26 10:18:46 +00:00
Tathagata Das	d4dfab503a	Fixed Python API for sc.setCheckpointDir. Also other fixes based on Reynold's comments on PR 289.	2013-12-24 14:01:13 -08:00
Prashant Sharma	2573add94c	spark-544, introducing SparkConf and related configuration overhaul.	2013-12-25 00:09:36 +05:30
Tathagata Das	e9165d2a39	Merge branch 'scheduler-update' into window-improvement	2013-12-23 17:49:41 -08:00
Tathagata Das	0af7f84c8e	Minor formatting fixes.	2013-12-23 17:47:16 -08:00
Tathagata Das	8ca14a1e51	Updated testsuites to work with the slack time of file stream.	2013-12-23 16:27:00 -08:00
Tathagata Das	b31e91f927	Merge branch 'scheduler-update' into filestream-fix	2013-12-23 15:59:15 -08:00
Tathagata Das	19d1d58b67	Fixed bug in file stream that prevented some files from being read correctly.	2013-12-23 23:48:43 +00:00
Tathagata Das	f9771690a6	Minor formatting fixes.	2013-12-23 11:32:26 -08:00
Tathagata Das	dc3ee6b612	Added comments to BatchInfo and JobSet, based on Patrick's comment on PR 277.	2013-12-23 11:30:42 -08:00
Tathagata Das	e7b62cbfbf	Updated CheckpointWriter and FileInputDStream to be robust against failed FileSystem objects. Refactored JobGenerator to use actor so that all updating of DStream's metadata is single threaded.	2013-12-22 18:49:36 -08:00
Tathagata Das	d91ec6f8ea	Merge branch 'scheduler-update' into filestream-fix	2013-12-22 15:23:35 -08:00
Tathagata Das	3ddbdbfbc7	Minor updated based on comments on PR 277.	2013-12-20 19:51:37 -08:00
Tathagata Das	de41c436a0	Merge branch 'scheduler-update' into window-improvement Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala	2013-12-19 12:05:08 -08:00
Tathagata Das	984c582487	Merge branch 'scheduler-update' into filestream-fix Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala	2013-12-19 11:20:48 -08:00

1 2 3 4 5 ...

481 commits