ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Tathagata Das	f98c7da23e	Many changes to ensure better 2nd recovery if 2nd failure happens while recovering from 1st failure - Made the scheduler checkpoint after clearing old metadata, which ensures that a new checkpoint is written as soon as at least one batch gets computed while recovering from a failure. This ensures that if there is a 2nd failure while recovering from the 1st failure, the system starts the 2nd recovery from a newer checkpoint. - Modified Checkpoint writer to write checkpoints in a different thread. - Added a check to make sure that compute for InputDStreams gets called only for strictly increasing times. - Changed implementation of slice to call getOrCompute on parent DStream in time-increasing order. - Added testcase to test slice. - Fixed testGroupByKeyAndWindow testcase in JavaAPISuite to verify results with expected output in an order-independent manner.	2013-02-17 15:06:41 -08:00
Tathagata Das	ddcb976b0d	Made MasterFailureTest more robust.	2013-02-15 06:54:47 +00:00
Tathagata Das	4b8402e900	Moved Java streaming examples to examples/src/main/java/spark/streaming/... and fixed logging in NetworkInputTracker to highlight errors when receiver deregisters/shuts down.	2013-02-14 18:10:37 -08:00
Tathagata Das	def8126d77	Added TwitterInputDStream from example to StreamingContext. Renamed example TwitterBasic to TwitterPopularTags.	2013-02-14 17:49:43 -08:00
Tathagata Das	2eacf22401	Removed countByKeyAndWindow on paired DStreams, and added countByValueAndWindow for all DStreams. Updated both scala and java API and testsuites.	2013-02-14 12:21:47 -08:00
Tathagata Das	03e8dc6861	Changed function comments to make them more consistent.	2013-02-13 20:59:29 -08:00
Tathagata Das	12b020b668	Added filter functionality to reduceByKeyAndWindow with inverse. Consolidated reduceByKeyAndWindow's many functions into smaller number of functions with optional parameters.	2013-02-13 20:53:50 -08:00
Tathagata Das	39addd3803	Changed scheduler and file input stream to fix bugs in the driver fault tolerance. Added MasterFailureTest to rigorously test master fault tolerance with file input stream.	2013-02-13 12:17:45 -08:00
Tathagata Das	fd90daf850	Fixed bugs in FileInputDStream and Scheduler that occasionally failed to reprocess old files after recovering from master failure. Completely modified spark.streaming.FailureTest to test multiple master failures using file input stream.	2013-02-10 19:48:42 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Tathagata Das	99a5fc498a	Added an initial spark job to ensure worker nodes are initialized.	2013-02-09 15:18:05 -08:00
Tathagata Das	4cc223b478	Merge branch 'mesos-master' into streaming	2013-02-07 13:59:31 -08:00
Tathagata Das	d55e3aa467	Updated JavaStreamingContext with updated kafkaStream API.	2013-02-07 13:59:18 -08:00
Tathagata Das	c6b2f765d3	Merge branch 'mesos-streaming' into streaming	2013-02-07 13:13:53 -08:00
Tathagata Das	12300758cc	Merge pull request #372 from Reinvigorate/sm-kafka Removing offset management code that is non-existent in kafka 0.7.0+	2013-02-07 12:41:07 -08:00
Tathagata Das	915d9931fe	Merge pull request #373 from Reinvigorate/sm-updateStateByKey StateDStream changes to give updateStateByKey consistent behavior	2013-02-07 11:59:19 -08:00
Matei Zaharia	9cfa068379	Merge pull request #450 from stephenh/inlinemergepair Inline mergePair to look more like the narrow dep branch.	2013-02-05 18:28:44 -08:00
Matei Zaharia	03eefbb200	Merge pull request #451 from stephenh/fixdeathpactexception Handle Terminated to avoid endless DeathPactExceptions.	2013-02-05 18:27:54 -08:00
Stephen Haberman	870b2aaf5d	Merge branch 'master' into fixdeathpactexception Conflicts: core/src/main/scala/spark/deploy/worker/Worker.scala	2013-02-05 20:27:09 -06:00
Matei Zaharia	a4611d66f0	Merge pull request #449 from stephenh/longerdriversuite Increase DriverSuite timeout.	2013-02-05 17:58:22 -08:00
Stephen Haberman	0e19093fd8	Handle Terminated to avoid endless DeathPactExceptions. Credit to Roland Kuhn, Akka's tech lead, for pointing out this very obvious fix, but StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operations in preStart are async. So, the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle them, turned into DeathPactExceptions, which started a postRestart/preStart infinite loop.	2013-02-05 18:58:00 -06:00
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Matei Zaharia	2d9eca9fbb	Merge pull request #447 from pwendell/streaming-constructor Streaming constructor which takes JavaSparkContext	2013-02-05 11:45:44 -08:00
Patrick Wendell	7eea64aa4c	Streaming constructor which takes JavaSparkContext It's sometimes helpful to directly pass a JavaSparkContext, and take advantage of the various constructors available for that.	2013-02-05 11:43:16 -08:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Matei Zaharia	f7b4e428be	Merge pull request #445 from JoshRosen/pyspark_fixes Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()	2013-02-03 21:36:36 -08:00
Josh Rosen	e61729113d	Remove unnecessary doctest __main__ methods.	2013-02-03 21:29:40 -08:00
Matei Zaharia	3bfaf3ab1d	Merge pull request #379 from stephenh/sparkmem Add spark.executor.memory to differentiate executor memory from spark-shell	2013-02-02 23:58:23 -08:00
Matei Zaharia	88ee6163a1	Merge pull request #422 from squito/blockmanager_info RDDInfo available from SparkContext	2013-02-02 23:44:13 -08:00
Matei Zaharia	cd4ca93679	Merge pull request #436 from stephenh/removeextraloop Once we find a split with no block, we don't have to look for more.	2013-02-02 23:39:28 -08:00
Matei Zaharia	d5daaab381	Merge pull request #442 from stephenh/fixsystemnames Fix createActorSystem not actually using the systemName parameter.	2013-02-02 23:38:46 -08:00
Matei Zaharia	9163c3705d	Formatting	2013-02-02 23:34:47 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00
Josh Rosen	2415c18f48	Fix reporting of PySpark doctest failures.	2013-02-03 06:44:11 +00:00
Matei Zaharia	34a7bcdb3a	Formatting	2013-02-02 19:40:30 -08:00
Matei Zaharia	85019d76a4	Merge pull request #427 from woggling/dag-sched-tests Tests for DAGScheduler	2013-02-02 19:09:59 -08:00
Stephen Haberman	7aba123f0c	Further simplify checking for Nil.	2013-02-02 13:53:28 -06:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Stephen Haberman	cae8a6795c	Fix dangling old variable names.	2013-02-02 02:15:39 -06:00
Stephen Haberman	696eec32c9	Move executorMemory up into SchedulerBackend.	2013-02-02 02:03:26 -06:00
Stephen Haberman	103c375ba0	Merge branch 'master' into sparkmem	2013-02-02 01:57:18 -06:00
Stephen Haberman	28e0cb9f31	Fix createActorSystem not actually using the systemName parameter. This meant all system names were "spark", which worked, but didn't lead to the most intuitive log output. This fixes createActorSystem to use the passed system name, and refactors Master/Worker to encapsulate their system/actor names instead of having the clients guess at them. Note that the driver system name, "spark", is left as is, and is still repeated a few times, but that seems like a separate issue.	2013-02-02 01:11:37 -06:00
Charles Reiss	1fd5ee323d	Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.	2013-02-01 22:33:38 -08:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Matei Zaharia	7ae4b6a23d	Merge pull request #441 from stephenh/lessnoisyakka Reduce the amount of duplicate logging Akka does to stdout.	2013-02-01 21:03:37 -08:00
Stephen Haberman	12c1eb4756	Reduce the amount of duplicate logging Akka does to stdout. Given we have Akka logging go through SLF4j to log4j, we don't need all the extra noise of Akka's stdout logger that is supposedly only used during Akka init time but seems to continue logging lots of noisy network events that we either don't care about or are in the log4j logs anyway. See: http://doc.akka.io/docs/akka/2.0/general/configuration.html # Log level for the very basic logger activated during AkkaApplication startup # Options: ERROR, WARNING, INFO, DEBUG # stdout-loglevel = "WARNING"	2013-02-01 21:21:44 -06:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	4529876db0	Merge branch 'master' of github.com:mesos/spark	2013-02-01 14:07:38 -08:00


2163 commits