ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Prashant Sharma	291dd47c7f	Taking FeederActor out as seperate program	2013-02-08 14:34:07 +05:30
Matei Zaharia	b53174a6f3	Merge pull request #454 from MLnick/ipython SPARK-685 Adding IPYTHON environment variable support for launching pyspark using ...	2013-02-07 18:29:04 -08:00
Tathagata Das	bcee3cb2db	Merge pull request #455 from tdas/streaming Merging latest master branch changes to the streaming branch	2013-02-07 15:05:20 -08:00
Tathagata Das	4cc223b478	Merge branch 'mesos-master' into streaming	2013-02-07 13:59:31 -08:00
Tathagata Das	d55e3aa467	Updated JavaStreamingContext with updated kafkaStream API.	2013-02-07 13:59:18 -08:00
Tathagata Das	c6b2f765d3	Merge branch 'mesos-streaming' into streaming	2013-02-07 13:13:53 -08:00
Tathagata Das	12300758cc	Merge pull request #372 from Reinvigorate/sm-kafka Removing offset management code that is non-existent in kafka 0.7.0+	2013-02-07 12:41:07 -08:00
Tathagata Das	915d9931fe	Merge pull request #373 from Reinvigorate/sm-updateStateByKey StateDStream changes to give updateStateByKey consistent behavior	2013-02-07 11:59:19 -08:00
Nick Pentreath	21d3946d17	Adding IPYTHON environment variable support for launching pyspark using ipython shell	2013-02-07 16:54:31 +02:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Patrick Wendell	dab81a8511	Fixing to match Spark styleguide	2013-02-05 20:57:04 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f4d43cb43e	Remove unneeded zipWithIndex. Also rename r->rdd and remove unneeded extra type info.	2013-02-05 21:26:45 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Stephen Haberman	67df7f2fa2	Add private, minor formatting.	2013-02-05 21:08:21 -06:00
Imran Rashid	379564c7e0	setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place	2013-02-05 18:30:21 -08:00
Matei Zaharia	9cfa068379	Merge pull request #450 from stephenh/inlinemergepair Inline mergePair to look more like the narrow dep branch.	2013-02-05 18:28:44 -08:00
Matei Zaharia	03eefbb200	Merge pull request #451 from stephenh/fixdeathpactexception Handle Terminated to avoid endless DeathPactExceptions.	2013-02-05 18:27:54 -08:00
Stephen Haberman	870b2aaf5d	Merge branch 'master' into fixdeathpactexception Conflicts: core/src/main/scala/spark/deploy/worker/Worker.scala	2013-02-05 20:27:09 -06:00
Matei Zaharia	a4611d66f0	Merge pull request #449 from stephenh/longerdriversuite Increase DriverSuite timeout.	2013-02-05 17:58:22 -08:00
Stephen Haberman	0e19093fd8	Handle Terminated to avoid endless DeathPactExceptions. Credit to Roland Kuhn, Akka's tech lead, for pointing out this various obvious fix, but StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operation's in preStart are async. So, the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle, it turned into DeathPactException, which started a postRestart/preStart infinite loop.	2013-02-05 18:58:00 -06:00
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Imran Rashid	1704b124d8	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:33:52 -08:00
Imran Rashid	cfab1a3528	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:31:46 -08:00
Imran Rashid	696e4b2167	track remoteFetchTime	2013-02-05 14:29:16 -08:00
Imran Rashid	b29f9cc978	BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance	2013-02-05 14:00:44 -08:00
Matei Zaharia	2d9eca9fbb	Merge pull request #447 from pwendell/streaming-constructor Streaming constructor which takes JavaSparkContext	2013-02-05 11:45:44 -08:00
Patrick Wendell	7eea64aa4c	Streaming constructor which takes JavaSparkContext It's sometimes helpful to directly pass a JavaSparkContext, and take advantage of the various constructors available for that.	2013-02-05 11:43:16 -08:00
Imran Rashid	e319ac74c1	cogrouped RDD stores the amount of time taken to read shuffle data in each task	2013-02-05 10:18:16 -08:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Imran Rashid	9df7e2ae55	Shuffle Fetchers use a timed iterator	2013-02-05 10:18:16 -08:00
Imran Rashid	1ad77c4766	add TimedIterator	2013-02-05 10:18:15 -08:00
Imran Rashid	843084d69d	track total bytes written by ShuffleMapTasks	2013-02-05 10:18:15 -08:00
haitao.yao	f609182e5b	Merge branch 'mesos'	2013-02-05 14:09:45 +08:00
Imran Rashid	b430d2359d	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-02-04 21:40:44 -08:00
Patrick Wendell	cc37601ecb	Adding an example with an OLAP roll-up	2013-02-04 14:18:11 -08:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Matei Zaharia	f7b4e428be	Merge pull request #445 from JoshRosen/pyspark_fixes Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()	2013-02-03 21:36:36 -08:00
Josh Rosen	e61729113d	Remove unnecessary doctest __main__ methods.	2013-02-03 21:29:40 -08:00
haitao.yao	faa4d9e31f	Merge branch 'mesos'	2013-02-04 11:40:15 +08:00
Patrick Wendell	b14322956c	Starvation check in Standlone scheduler	2013-02-03 12:45:10 -08:00
Patrick Wendell	667860448a	Starvation check in ClusterScheduler	2013-02-03 12:45:04 -08:00
Matei Zaharia	3bfaf3ab1d	Merge pull request #379 from stephenh/sparkmem Add spark.executor.memory to differentiate executor memory from spark-shell	2013-02-02 23:58:23 -08:00
Matei Zaharia	88ee6163a1	Merge pull request #422 from squito/blockmanager_info RDDInfo available from SparkContext	2013-02-02 23:44:13 -08:00
Matei Zaharia	cd4ca93679	Merge pull request #436 from stephenh/removeextraloop Once we find a split with no block, we don't have to look for more.	2013-02-02 23:39:28 -08:00
Matei Zaharia	d5daaab381	Merge pull request #442 from stephenh/fixsystemnames Fix createActorSystem not actually using the systemName parameter.	2013-02-02 23:38:46 -08:00
Matei Zaharia	9163c3705d	Formatting	2013-02-02 23:34:47 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00

2608 commits