ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Mark Hamstra	32979b5e7d	whitespace	2013-03-16 13:36:46 -07:00
Mark Hamstra	ca9f81e8fc	refactor foldByKey to use combineByKey	2013-03-16 13:31:01 -07:00
Mark Hamstra	1fb192ef40	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-16 12:17:13 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Matei Zaharia	c1e9cdc49f	Merge pull request #525 from stephenh/subtractByKey Add PairRDDFunctions.subtractByKey.	2013-03-16 11:47:45 -07:00
Mark Hamstra	ef75be3bf7	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-15 21:41:24 -07:00
Andrew xia	5892393140	refactor fair scheduler implementation 1. Change "pool" properties to be a member of ActiveJob 2. Abstract the Schedulable of Pool and TaskSetManager 3. Abstract the FIFO and FS comparator algorithm 4. Miscellaneous changes to class definitions and construction	2013-03-16 11:13:38 +08:00
Matei Zaharia	cdbfd1e196	Merge pull request #516 from squito/fix_local_metrics Fix local metrics	2013-03-15 15:13:28 -07:00
Mark Hamstra	1a4070477d	whitespace cleanup	2013-03-15 11:28:28 -07:00
Mark Hamstra	857010392b	Fuller implementation of foldByKey	2013-03-15 10:56:05 -07:00
Mark Hamstra	16a4ca4537	restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test	2013-03-14 13:58:37 -07:00
Mark Hamstra	b1422cbdd5	added foldByKey	2013-03-14 12:59:58 -07:00
Stephen Haberman	7786881f47	Fix tabs that snuck in.	2013-03-14 14:57:12 -05:00
Stephen Haberman	7d8bb4df3a	Allow subtractByKey's other argument to have a different value type.	2013-03-14 14:44:15 -05:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Matei Zaharia	4032beba49	Merge pull request #521 from stephenh/earlyclose Close the reader in HadoopRDD as soon as iteration end.	2013-03-13 19:29:46 -07:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation for subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	cd5b947cf6	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-13 13:16:14 -07:00
Stephen Haberman	e7f1a69c6b	Add a test for NextIterator.	2013-03-13 10:46:33 -05:00
Stephen Haberman	1a175d13b9	Add NextIterator.closeIfNeeded.	2013-03-13 10:17:39 -05:00
Stephen Haberman	8f00d23598	Remove NextIterator.close default implementation.	2013-03-12 12:30:10 -05:00
Harold Lim	0b64e5f1ac	Removed some commented code	2013-03-12 13:31:27 +08:00
Harold Lim	f5b1fecb9f	Cleaned up the code	2013-03-12 13:31:27 +08:00
Harold Lim	b5325182a3	Updated/Refactored the Fair Task Scheduler. It does not inherit ClusterScheduler anymore. Rather, ClusterScheduler internally uses TaskSetQueuesManager that handles the scheduling of taskset queues. This is the class that should be extended to support other scheduling policies	2013-03-12 13:31:27 +08:00
Harold Lim	54ed7c4af4	Changed the name of the system property to set the allocation xml	2013-03-12 13:31:27 +08:00
Harold Lim	c07087364b	Made changes to the SparkContext to have a DynamicVariable for setting local properties that can be passed down the stack. Added an implementation of the fair scheduler	2013-03-12 13:31:27 +08:00
Stephen Haberman	9e68f48625	More quickly call close in HadoopRDD. This also refactors out the common "gotNext" iterator pattern into a shared utility class.	2013-03-11 23:59:17 -05:00
Charles Reiss	769d399674	Send block sizes as longs.	2013-03-11 14:17:05 -07:00
Mark Hamstra	562893bea3	deleted excess curly braces	2013-03-10 22:43:08 -07:00
Imran Rashid	8a11ac3dc7	increase sleep time	2013-03-10 22:31:44 -07:00
Imran Rashid	9f97f2f9d8	add a small wait to one task to make sure some task runtime really is non-zero	2013-03-10 22:30:18 -07:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	b57df1f5e3	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-10 16:56:31 -07:00
Matei Zaharia	2e1bbc4e7e	Merge remote-tracking branch 'woggling/dag-sched-driver-port' Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 16:52:54 -07:00
Matei Zaharia	91a9d093bd	Merge pull request #512 from patelh/fix-kryo-serializer Fix reference bug in Kryo serializer, add test, update version	2013-03-10 15:48:23 -07:00
Matei Zaharia	557cfd0f4d	Merge pull request #515 from woggling/deploy-app-death Notify standalone deploy client of application death.	2013-03-10 15:44:57 -07:00
Matei Zaharia	a59cc6060f	Merge remote-tracking branch 'stephenh/nomocks' Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 13:39:10 -07:00
Imran Rashid	20f01a0a1b	enable task metrics in local mode, add tests	2013-03-09 21:17:31 -08:00
Imran Rashid	ec30188a2a	rename remoteFetchWaitTime to fetchWaitTime, since it also includes time from local fetches	2013-03-09 21:16:53 -08:00
Charles Reiss	b0983c5762	Notify standalone deploy client of application death. Usually, this isn't necessary since the application will be removed as a result of the deploy client disconnecting, but occasionally, the standalone deploy master removes an application otherwise. Also mark applications as FAILED instead of FINISHED when they are killed as a result of their executors failing too many times.	2013-03-09 11:29:45 -08:00
Charles Reiss	d0216cb38b	Prevent DAGSchedulerSuite from corrupting driver.port. Use the LocalSparkContext abstraction to properly manage clearing spark.driver.port.	2013-03-09 10:49:02 -08:00
Hiral Patel	664e5fd24b	Fix reference bug in Kryo serializer, add test, update version	2013-03-07 22:16:11 -08:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Matei Zaharia	9f0dc829cb	Fix TaskMetrics not being serializable	2013-03-04 12:08:31 -08:00
Matei Zaharia	04fb81ffe5	Merge pull request #506 from rxin/spark-706 Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-03-03 17:20:07 -08:00
Imran Rashid	0bd1d00c2a	minor cleanup based on feedback in review request	2013-03-03 16:46:45 -08:00
Imran Rashid	f1006b99ff	change CleanupIterator to CompletionIterator	2013-03-03 16:39:05 -08:00
Imran Rashid	8fef5b9c5f	refactoring of TaskMetrics	2013-03-03 16:34:04 -08:00
Imran Rashid	d36abdb053	Merge branch 'master' into stageInfo	2013-03-03 15:20:46 -08:00
Reynold Xin	44134e12bb	Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-02-28 15:14:59 -08:00
Stephen Haberman	6415c2bb60	Don't create the Executor until we have everything it needs.	2013-02-28 12:38:09 -06:00
Stephen Haberman	80eecd2cb1	Make Executor fields volatile since they're read from the thread pool.	2013-02-28 10:41:07 -06:00
Mosharaf Chowdhury	4ab387bcdb	Fixed master datastructure updates after removing an application; and a typo.	2013-02-27 13:52:44 -08:00
Matei Zaharia	ece3edfffa	Fix a problem with no hosts being counted as alive in the first job	2013-02-26 12:11:03 -08:00
Matei Zaharia	73697e2891	Fix overly large thread names in PySpark	2013-02-26 12:07:59 -08:00
Stephen Haberman	db957e5bd7	Fix MapOutputTrackerSuite.	2013-02-26 01:38:50 -06:00
Stephen Haberman	a65aa549ff	Override DAGScheduler.runLocally so we can remove the Thread.sleep.	2013-02-25 23:49:32 -06:00
Stephen Haberman	a4adeb255c	Merge branch 'master' into nomocks Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-02-25 23:48:52 -06:00
Tathagata Das	c02e064938	Fixed replication bug in BlockManager	2013-02-25 17:27:46 -08:00
Matei Zaharia	490f056cdd	Allow passing sparkHome and JARs to StreamingContext constructor Also warns if spark.cleaner.ttl is not set in the version where you pass your own SparkContext.	2013-02-25 15:13:30 -08:00
Matei Zaharia	568bdaf8ae	Set spark.deploy.spreadOut to true by default in 0.7 (improves locality)	2013-02-25 14:34:55 -08:00
Matei Zaharia	1ef58dadcc	Add a config property for Akka lifecycle event logging	2013-02-25 14:01:24 -08:00
Matei Zaharia	ceaec4a675	Merge pull request #498 from pwendell/shutup-akka Disable remote lifecycle logging from Akka.	2013-02-25 12:31:24 -08:00
Patrick Wendell	85a85646d9	Disable remote lifecycle logging from Akka. This changes the default setting to `off` for remote lifecycle events. When this is on, it is very chatty at the INFO level. It also prints out several ERROR messages sometimes when sc.stop() is called.	2013-02-25 12:25:43 -08:00
Imran Rashid	8f17387d97	remove bogus comment	2013-02-25 10:31:06 -08:00
Matei Zaharia	6ae9a22c3e	Get spark.default.parallelism on each call to defaultPartitioner, instead of only once, in case the user changes it across Spark uses	2013-02-25 10:28:08 -08:00
Matei Zaharia	d6e6abece3	Merge pull request #459 from stephenh/bettersplits Change defaultPartitioner to use upstream split size.	2013-02-25 09:22:04 -08:00
Stephen Haberman	c44ccf2862	Use default parallelism if it's set.	2013-02-24 23:54:03 -06:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Tathagata Das	dff53d1b94	Merge branch 'mesos-master' into streaming	2013-02-24 12:17:22 -08:00
Matei Zaharia	3b9f929467	Merge pull request #468 from haitaoyao/master support customized java options for Master, Worker, Executor, and Repl	2013-02-23 23:38:15 -08:00
Stephen Haberman	37c7a71f9c	Add subtract to JavaRDD, JavaDoubleRDD, and JavaPairRDD.	2013-02-24 00:27:53 -06:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Tathagata Das	d853aa9658	Change spark.cleaner.delay to spark.cleaner.ttl. Updated docs.	2013-02-23 17:42:26 -08:00
Patrick Wendell	931f439be9	Responding to code review	2013-02-23 15:40:41 -08:00
Patrick Wendell	f51b0f93f2	Adding Java-accessible methods to Vector.scala This is needed for the Strata machine learning tutorial (and also is generally helpful).	2013-02-23 13:26:59 -08:00
Matei Zaharia	d942d39072	Handle exceptions in RecordReader.close() better (suggested by Jim Donahue)	2013-02-23 11:19:07 -08:00
Matei Zaharia	c89824046a	Merge pull request #490 from woggling/conn-death Detect when SendingConnections disconnect even if we aren't sending to them	2013-02-22 22:58:19 -08:00
Charles Reiss	50cf8c8b79	Add fault tolerance test that uses replicated RDDs.	2013-02-22 16:11:53 -08:00
Charles Reiss	c8a7886921	Detect when SendingConnections drop by trying to read them. Comment fix	2013-02-22 16:11:52 -08:00
Matei Zaharia	d4d7993bf5	Several fixes to the work to log when no resources can be used by a job. Fixed some of the messages as well as code style.	2013-02-22 15:51:37 -08:00
Matei Zaharia	f33662c133	Merge remote-tracking branch 'pwendell/starvation-check' Also fixed a bug where master was offering executors on dead workers Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-02-22 15:27:41 -08:00
Matei Zaharia	7341de0d48	Merge pull request #475 from JoshRosen/spark-668 Remove hack workaround for SPARK-668	2013-02-22 14:56:18 -08:00
Patrick Wendell	f8c3a03d55	SPARK-702: Replace Function --> JFunction in JavaAPI Suite. In a few places the Scala (rather than Java) function class is used.	2013-02-22 12:54:15 -08:00
Imran Rashid	0f37b43b40	make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD	2013-02-21 16:56:28 -08:00
Imran Rashid	9230617f23	add cleanup iterator	2013-02-21 16:55:14 -08:00
Imran Rashid	81bd07da26	sparkListeners should be a val	2013-02-21 15:21:45 -08:00
Imran Rashid	796e934d31	add some docs & some cleanup	2013-02-21 15:19:34 -08:00
Imran Rashid	394d3acc3e	store taskInfo & metrics together in a tuple	2013-02-21 15:19:34 -08:00
Imran Rashid	7960927cf4	get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo	2013-02-21 15:19:34 -08:00
Imran Rashid	d0bfac3eed	taskInfo tracks if a task is run on a preferred host	2013-02-21 15:19:34 -08:00
Imran Rashid	6f62a57858	add runtime breakdowns	2013-02-21 15:19:34 -08:00
Imran Rashid	176cb20703	add task result size; better formatting for time interval distributions; cleanup distribution formatting	2013-02-21 15:19:33 -08:00
Imran Rashid	f2fcabf2ea	add timing around parts of executor & track result size	2013-02-21 15:19:33 -08:00
Imran Rashid	ff127cfcd3	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-21 15:16:21 -08:00
Imran Rashid	69f9a7035f	fully revert change to addOnCompleteCallback -- missed this in `e9f53ec`	2013-02-21 15:07:46 -08:00
Imran Rashid	baab23abdf	TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task	2013-02-21 14:13:01 -08:00
haitao.yao	8215b95547	Merge branch 'mesos'	2013-02-21 10:07:24 +08:00
Tathagata Das	334ab92441	Fixed bug in CheckpointSuite	2013-02-20 10:26:36 -08:00
Tathagata Das	1cb725e417	Merge branch 'mesos-master' into streaming	2013-02-20 09:55:35 -08:00
Tathagata Das	fb9956256d	Merge branch 'mesos-master' into streaming Conflicts: core/src/main/scala/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala	2013-02-20 09:01:29 -08:00
Matei Zaharia	05bc02e80b	Merge pull request #482 from woggling/shutdown-exceptions Don't call System.exit over uncaught exceptions from shutdown hooks	2013-02-19 20:56:15 -08:00
haitao.yao	6a3d44c673	Merge branch 'mesos'	2013-02-20 10:23:58 +08:00
Charles Reiss	092c631fa8	Pull detection of being in a shutdown hook into utility function.	2013-02-19 17:49:55 -08:00
Reynold Xin	130f704baf	Added a method to create PartitionPruningRDD.	2013-02-19 16:03:52 -08:00
Charles Reiss	d0588bd6d7	Catch/log errors deleting temp dirs	2013-02-19 13:04:06 -08:00
Charles Reiss	687581c3ec	Paranoid uncaught exception handling for exceptions during shutdown	2013-02-19 13:03:02 -08:00
haitao.yao	7c129388fb	Merge branch 'mesos'	2013-02-19 11:22:24 +08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
Stephen Haberman	6a2d957843	Tweak test names.	2013-02-16 00:33:49 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Imran Rashid	893bad9089	use appid instead of frameworkid; simplify stupid condition	2013-02-13 20:30:21 -08:00
Imran Rashid	8f18e7e863	include jobid in Executor commandline args	2013-02-13 13:05:13 -08:00
Matei Zaharia	fd7e414bd0	Merge pull request #464 from pwendell/java-type-fix SPARK-694: All references to [K, V] in JavaDStreamLike should be changed to [K2, V2]	2013-02-11 19:19:05 -08:00
Matei Zaharia	bfeed4725d	Merge pull request #465 from pwendell/java-sort-fix SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 18:23:12 -08:00
Patrick Wendell	21df6ffc13	SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 17:43:26 -08:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Josh Rosen	e9fb25426e	Remove hack workaround for SPARK-668. Renaming the type parameters solves this problem (see SPARK-694). I tried this fix earlier, but it didn't work because I didn't run `sbt/sbt clean` first.	2013-02-11 11:19:20 -08:00
Patrick Wendell	f0b68c623c	Initial cut at replacing K, V in Java files	2013-02-11 10:03:37 -08:00
Imran Rashid	e9f53ec0ea	undo change to onCompleteCallbacks	2013-02-11 09:36:49 -08:00
Matei Zaharia	da8afbc77e	Some bug and formatting fixes to FT Conflicts: core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:43:38 -08:00
root	1b47fa2752	Detect hard crashes of workers using a heartbeat mechanism. Also fixes some issues in the rest of the code with detecting workers this way. Conflicts: core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:28:28 -08:00
Matei Zaharia	8c66c49962	Tweak web UI so that people don't get confused about master URL format Conflicts: core/src/main/twirl/spark/deploy/master/index.scala.html core/src/main/twirl/spark/deploy/worker/index.scala.html	2013-02-10 21:58:34 -08:00
Imran Rashid	d9461b15d3	cleanup a bunch of imports	2013-02-10 21:41:40 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Imran Rashid	383af599bb	SparkContext.addSparkListener; "std" listener in StatsReportListener	2013-02-10 14:19:37 -08:00
Imran Rashid	b7d9e24394	use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver	2013-02-10 14:18:52 -08:00
Stephen Haberman	680f42e6cd	Change defaultPartitioner to use upstream split size. Previously it used the SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, as if we have a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
Patrick Wendell	2ed791fd7f	Minor fixes	2013-02-09 22:00:38 -08:00
Patrick Wendell	1859c9f93c	Changing to use Timer based on code review	2013-02-09 21:55:17 -08:00
Matei Zaharia	ccb1ca4a23	Merge pull request #448 from squito/fetch_maxBytesInFlight add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-09 18:15:18 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	4619ee0787	Move JavaRDDLike.coalesce into the right places.	2013-02-09 20:05:42 -06:00
Stephen Haberman	921be76533	Use stubs instead of mocks for DAGSchedulerSuite.	2013-02-09 16:42:18 -06:00
Stephen Haberman	fb7599870f	Fix JavaRDDLike.coalesce return type.	2013-02-09 16:10:52 -06:00
Stephen Haberman	2a18cd826c	Add back return types.	2013-02-09 10:12:04 -06:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00
Imran Rashid	04e828f7c1	general fixes to Distribution, plus some tests	2013-02-08 19:07:36 -08:00
Mark Hamstra	b8863a79d3	Merge branch 'master' of https://github.com/mesos/spark into commutative Conflicts: core/src/main/scala/spark/RDD.scala	2013-02-08 18:26:00 -08:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f4d43cb43e	Remove unneeded zipWithIndex. Also rename r->rdd and remove unneeded extra type info.	2013-02-05 21:26:45 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Stephen Haberman	67df7f2fa2	Add private, minor formatting.	2013-02-05 21:08:21 -06:00
Imran Rashid	379564c7e0	setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place	2013-02-05 18:30:21 -08:00
Matei Zaharia	9cfa068379	Merge pull request #450 from stephenh/inlinemergepair Inline mergePair to look more like the narrow dep branch.	2013-02-05 18:28:44 -08:00
Stephen Haberman	870b2aaf5d	Merge branch 'master' into fixdeathpactexception Conflicts: core/src/main/scala/spark/deploy/worker/Worker.scala	2013-02-05 20:27:09 -06:00
Matei Zaharia	a4611d66f0	Merge pull request #449 from stephenh/longerdriversuite Increase DriverSuite timeout.	2013-02-05 17:58:22 -08:00
Stephen Haberman	0e19093fd8	Handle Terminated to avoid endless DeathPactExceptions. Credit to Roland Kuhn, Akka's tech lead, for pointing out this very obvious fix, but StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operations in preStart are async. So, the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle them, turned into DeathPactException, which started a postRestart/preStart infinite loop.	2013-02-05 18:58:00 -06:00
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Imran Rashid	1704b124d8	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:33:52 -08:00
Imran Rashid	cfab1a3528	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:31:46 -08:00
Imran Rashid	696e4b2167	track remoteFetchTime	2013-02-05 14:29:16 -08:00
Imran Rashid	b29f9cc978	BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance	2013-02-05 14:00:44 -08:00
Imran Rashid	e319ac74c1	cogrouped RDD stores the amount of time taken to read shuffle data in each task	2013-02-05 10:18:16 -08:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Imran Rashid	9df7e2ae55	Shuffle Fetchers use a timed iterator	2013-02-05 10:18:16 -08:00
Imran Rashid	1ad77c4766	add TimedIterator	2013-02-05 10:18:15 -08:00
Imran Rashid	843084d69d	track total bytes written by ShuffleMapTasks	2013-02-05 10:18:15 -08:00
haitao.yao	f609182e5b	Merge branch 'mesos'	2013-02-05 14:09:45 +08:00
Imran Rashid	b430d2359d	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-02-04 21:40:44 -08:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Matei Zaharia	f7b4e428be	Merge pull request #445 from JoshRosen/pyspark_fixes Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()	2013-02-03 21:36:36 -08:00
haitao.yao	faa4d9e31f	Merge branch 'mesos'	2013-02-04 11:40:15 +08:00
Patrick Wendell	b14322956c	Starvation check in Standalone scheduler	2013-02-03 12:45:10 -08:00
Patrick Wendell	667860448a	Starvation check in ClusterScheduler	2013-02-03 12:45:04 -08:00
Matei Zaharia	3bfaf3ab1d	Merge pull request #379 from stephenh/sparkmem Add spark.executor.memory to differentiate executor memory from spark-shell	2013-02-02 23:58:23 -08:00
Matei Zaharia	88ee6163a1	Merge pull request #422 from squito/blockmanager_info RDDInfo available from SparkContext	2013-02-02 23:44:13 -08:00
Matei Zaharia	cd4ca93679	Merge pull request #436 from stephenh/removeextraloop Once we find a split with no block, we don't have to look for more.	2013-02-02 23:39:28 -08:00
Matei Zaharia	d5daaab381	Merge pull request #442 from stephenh/fixsystemnames Fix createActorSystem not actually using the systemName parameter.	2013-02-02 23:38:46 -08:00
Matei Zaharia	9163c3705d	Formatting	2013-02-02 23:34:47 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00
Matei Zaharia	34a7bcdb3a	Formatting	2013-02-02 19:40:30 -08:00
Stephen Haberman	7aba123f0c	Further simplify checking for Nil.	2013-02-02 13:53:28 -06:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Stephen Haberman	cae8a6795c	Fix dangling old variable names.	2013-02-02 02:15:39 -06:00
Stephen Haberman	696eec32c9	Move executorMemory up into SchedulerBackend.	2013-02-02 02:03:26 -06:00
Stephen Haberman	103c375ba0	Merge branch 'master' into sparkmem	2013-02-02 01:57:18 -06:00
Stephen Haberman	28e0cb9f31	Fix createActorSystem not actually using the systemName parameter. This meant all system names were "spark", which worked, but didn't lead to the most intuitive log output. This fixes createActorSystem to use the passed system name, and refactors Master/Worker to encapsulate their system/actor names instead of having the clients guess at them. Note that the driver system name, "spark", is left as is, and is still repeated a few times, but that seems like a separate issue.	2013-02-02 01:11:37 -06:00
Charles Reiss	1fd5ee323d	Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.	2013-02-01 22:33:38 -08:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Stephen Haberman	12c1eb4756	Reduce the amount of duplicate logging Akka does to stdout. Given we have Akka logging go through SLF4j to log4j, we don't need all the extra noise of Akka's stdout logger that is supposedly only used during Akka init time but seems to continue logging lots of noisy network events that we either don't care about or are in the log4j logs anyway. See: http://doc.akka.io/docs/akka/2.0/general/configuration.html # Log level for the very basic logger activated during AkkaApplication startup # Options: ERROR, WARNING, INFO, DEBUG # stdout-loglevel = "WARNING"	2013-02-01 21:21:44 -06:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	4529876db0	Merge branch 'master' of github.com:mesos/spark	2013-02-01 14:07:38 -08:00
Matei Zaharia	9970926ede	formatting	2013-02-01 14:07:34 -08:00
Matei Zaharia	79c24abe4c	Merge pull request #432 from stephenh/moreprivacy Add more private declarations.	2013-02-01 14:06:55 -08:00
Matei Zaharia	de340ddf0b	Merge pull request #437 from stephenh/cancelmetacleaner Stop BlockManagers metadataCleaner.	2013-02-01 12:59:25 -08:00
Imran Rashid	c6190067ae	remove unneeded (and unused) filter on block info	2013-02-01 09:55:25 -08:00
Stephen Haberman	59c57e48df	Stop BlockManagers metadataCleaner.	2013-02-01 10:34:02 -06:00
Matei Zaharia	571af31304	Merge pull request #433 from rxin/master Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-02-01 00:32:41 -08:00
Imran Rashid	8a0a5ed533	track total partitions, in addition to cached partitions; use scala string formatting	2013-02-01 00:23:38 -08:00
Imran Rashid	f127f2ae76	fixup merge (master -> driver renaming)	2013-02-01 00:20:49 -08:00
Reynold Xin	f9af9cee6f	Moved PruneDependency into PartitionPruningRDD.scala.	2013-02-01 00:02:46 -08:00
haitao.yao	b57570fd12	Merge branch 'mesos'	2013-02-01 14:06:45 +08:00
Patrick Wendell	39ab83e957	Small fix from last commit	2013-01-31 21:52:52 -08:00
Patrick Wendell	c33f0ef41a	Some style cleanup	2013-01-31 21:50:02 -08:00
Patrick Wendell	3446d5c8d6	SPARK-673: Capture and re-throw Python exceptions This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code.	2013-01-31 18:06:11 -08:00
Reynold Xin	6289d9654e	Removed the TODO comment from PartitionPruningRDD.	2013-01-31 17:49:36 -08:00
Reynold Xin	5b0fc265c2	Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-01-31 17:48:39 -08:00
Stephen Haberman	782187c210	Once we find a split with no block, we don't have to look for more.	2013-01-31 18:27:25 -06:00
Stephen Haberman	418e36caa8	Add more private declarations.	2013-01-31 17:18:33 -06:00
haitao.yao	3190483b98	bug fix for javadoc	2013-01-31 14:23:51 +08:00
Imran Rashid	02a6761589	Merge branch 'master' into blockmanager_info Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2013-01-30 18:52:35 -08:00
Imran Rashid	c1df24d085	rename Slaves --> Executor	2013-01-30 18:51:14 -08:00
Matei Zaharia	d12330bd2c	Merge pull request #426 from woggling/conn-manager-ips Remember ConnectionManagerId used to initiate SendingConnections	2013-01-30 15:02:53 -08:00
Matei Zaharia	612a9fee71	Merge pull request #428 from woggling/mesos-exec-id Make ExecutorIDs include SlaveIDs when running Mesos	2013-01-30 15:01:46 -08:00
Stephen Haberman	871476d506	Include message and exitStatus if available.	2013-01-30 16:56:46 -06:00
Charles Reiss	252845d304	Remove remnants of attempt to use slaveId-executorId in MesosExecutorBackend	2013-01-30 10:38:06 -08:00
Charles Reiss	f7de6978c1	Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use the Mesos ExecutorID as a Spark ExecutorID.	2013-01-30 09:38:57 -08:00
Charles Reiss	7f51458774	Comment at top of DAGSchedulerSuite	2013-01-30 09:34:53 -08:00
Charles Reiss	9c0bae75ad	Change DAGSchedulerSuite to run DAGScheduler in the same Thread.	2013-01-30 09:22:07 -08:00
Charles Reiss	178b89204c	Refactor DAGScheduler more to allow testing without a separate thread.	2013-01-30 09:19:55 -08:00
Charles Reiss	4bf3d7ea12	Clear spark.master.port to cleanup for other tests	2013-01-29 19:05:58 -08:00
Charles Reiss	9eac7d01f0	Add DAGScheduler tests.	2013-01-29 18:55:43 -08:00
Charles Reiss	a3d14c0404	Refactoring to DAGScheduler to aid testing	2013-01-29 18:55:42 -08:00
Charles Reiss	16a0789e10	Remember ConnectionManagerId used to initiate SendingConnections. This prevents ConnectionManager from getting confused if a machine has multiple host names and the one getHostName() finds happens not to be the one that was passed from, e.g., the BlockManagerMaster.	2013-01-29 18:13:59 -08:00
Matei Zaharia	d54b10b6ad	Merge remote-tracking branch 'stephenh/removefailedjob' Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-01-29 18:12:29 -08:00
Matei Zaharia	ccb67ff2ca	Merge pull request #425 from stephenh/toDebugString Add RDD.toDebugString.	2013-01-29 10:44:18 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Imran Rashid	b92259ba57	Merge branch 'master' into blockmanager_info	2013-01-29 09:45:10 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	cbf72bffa5	Include name, if set, in RDD.toString().	2013-01-29 00:20:36 -06:00
Stephen Haberman	3cda14af3f	Add number of splits.	2013-01-29 00:12:31 -06:00
Matei Zaharia	a1ecec8d79	Merge branch 'master' of github.com:mesos/spark	2013-01-28 22:08:44 -08:00
Stephen Haberman	951cfd9ba2	Add JavaRDDLike.toDebugString().	2013-01-29 00:02:17 -06:00
Matei Zaharia	f6eb1f0825	Merge pull request #413 from pwendell/stage-logging SPARK-658: Adding logging of stage duration	2013-01-28 22:01:52 -08:00
Stephen Haberman	b45857c965	Add RDD.toDebugString. Original idea by Nathan Kronenfeld.	2013-01-28 23:56:56 -06:00
Patrick Wendell	7ee824e42e	Units from ms -> s	2013-01-28 21:48:32 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Matei Zaharia	dda2ce017c	Merge pull request #424 from pwendell/logging-cleanup Some DEBUG-level log cleanup.	2013-01-28 21:18:54 -08:00
Patrick Wendell	1f9b486a8b	Some DEBUG-level log cleanup. A few changes to make the DEBUG-level logs less noisy and more readable. - Moved a few very frequent messages to Trace - Changed some BlockManager log messages to make them more understandable SPARK-666 #resolve	2013-01-28 20:29:35 -08:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Imran Rashid	cec9c768c2	convenient name available in StageInfo	2013-01-28 20:09:41 -08:00


1406 commits