ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	931f439be9	Responding to code review	2013-02-23 15:40:41 -08:00
Patrick Wendell	f51b0f93f2	Adding Java-accessible methods to Vector.scala This is needed for the Strata machine learning tutorial (and also is generally helpful).	2013-02-23 13:26:59 -08:00
Matei Zaharia	d942d39072	Handle exceptions in RecordReader.close() better (suggested by Jim Donahue)	2013-02-23 11:19:07 -08:00
Matei Zaharia	c89824046a	Merge pull request #490 from woggling/conn-death Detect when SendingConnections disconnect even if we aren't sending to them	2013-02-22 22:58:19 -08:00
Charles Reiss	c8a7886921	Detect when SendingConnections drop by trying to read them. Comment fix	2013-02-22 16:11:52 -08:00
Matei Zaharia	d4d7993bf5	Several fixes to the work to log when no resources can be used by a job. Fixed some of the messages as well as code style.	2013-02-22 15:51:37 -08:00
Matei Zaharia	f33662c133	Merge remote-tracking branch 'pwendell/starvation-check' Also fixed a bug where master was offering executors on dead workers Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-02-22 15:27:41 -08:00
Matei Zaharia	7341de0d48	Merge pull request #475 from JoshRosen/spark-668 Remove hack workaround for SPARK-668	2013-02-22 14:56:18 -08:00
Patrick Wendell	f8c3a03d55	SPARK-702: Replace Function --> JFunction in JavaAPI Suite. In a few places the Scala (rather than Java) function class is used.	2013-02-22 12:54:15 -08:00
Imran Rashid	0f37b43b40	make the ShuffleFetcher responsible for collecting shuffle metrics, which gives us metrics for CoGroupedRDD and ShuffledRDD	2013-02-21 16:56:28 -08:00
Imran Rashid	9230617f23	add cleanup iterator	2013-02-21 16:55:14 -08:00
Imran Rashid	81bd07da26	sparkListeners should be a val	2013-02-21 15:21:45 -08:00
Imran Rashid	796e934d31	add some docs & some cleanup	2013-02-21 15:19:34 -08:00
Imran Rashid	394d3acc3e	store taskInfo & metrics together in a tuple	2013-02-21 15:19:34 -08:00
Imran Rashid	7960927cf4	get rid of a bunch of boilerplate; more formatting happens in Listener, not StageInfo	2013-02-21 15:19:34 -08:00
Imran Rashid	d0bfac3eed	taskInfo tracks if a task is run on a preferred host	2013-02-21 15:19:34 -08:00
Imran Rashid	6f62a57858	add runtime breakdowns	2013-02-21 15:19:34 -08:00
Imran Rashid	176cb20703	add task result size; better formatting for time interval distributions; cleanup distribution formatting	2013-02-21 15:19:33 -08:00
Imran Rashid	f2fcabf2ea	add timing around parts of executor & track result size	2013-02-21 15:19:33 -08:00
Imran Rashid	ff127cfcd3	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-21 15:16:21 -08:00
Imran Rashid	baab23abdf	TaskContext does not hold a reference to Task; instead, it has a shared instance of TaskMetrics with Task	2013-02-21 14:13:01 -08:00
haitao.yao	8215b95547	Merge branch 'mesos'	2013-02-21 10:07:24 +08:00
Christoph Grothaus	85a35c6840	Fix SPARK-698. From ExecutorRunner, launch java directly instead via the run scripts.	2013-02-20 21:42:11 +01:00
Tathagata Das	334ab92441	Fixed bug in CheckpointSuite	2013-02-20 10:26:36 -08:00
Tathagata Das	1cb725e417	Merge branch 'mesos-master' into streaming	2013-02-20 09:55:35 -08:00
Tathagata Das	fb9956256d	Merge branch 'mesos-master' into streaming Conflicts: core/src/main/scala/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala	2013-02-20 09:01:29 -08:00
Matei Zaharia	05bc02e80b	Merge pull request #482 from woggling/shutdown-exceptions Don't call System.exit over uncaught exceptions from shutdown hooks	2013-02-19 20:56:15 -08:00
haitao.yao	6a3d44c673	Merge branch 'mesos'	2013-02-20 10:23:58 +08:00
Charles Reiss	092c631fa8	Pull detection of being in a shutdown hook into utility function.	2013-02-19 17:49:55 -08:00
Reynold Xin	130f704baf	Added a method to create PartitionPruningRDD.	2013-02-19 16:03:52 -08:00
Charles Reiss	d0588bd6d7	Catch/log errors deleting temp dirs	2013-02-19 13:04:06 -08:00
Charles Reiss	687581c3ec	Paranoid uncaught exception handling for exceptions during shutdown	2013-02-19 13:03:02 -08:00
haitao.yao	7c129388fb	Merge branch 'mesos'	2013-02-19 11:22:24 +08:00
Matei Zaharia	7151e1e4c8	Rename "jobs" to "applications" in the standalone cluster	2013-02-17 23:23:08 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	4281e579c2	Update more javadocs.	2013-02-16 00:45:03 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
haitao.yao	a9cfac347a	Merge branch 'mesos'	2013-02-16 10:11:28 +08:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Imran Rashid	893bad9089	use appid instead of frameworkid; simplify stupid condition	2013-02-13 20:30:21 -08:00
Imran Rashid	8f18e7e863	include jobid in Executor commandline args	2013-02-13 13:05:13 -08:00
Matei Zaharia	bfeed4725d	Merge pull request #465 from pwendell/java-sort-fix SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 18:23:12 -08:00
Patrick Wendell	21df6ffc13	SPARK-696: sortByKey should use 'ascending' parameter	2013-02-11 17:43:26 -08:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Josh Rosen	e9fb25426e	Remove hack workaround for SPARK-668. Renaming the type paramters solves this problem (see SPARK-694). I tried this fix earlier, but it didn't work because I didn't run `sbt/sbt clean` first.	2013-02-11 11:19:20 -08:00
Imran Rashid	e9f53ec0ea	undo chnage to onCompleteCallbacks	2013-02-11 09:36:49 -08:00
Matei Zaharia	da8afbc77e	Some bug and formatting fixes to FT Conflicts: core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:43:38 -08:00
root	1b47fa2752	Detect hard crashes of workers using a heartbeat mechanism. Also fixes some issues in the rest of the code with detecting workers this way. Conflicts: core/src/main/scala/spark/deploy/master/Master.scala core/src/main/scala/spark/deploy/worker/Worker.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala	2013-02-10 22:28:28 -08:00
Matei Zaharia	8c66c49962	Tweak web UI so that people don't get confused about master URL format Conflicts: core/src/main/twirl/spark/deploy/master/index.scala.html core/src/main/twirl/spark/deploy/worker/index.scala.html	2013-02-10 21:58:34 -08:00
Imran Rashid	d9461b15d3	cleanup a bunch of imports	2013-02-10 21:41:40 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Imran Rashid	383af599bb	SparkContext.addSparkListener; "std" listener in StatsReportListener	2013-02-10 14:19:37 -08:00
Imran Rashid	b7d9e24394	use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver	2013-02-10 14:18:52 -08:00
Stephen Haberman	680f42e6cd	Change defaultPartitioner to use upstream split size. Previously it used the SparkContext.defaultParallelism, which occassionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, as if we have a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
Patrick Wendell	2ed791fd7f	Minor fixes	2013-02-09 22:00:38 -08:00
Patrick Wendell	1859c9f93c	Changing to use Timer based on code review	2013-02-09 21:55:17 -08:00
Matei Zaharia	ccb1ca4a23	Merge pull request #448 from squito/fetch_maxBytesInFlight add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-09 18:15:18 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	4619ee0787	Move JavaRDDLike.coalesce into the right places.	2013-02-09 20:05:42 -06:00
Stephen Haberman	921be76533	Use stubs instead of mocks for DAGSchedulerSuite.	2013-02-09 16:42:18 -06:00
Stephen Haberman	fb7599870f	Fix JavaRDDLike.coalesce return type.	2013-02-09 16:10:52 -06:00
Stephen Haberman	2a18cd826c	Add back return types.	2013-02-09 10:12:04 -06:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00
Imran Rashid	04e828f7c1	general fixes to Distribution, plus some tests	2013-02-08 19:07:36 -08:00
Mark Hamstra	b8863a79d3	Merge branch 'master' of https://github.com/mesos/spark into commutative Conflicts: core/src/main/scala/spark/RDD.scala	2013-02-08 18:26:00 -08:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f4d43cb43e	Remove unneeded zipWithIndex. Also rename r->rdd and remove unneeded extra type info.	2013-02-05 21:26:45 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Stephen Haberman	67df7f2fa2	Add private, minor formatting.	2013-02-05 21:08:21 -06:00
Imran Rashid	379564c7e0	setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place	2013-02-05 18:30:21 -08:00
Matei Zaharia	9cfa068379	Merge pull request #450 from stephenh/inlinemergepair Inline mergePair to look more like the narrow dep branch.	2013-02-05 18:28:44 -08:00
Stephen Haberman	870b2aaf5d	Merge branch 'master' into fixdeathpactexception Conflicts: core/src/main/scala/spark/deploy/worker/Worker.scala	2013-02-05 20:27:09 -06:00
Stephen Haberman	0e19093fd8	Handle Terminated to avoid endless DeathPactExceptions. Credit to Roland Kuhn, Akka's tech lead, for pointing out this various obvious fix, but StandaloneExecutorBackend.preStart's catch block would never (ever) get hit, because all of the operation's in preStart are async. So, the System.exit in the catch block was skipped, and instead Akka was sending Terminated messages which, since we didn't handle, it turned into DeathPactException, which started a postRestart/preStart infinite loop.	2013-02-05 18:58:00 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Imran Rashid	1704b124d8	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:33:52 -08:00
Imran Rashid	cfab1a3528	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:31:46 -08:00
Imran Rashid	696e4b2167	track remoteFetchTime	2013-02-05 14:29:16 -08:00
Imran Rashid	b29f9cc978	BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance	2013-02-05 14:00:44 -08:00
Imran Rashid	e319ac74c1	cogrouped RDD stores the amount of time taken to read shuffle data in each task	2013-02-05 10:18:16 -08:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Imran Rashid	9df7e2ae55	Shuffle Fetchers use a timed iterator	2013-02-05 10:18:16 -08:00
Imran Rashid	1ad77c4766	add TimedIterator	2013-02-05 10:18:15 -08:00
Imran Rashid	843084d69d	track total bytes written by ShuffleMapTasks	2013-02-05 10:18:15 -08:00
haitao.yao	f609182e5b	Merge branch 'mesos'	2013-02-05 14:09:45 +08:00
Imran Rashid	b430d2359d	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-02-04 21:40:44 -08:00
Matei Zaharia	f7b4e428be	Merge pull request #445 from JoshRosen/pyspark_fixes Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()	2013-02-03 21:36:36 -08:00
haitao.yao	faa4d9e31f	Merge branch 'mesos'	2013-02-04 11:40:15 +08:00
Patrick Wendell	b14322956c	Starvation check in Standlone scheduler	2013-02-03 12:45:10 -08:00
Patrick Wendell	667860448a	Starvation check in ClusterScheduler	2013-02-03 12:45:04 -08:00
Matei Zaharia	3bfaf3ab1d	Merge pull request #379 from stephenh/sparkmem Add spark.executor.memory to differentiate executor memory from spark-shell	2013-02-02 23:58:23 -08:00
Matei Zaharia	88ee6163a1	Merge pull request #422 from squito/blockmanager_info RDDInfo available from SparkContext	2013-02-02 23:44:13 -08:00
Matei Zaharia	cd4ca93679	Merge pull request #436 from stephenh/removeextraloop Once we find a split with no block, we don't have to look for more.	2013-02-02 23:39:28 -08:00
Matei Zaharia	d5daaab381	Merge pull request #442 from stephenh/fixsystemnames Fix createActorSystem not actually using the systemName parameter.	2013-02-02 23:38:46 -08:00
Matei Zaharia	9163c3705d	Formatting	2013-02-02 23:34:47 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00
Matei Zaharia	34a7bcdb3a	Formatting	2013-02-02 19:40:30 -08:00
Stephen Haberman	7aba123f0c	Further simplify checking for Nil.	2013-02-02 13:53:28 -06:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Stephen Haberman	cae8a6795c	Fix dangling old variable names.	2013-02-02 02:15:39 -06:00
Stephen Haberman	696eec32c9	Move executorMemory up into SchedulerBackend.	2013-02-02 02:03:26 -06:00
Stephen Haberman	103c375ba0	Merge branch 'master' into sparkmem	2013-02-02 01:57:18 -06:00
Stephen Haberman	28e0cb9f31	Fix createActorSystem not actually using the systemName parameter. This meant all system names were "spark", which worked, but didn't lead to the most intuitive log output. This fixes createActorSystem to use the passed system name, and refactors Master/Worker to encapsulate their system/actor names instead of having the clients guess at them. Note that the driver system name, "spark", is left as is, and is still repeated a few times, but that seems like a separate issue.	2013-02-02 01:11:37 -06:00
Stephen Haberman	12c1eb4756	Reduce the amount of duplicate logging Akka does to stdout. Given we have Akka logging go through SLF4j to log4j, we don't need all the extra noise of Akka's stdout logger that is supposedly only used during Akka init time but seems to continue logging lots of noisy network events that we either don't care about or are in the log4j logs anyway. See: http://doc.akka.io/docs/akka/2.0/general/configuration.html # Log level for the very basic logger activated during AkkaApplication startup # Options: ERROR, WARNING, INFO, DEBUG # stdout-loglevel = "WARNING"	2013-02-01 21:21:44 -06:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	4529876db0	Merge branch 'master' of github.com:mesos/spark	2013-02-01 14:07:38 -08:00
Matei Zaharia	9970926ede	formatting	2013-02-01 14:07:34 -08:00
Matei Zaharia	79c24abe4c	Merge pull request #432 from stephenh/moreprivacy Add more private declarations.	2013-02-01 14:06:55 -08:00
Matei Zaharia	de340ddf0b	Merge pull request #437 from stephenh/cancelmetacleaner Stop BlockManagers metadataCleaner.	2013-02-01 12:59:25 -08:00
Imran Rashid	c6190067ae	remove unneeded (and unused) filter on block info	2013-02-01 09:55:25 -08:00
Stephen Haberman	59c57e48df	Stop BlockManagers metadataCleaner.	2013-02-01 10:34:02 -06:00
Matei Zaharia	571af31304	Merge pull request #433 from rxin/master Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-02-01 00:32:41 -08:00
Imran Rashid	8a0a5ed533	track total partitions, in addition to cached partitions; use scala string formatting	2013-02-01 00:23:38 -08:00
Imran Rashid	f127f2ae76	fixup merge (master -> driver renaming)	2013-02-01 00:20:49 -08:00
Reynold Xin	f9af9cee6f	Moved PruneDependency into PartitionPruningRDD.scala.	2013-02-01 00:02:46 -08:00
haitao.yao	b57570fd12	Merge branch 'mesos'	2013-02-01 14:06:45 +08:00
Patrick Wendell	39ab83e957	Small fix from last commit	2013-01-31 21:52:52 -08:00
Patrick Wendell	c33f0ef41a	Some style cleanup	2013-01-31 21:50:02 -08:00
Patrick Wendell	3446d5c8d6	SPARK-673: Capture and re-throw Python exceptions This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code.	2013-01-31 18:06:11 -08:00
Reynold Xin	6289d9654e	Removed the TODO comment from PartitionPruningRDD.	2013-01-31 17:49:36 -08:00
Reynold Xin	5b0fc265c2	Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-01-31 17:48:39 -08:00
Stephen Haberman	782187c210	Once we find a split with no block, we don't have to look for more.	2013-01-31 18:27:25 -06:00
Stephen Haberman	418e36caa8	Add more private declarations.	2013-01-31 17:18:33 -06:00
haitao.yao	3190483b98	bug fix for javadoc	2013-01-31 14:23:51 +08:00
Imran Rashid	02a6761589	Merge branch 'master' into blockmanager_info Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2013-01-30 18:52:35 -08:00
Imran Rashid	c1df24d085	rename Slaves --> Executor	2013-01-30 18:51:14 -08:00
Matei Zaharia	d12330bd2c	Merge pull request #426 from woggling/conn-manager-ips Remember ConnectionManagerId used to initiate SendingConnections	2013-01-30 15:02:53 -08:00
Matei Zaharia	612a9fee71	Merge pull request #428 from woggling/mesos-exec-id Make ExecutorIDs include SlaveIDs when running Mesos	2013-01-30 15:01:46 -08:00
Stephen Haberman	871476d506	Include message and exitStatus if availalbe.	2013-01-30 16:56:46 -06:00
Charles Reiss	252845d304	Remove remants of attempt to use slaveId-executorId in MesosExecutorBackend	2013-01-30 10:38:06 -08:00
Charles Reiss	f7de6978c1	Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use the Mesos ExecutorID as a Spark ExecutorID.	2013-01-30 09:38:57 -08:00
Charles Reiss	178b89204c	Refactor DAGScheduler more to allow testing without a separate thread.	2013-01-30 09:19:55 -08:00
Charles Reiss	a3d14c0404	Refactoring to DAGScheduler to aid testing	2013-01-29 18:55:42 -08:00
Charles Reiss	16a0789e10	Remember ConnectionManagerId used to initiate SendingConnections. This prevents ConnectionManager from getting confused if a machine has multiple host names and the one getHostName() finds happens not to be the one that was passed from, e.g., the BlockManagerMaster.	2013-01-29 18:13:59 -08:00
Matei Zaharia	d54b10b6ad	Merge remote-tracking branch 'stephenh/removefailedjob' Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-01-29 18:12:29 -08:00
Matei Zaharia	ccb67ff2ca	Merge pull request #425 from stephenh/toDebugString Add RDD.toDebugString.	2013-01-29 10:44:18 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Imran Rashid	b92259ba57	Merge branch 'master' into blockmanager_info	2013-01-29 09:45:10 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	cbf72bffa5	Include name, if set, in RDD.toString().	2013-01-29 00:20:36 -06:00
Stephen Haberman	3cda14af3f	Add number of splits.	2013-01-29 00:12:31 -06:00
Matei Zaharia	a1ecec8d79	Merge branch 'master' of github.com:mesos/spark	2013-01-28 22:08:44 -08:00
Stephen Haberman	951cfd9ba2	Add JavaRDDLike.toDebugString().	2013-01-29 00:02:17 -06:00
Matei Zaharia	f6eb1f0825	Merge pull request #413 from pwendell/stage-logging SPARK-658: Adding logging of stage duration	2013-01-28 22:01:52 -08:00
Stephen Haberman	b45857c965	Add RDD.toDebugString. Original idea by Nathan Kronenfeld.	2013-01-28 23:56:56 -06:00
Patrick Wendell	7ee824e42e	Units from ms -> s	2013-01-28 21:48:32 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Matei Zaharia	dda2ce017c	Merge pull request #424 from pwendell/logging-cleanup Some DEBUG-level log cleanup.	2013-01-28 21:18:54 -08:00
Patrick Wendell	1f9b486a8b	Some DEBUG-level log cleanup. A few changes to make the DEBUG-level logs less noisy and more readable. - Moved a few very frequent messages to Trace - Changed some BlockManger log messages to make them more understandable SPARK-666 #resolve	2013-01-28 20:29:35 -08:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Imran Rashid	cec9c768c2	convenient name available in StageInfo	2013-01-28 20:09:41 -08:00
Imran Rashid	01d77f329f	expose stageInfo in SparkContext	2013-01-28 20:09:40 -08:00
Imran Rashid	38b83bc66b	can get task runtime summary from task info	2013-01-28 20:09:40 -08:00
Imran Rashid	b88daee916	simple util to summarize distributions	2013-01-28 20:09:40 -08:00
Imran Rashid	b14841455c	track task completion in DAGScheduler, and send a stageCompleted event with taskInfo to SparkListeners	2013-01-28 20:09:40 -08:00
Imran Rashid	0f22c4207f	better formatting for RDDInfo	2013-01-28 20:07:53 -08:00
Imran Rashid	a423ee546c	expose RDD & storage info directly via SparkContext	2013-01-28 20:07:53 -08:00
Patrick Wendell	501433f1d5	Making submission time a field	2013-01-28 10:45:57 -08:00
Patrick Wendell	c423be7d8e	Renaming stage finished function	2013-01-28 10:45:57 -08:00
Patrick Wendell	07f568e1bf	SPARK-658: Adding logging of stage duration	2013-01-28 10:45:57 -08:00
Matei Zaharia	286f8f876f	Change time unit in MetadataCleaner to seconds	2013-01-28 01:29:27 -08:00
Matei Zaharia	f03d9760fd	Clean up BlockManagerUI a little (make it not be an object, merge with Directives, and bind to a random port)	2013-01-27 23:56:14 -08:00
Matei Zaharia	909850729e	Rename more things from slave to executor	2013-01-27 23:17:20 -08:00
Matei Zaharia	44b4a0f88f	Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.	2013-01-27 19:23:49 -08:00
Matei Zaharia	6ad8540b40	Merge pull request #401 from squito/blockmanager_ui Blockmanager ui	2013-01-27 15:51:08 -08:00
Matei Zaharia	49f6472c0f	Merge pull request #418 from woggling/reregister-deadlock Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously	2013-01-26 18:59:02 -08:00
Charles Reiss	58fc6b2bed	Handle duplicate registrations better.	2013-01-26 18:30:44 -08:00
Charles Reiss	ad4232b4da	Fix deadlock in BlockManager reregistration triggered by failed updates.	2013-01-26 18:30:38 -08:00
Josh Rosen	d49cf0e587	Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668). This workaround is easier than rewriting JavaRDDLike in Java.	2013-01-26 16:13:18 -08:00
Imran Rashid	49c05608f5	add metadatacleaner for persisentRdd map	2013-01-25 17:04:16 -08:00
Stephen Haberman	8efbda0b17	Call executeOnCompleteCallbacks in more finally blocks.	2013-01-25 14:55:33 -06:00
Imran Rashid	a1d9d1767d	fixup `1cadaa1`, changed api of map	2013-01-25 10:05:26 -08:00
Imran Rashid	1cadaa164e	switch to TimeStampedHashMap for storing persistent Rdds	2013-01-25 09:30:21 -08:00
Imran Rashid	539491bbc3	code reformatting	2013-01-25 09:29:59 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Patrick Wendell	b6fc6e6752	SPARK-541: Adding a warning for invalid Master URL Right now Spark silently parses master URL's which do not match any known regex as a Mesos URL. The Mesos error message when an invalid URL gets passed is really confusing, so this warns the user when the implicit conversion is happening.	2013-01-24 14:31:23 -08:00
Matei Zaharia	0fe173a3a5	Merge pull request #410 from rxin/splitpruningrdd Added a clearDependencies method in PartitionPruningRDD.	2013-01-23 23:10:15 -08:00
Reynold Xin	67a43bc7e6	Added a clearDependencies method in PartitionPruningRDD.	2013-01-23 23:06:52 -08:00
Matei Zaharia	fe5e4812fc	Merge pull request #409 from rxin/splitpruningrdd Added pruntSplits method to RDD.	2013-01-23 22:23:22 -08:00
Reynold Xin	c109f29c97	Updated PruneDependency to change "split" to "partition".	2013-01-23 22:22:03 -08:00
Reynold Xin	eedc542a02	Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.	2013-01-23 22:14:23 -08:00
Reynold Xin	81004b967e	Marked prev RDD as transient in SplitsPruningRDD.	2013-01-23 21:54:27 -08:00
Reynold Xin	636e912f32	Created a PruneDependency to properly assign dependency for SplitsPruningRDD.	2013-01-23 21:21:55 -08:00
Matei Zaharia	548856a224	Merge remote-tracking branch 'woggling/remove-machines' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-01-23 15:44:17 -08:00
Reynold Xin	eb222b7206	Added pruntSplits method to RDD.	2013-01-23 15:29:02 -08:00
Matei Zaharia	1dd82743e0	Fix compile error due to cherry-pick	2013-01-23 13:07:27 -08:00
Imran Rashid	e1985bfa04	be sure to set class loader of kryo instances	2013-01-23 12:51:09 -08:00
Charles Reiss	be4a115a7e	Clarify TODO.	2013-01-23 12:48:45 -08:00
Matei Zaharia	1a3aeeca23	Merge pull request #407 from woggling/no-cache-tracker Eliminate CacheTracker	2013-01-23 12:28:48 -08:00
Charles Reiss	e1027ca639	Actually add CacheManager.	2013-01-23 12:22:11 -08:00
Matei Zaharia	4147e1d47b	Merge pull request #406 from tdas/master Changed StorageLevel and BlockManagerId API to prevent duplication in memory	2013-01-23 12:18:31 -08:00
Matei Zaharia	4d77d554e1	Merge pull request #394 from JoshRosen/add_file_fix Add SparkFiles.get() API to access files added through addFile().	2013-01-23 12:16:30 -08:00
Josh Rosen	ae2ed2947d	Allow PySpark's SparkFiles to be used from driver Fix minor documentation formatting issues.	2013-01-23 10:58:50 -08:00
Tathagata Das	79d55700ce	One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future.	2013-01-23 01:57:09 -08:00
Charles Reiss	d209b6b764	Extra debugging from hostLost()	2013-01-23 01:35:14 -08:00
Charles Reiss	9a27062260	Force generation increment after shuffle map stage	2013-01-23 01:34:44 -08:00
Tathagata Das	155f31398d	Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides.	2013-01-23 01:10:26 -08:00
Tathagata Das	5e11f1e51f	Modified StorageLevel API to ensure zero duplicate objects.	2013-01-22 23:42:53 -08:00
Tathagata Das	bacade6caf	Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.	2013-01-22 22:55:26 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Matei Zaharia	ebaa8f6519	Merge remote-tracking branch 'stephenh/cleanup' Conflicts: core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-01-22 21:05:45 -08:00
Matei Zaharia	d2d273868b	Merge pull request #397 from JoshRosen/refactoring/daemon-threads Refactor daemon thread creation	2013-01-22 21:02:53 -08:00
Stephen Haberman	98d0b7747d	Fix Worker logInfo about unknown executor.	2013-01-22 18:11:51 -06:00
Stephen Haberman	8c51322cd0	Don't bother creating an exception.	2013-01-22 18:09:10 -06:00
Stephen Haberman	fdec42385a	Fix SPARK_MEM in ExecutorRunner.	2013-01-22 18:01:12 -06:00
Stephen Haberman	2437f6741b	Restore SPARK_MEM in executorEnvs.	2013-01-22 18:01:03 -06:00
Matei Zaharia	151c47eef5	Merge pull request #399 from NFLabs/master Fix for hanging spark.HttpFileServer on the kind of virtual network	2013-01-22 15:49:24 -08:00
Stephen Haberman	250fe89679	Handle Master telling the Worker to kill an already-dead executor.	2013-01-22 16:29:05 -06:00
Stephen Haberman	6f2194f757	Call removeJob instead of killing the cluster.	2013-01-22 15:38:58 -06:00
Stephen Haberman	27b3f3f0a9	Handle slaveLost before slaveIdToHost knows about it.	2013-01-22 15:30:42 -06:00
Imran Rashid	905c720e5e	Merge branch 'master' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala	2013-01-22 12:02:27 -08:00
Imran Rashid	50e2b23927	Fix up some problems from the merge	2013-01-22 11:46:01 -08:00
Stephen Haberman	588b24197a	Use default arguments instead of constructor overloads.	2013-01-22 10:19:30 -06:00
Leemoonsoo	7e9ee2e833	Fix for hanging spark.HttpFileServer with kind of virtual network	2013-01-22 23:08:34 +09:00
Charles Reiss	e353886a8c	Use generation numbers for fetch failure tracking	2013-01-22 00:23:31 -08:00
Josh Rosen	551a47a620	Refactor daemon thread pool creation.	2013-01-21 23:31:00 -08:00
Stephen Haberman	a8baeb9327	Further simplify getOrElse call.	2013-01-21 21:30:24 -06:00
Stephen Haberman	2d8218b871	Remove unneeded/now-broken saveAsNewAPIHadoopFile overload.	2013-01-21 20:00:27 -06:00
Josh Rosen	7b9e96c992	Add synchronization to Executor.updateDependencies() (SPARK-662)	2013-01-21 17:34:23 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Stephen Haberman	ffd1623595	Minor cleanup.	2013-01-21 15:55:46 -06:00
Matei Zaharia	a88b44ed3b	Only bind to IPv4 addresses when trying to auto-detect external IP	2013-01-21 11:59:21 -08:00
Matei Zaharia	4d34c7fc3e	Fix compile error caused by cherry-pick	2013-01-21 11:33:48 -08:00
Imran Rashid	a3f571b539	more File -> String changes	2013-01-21 11:21:52 -08:00
Imran Rashid	fe26acc482	remove unused imports	2013-01-21 11:21:46 -08:00
Imran Rashid	c73107500e	send sparkHome as String instead of File over network	2013-01-21 11:21:39 -08:00
Imran Rashid	5bf73df7f0	oops, fix stupid compile error	2013-01-21 11:21:33 -08:00
Imran Rashid	aae5a920a4	get sparkHome the correct way	2013-01-21 11:21:28 -08:00
Imran Rashid	f116d6b5c6	executor can use a different sparkHome from Worker	2013-01-21 11:21:22 -08:00
Stephen Haberman	6ded481999	Merge branch 'master' into hadoopconf Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala	2013-01-21 12:56:48 -06:00
Stephen Haberman	69a417858b	Also use hadoopConfiguration in newAPI methods.	2013-01-21 12:42:11 -06:00
Matei Zaharia	c0b9ceb8c3	Log remote lifecycle events in Akka for easier debugging	2013-01-21 00:23:53 -08:00
Matei Zaharia	c7b5e5f1ec	Merge pull request #389 from JoshRosen/python_rdd_checkpointing Add checkpointing to the Python API	2013-01-20 17:10:44 -08:00
Josh Rosen	9f211dd3f0	Fix PythonPartitioner equality; see SPARK-654. PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.	2013-01-20 15:41:42 -08:00
Josh Rosen	5b6ea9e9a0	Update checkpointing API docs in Python/Java.	2013-01-20 15:31:41 -08:00
Josh Rosen	7ed1bf4b48	Add RDD checkpointing to Python API.	2013-01-20 13:19:19 -08:00
Matei Zaharia	86057ec7c8	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala	2013-01-20 12:47:55 -08:00
Matei Zaharia	8e7f098a2c	Added accumulators to PySpark	2013-01-20 01:57:44 -08:00
Tathagata Das	4f8fe58b25	Merge branch 'mesos-streaming' into streaming Conflicts: core/src/main/scala/spark/api/java/JavaRDDLike.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala core/src/test/scala/spark/JavaAPISuite.java	2013-01-20 01:13:56 -08:00
Tathagata Das	214345ceac	Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().	2013-01-19 23:50:17 -08:00
Imran Rashid	d98caa0fa0	Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala	2013-01-18 18:11:26 -08:00
Patrick Wendell	ee0314c3b3	Merge branch 'streaming' into streaming-java-api	2013-01-17 18:43:00 -08:00
Patrick Wendell	d5570c7968	Adding checkpointing to Java API	2013-01-17 18:41:58 -08:00
Matei Zaharia	54c0f9f185	Fix code that assumed spark.local.dir is only a single directory	2013-01-17 17:40:55 -08:00
Fernand Pajot	742bc841ad	changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir	2013-01-17 16:56:11 -08:00
Matei Zaharia	aff1844155	Merge pull request #381 from squito/remove_threadpool remove unused thread pool	2013-01-16 16:46:42 -08:00
Tathagata Das	f466ee44bc	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/MapOutputTracker.scala	2013-01-16 12:57:11 -08:00
Imran Rashid	eae698f755	remove unused thread pool	2013-01-16 12:21:37 -08:00
Tathagata Das	a805ac4a7c	Disabled checkpoint for PairwiseRDD (pySpark).	2013-01-16 10:55:26 -08:00
Matei Zaharia	4beb084f64	Merge pull request #374 from woggling/null-mapout Generate FetchFailedException even for cached missing map outputs	2013-01-15 14:22:29 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Stephen Haberman	74d3b23929	Add spark.executor.memory to differentiate executor memory from spark-shell memory.	2013-01-15 14:03:28 -06:00
Stephen Haberman	dd583b7ebf	Call executeOnCompleteCallbacks in a finally block.	2013-01-15 10:52:06 -06:00
Tathagata Das	eded21925a	Merge pull request #375 from tdas/streaming Important bug fixes	2013-01-14 23:06:40 -08:00
Charles Reiss	273fb5cc10	Throw FetchFailedException for cached missing locs	2013-01-14 15:26:48 -08:00
Tathagata Das	131be5d62e	Fixed bug in RDD checkpointing.	2013-01-14 03:28:25 -08:00
Tathagata Das	82b0cc90ca	Merge pull request #370 from tdas/streaming Added more documentation and minor change in API for NetworkReceiver	2013-01-13 21:28:12 -08:00
Tathagata Das	0dbd411a56	Added documentation for PairDStreamFunctions.	2013-01-13 21:08:35 -08:00
Matei Zaharia	cb867e9ffb	Merge branch 'master' of github.com:mesos/spark	2013-01-13 19:34:32 -08:00
Matei Zaharia	72408e8dfa	Make filter preserve partitioner info, since it can	2013-01-13 19:34:07 -08:00
Matei Zaharia	9a34409810	Merge pull request #360 from rxin/cogroup-java Changed CoGroupRDD's hash map from Scala to Java.	2013-01-13 15:31:08 -08:00
Reynold Xin	be7166146b	Removed the use of getOrElse to avoid Scala wrapper for every call.	2013-01-13 15:27:28 -08:00
Ryan LeCompte	c31931af7e	switch to uppercase constants	2013-01-13 10:39:47 -08:00
Ryan LeCompte	2305a2c1d9	more code cleanup	2013-01-13 10:01:56 -08:00
Matei Zaharia	fbb3fc4143	Merge pull request #346 from JoshRosen/python-api Python API (PySpark)	2013-01-12 23:49:36 -08:00
Ryan LeCompte	addff2c466	add comment	2013-01-12 09:57:29 -08:00
Ryan LeCompte	0cfea7a2ec	add unit test	2013-01-11 23:48:07 -08:00
Ryan LeCompte	ff10b3aa09	add missing return	2013-01-11 21:03:57 -08:00
Ryan LeCompte	22445fbea9	attempt to sleep for more accurate time period, minor cleanup	2013-01-11 13:30:49 -08:00
Tyson	1731f1fed4	Added an optional format parameter for individual job queries and optimized the jobId query	2013-01-11 15:01:43 -05:00
Tyson	c063e8777e	Added implicit json writers for JobDescription and ExecutorRunner	2013-01-11 14:57:38 -05:00
Stephen Haberman	5c7a127219	Pass a new Configuration that wraps the default hadoopConfiguration.	2013-01-11 11:25:11 -06:00
Stephen Haberman	3e6519a36e	Use hadoopConfiguration for default JobConf in PairRDDFunctions.	2013-01-11 11:24:20 -06:00
Matei Zaharia	2e914d9983	Formatting	2013-01-10 19:13:08 -08:00
Matei Zaharia	3548c9c0c8	Merge branch 'master' of github.com:mesos/spark	2013-01-10 19:06:40 -08:00
Matei Zaharia	6d1c230281	Merge pull request #357 from tysonjh/master JSON support added to WebUI	2013-01-10 19:06:07 -08:00
Matei Zaharia	248995c535	Merge pull request #356 from shane-huang/master Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections	2013-01-10 17:52:23 -08:00
Reynold Xin	bd336f5f40	Changed CoGroupRDD's hash map from Scala to Java.	2013-01-10 17:13:04 -08:00
Stephen Haberman	d1864052c5	Fix invalid asInstanceOf cast.	2013-01-10 12:16:26 -06:00
Stephen Haberman	b15e851279	Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables. For custom properties, use "spark.hadoop." as a prefix instead of just "hadoop.".	2013-01-10 10:55:41 -06:00
shane-huang	9930a95d21	Modified Patch according to comments	2013-01-10 20:09:55 +08:00
Stephen Haberman	e3861ae395	Provide and expose a default Hadoop Configuration. Any "hadoop.*" system properties will be passed along into configuration.	2013-01-09 17:08:14 -06:00
Tyson	549ee388a1	Removed io.spray spray-json dependency as it is not needed.	2013-01-09 15:12:23 -05:00
Tyson	bf9d9946f9	Query parameter reformatted to be more extensible and routing more robust	2013-01-09 11:29:58 -05:00
Tyson	0da2ff102e	Added url query parameter json and handler	2013-01-09 10:40:48 -05:00
Tyson	269fe018c7	JSON object definitions	2013-01-09 10:40:43 -05:00
Matei Zaharia	9cc764f523	Code style	2013-01-08 22:29:57 -08:00
Matei Zaharia	14972141f9	Merge pull request #344 from mbautin/log_preferred_hosts Log preferred hosts	2013-01-08 22:26:34 -08:00
Josh Rosen	b57dd0f160	Add mapPartitionsWithSplit() to PySpark.	2013-01-08 16:05:02 -08:00
Stephen Haberman	8ac0f35be4	Add JavaRDDLike.keyBy.	2013-01-08 09:57:45 -06:00
Stephen Haberman	4ee6b22775	Merge branch 'master' into tupleBy Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-01-08 09:10:10 -06:00
shane-huang	e4cb72da8a	Fix an issue in ConnectionManager where sendingMessage may create too many unnecessary SendingConnections.	2013-01-08 22:40:58 +08:00
Mikhail Bautin	4725b0f643	Fixing if/else coding style for preferred hosts logging	2013-01-07 20:09:26 -08:00
Mikhail Bautin	c41042c816	Log preferred hosts	2013-01-07 20:06:09 -08:00
Matei Zaharia	f7cf035b9b	Merge pull request #350 from tdas/streaming Spark Streaming	2013-01-07 17:40:11 -08:00
Shivaram Venkataraman	77d751731c	Remove unused BoundedMemoryCache file and associated test case.	2013-01-07 15:57:46 -08:00
Shivaram Venkataraman	aed368a970	Update Hadoop dependency to 1.0.3 as 0.20 has Sun specific dependencies. Also fix SequenceFileRDDFunctions to pick the right type conversion across Hadoop versions	2013-01-07 15:57:33 -08:00
Shivaram Venkataraman	f8d579a0c0	Remove dependencies on sun jvm classes. Instead use reflection to infer HotSpot options and total physical memory size	2013-01-07 15:57:18 -08:00
Tathagata Das	3b0a3b89ac	Added better docs for RDDCheckpointData	2013-01-07 14:55:49 -08:00
Tathagata Das	237bac36e9	Renamed examples and added documentation.	2013-01-07 14:37:21 -08:00
Matei Zaharia	1941d9602d	Merge branch 'master' of github.com:mesos/spark	2013-01-07 16:50:39 -05:00
Matei Zaharia	9c32f300fb	Add Accumulable.setValue for easier use in Java	2013-01-07 16:50:23 -05:00
Tathagata Das	1346126485	Changed cleanup to clearOldValues for TimeStampedHashMap and TimeStampedHashSet.	2013-01-07 12:11:27 -08:00
Stephen Haberman	8dc06069fe	Rename RDD.tupleBy to keyBy.	2013-01-06 15:21:45 -06:00
Matei Zaharia	8fd3a70c18	Add PairRDD.keys() and values() to Java API	2013-01-05 22:46:45 -05:00
Matei Zaharia	b1663752c6	Merge pull request #351 from stephenh/values Add PairRDDFunctions.keys and values.	2013-01-05 19:15:54 -08:00
Matei Zaharia	0982572519	Add methods called just 'accumulator' for int/double in Java API	2013-01-05 22:11:28 -05:00
Matei Zaharia	86af64b0a6	Fix Accumulators in Java, and add a test for them	2013-01-05 20:55:17 -05:00
Matei Zaharia	ecf9c08901	Fix Accumulators in Java, and add a test for them	2013-01-05 20:54:08 -05:00
Stephen Haberman	1fdb6946b5	Add RDD.tupleBy.	2013-01-05 13:07:59 -06:00
Stephen Haberman	f4e6b9361f	Add RDD.collect(PartialFunction).	2013-01-05 12:14:08 -06:00
Stephen Haberman	8d57c78c83	Add PairRDDFunctions.keys and values.	2013-01-05 12:04:01 -06:00
Josh Rosen	33beba3965	Change PySpark RDD.take() to not call iterator().	2013-01-03 14:52:21 -08:00
Tathagata Das	d34dba25c2	Merge branch 'mesos' into dev-merge	2013-01-01 15:48:39 -08:00
Josh Rosen	b58340dbd9	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
Josh Rosen	170e451fbd	Minor documentation and style fixes for PySpark.	2013-01-01 13:52:14 -08:00
Matei Zaharia	55809fbc6d	Merge pull request #349 from woggling/cache-finally Avoid stalls when computation of cached RDD throws exception	2013-01-01 08:21:33 -08:00
Charles Reiss	58072a7340	Remove some dead comments	2013-01-01 08:07:44 -08:00
Charles Reiss	feadaf72f4	Mark key as not loading in CacheTracker even when compute() fails	2013-01-01 07:57:20 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Tathagata Das	7e0271b438	Refactored a whole lot to push all DStreams into the spark.streaming.dstream package.	2012-12-30 15:19:55 -08:00
Tathagata Das	9e644402c1	Improved jekyll and scala docs. Made many classes and method private to remove them from scala docs.	2012-12-29 18:31:51 -08:00
Josh Rosen	59195c68ec	Update PySpark for compatibility with TaskContext.	2012-12-29 16:01:03 -08:00
Josh Rosen	c5cee53f20	Merge remote-tracking branch 'origin/master' into python-api Conflicts: docs/quick-start.md	2012-12-29 16:00:51 -08:00
Josh Rosen	7ec3595de2	Fix bug (introduced by batching) in PySpark take()	2012-12-28 22:21:16 -08:00
Josh Rosen	397e67103c	Change Utils.fetchFile() warning to SparkException.	2012-12-28 17:37:13 -08:00
Josh Rosen	d64fa72d2e	Add addFile() and addJar() to JavaSparkContext.	2012-12-28 17:00:57 -08:00
Josh Rosen	bd237d4a9d	Add synchronization to LocalScheduler.updateDependencies().	2012-12-28 17:00:57 -08:00
Josh Rosen	f1bf4f0385	Skip deletion of files in clearFiles(). This fixes an issue where Spark could delete original files in the current working directory that were added to the job using addFile(). There was also the potential for addFile() to overwrite local files, which is addressed by changing Utils.fetchFile() to log a warning instead of overwriting a file with new contents. This is a short-term fix; a better long-term solution would be to remove the dependence on storing files in the current working directory, since we can't change the cwd from Java.	2012-12-28 17:00:57 -08:00
Josh Rosen	fbadb1cda5	Mark api.python classes as private; echo Java output to stderr.	2012-12-28 09:06:11 -08:00
Tathagata Das	0bc0a60d30	Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.	2012-12-27 15:37:33 -08:00
Tathagata Das	7c33f76291	Merge branch 'mesos' into dev-merge	2012-12-26 19:19:07 -08:00
Tathagata Das	836042bb9f	Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/scheduler/ResultTask.scala core/src/test/scala/spark/CheckpointSuite.scala	2012-12-26 19:09:01 -08:00
Josh Rosen	1dca0c5180	Remove debug output from PythonPartitioner.	2012-12-26 18:23:06 -08:00
Josh Rosen	4608902fb8	Use filesystem to collect RDDs in PySpark. Passing large volumes of data through Py4J seems to be slow. It appears to be faster to write the data to the local filesystem and read it back from Python.	2012-12-24 17:20:10 -08:00
Mark Hamstra	903f3518df	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	2012-12-24 13:18:45 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Reynold Xin	60f7338092	Remove the call to close input stream in Kryo serializer.	2012-12-21 15:49:33 -08:00
Matei Zaharia	3334b7c6b5	Merge pull request #341 from rxin/4a3fb06ac2d11125feb08acbbd4df76d1e91b677 Kryo2 update against Spark master	2012-12-21 15:31:23 -08:00
Reynold Xin	eac566a7f4	Merge branch 'master' of github.com:mesos/spark into dev Conflicts: core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/PairRDDFunctions.scala core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerId.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/MetadataCleaner.scala core/src/main/scala/spark/util/TimeStampedHashMap.scala core/src/test/scala/spark/storage/BlockManagerSuite.scala run	2012-12-20 14:53:40 -08:00
Tathagata Das	8512dd3225	Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/test/scala/spark/CheckpointSuite.scala streaming/src/main/scala/spark/streaming/DStream.scala	2012-12-20 14:24:19 -08:00
Tathagata Das	fe777eb77d	Fixed bugs in CheckpointRDD and spark.CheckpointSuite.	2012-12-20 13:39:27 -08:00
Tathagata Das	f9c5b0a6fe	Changed checkpoint writing and reading process.	2012-12-20 11:52:23 -08:00
Matei Zaharia	5e51b889fe	Merge pull request #327 from rxin/spark-633 Added the ability in block manager to remove blocks.	2012-12-20 11:33:38 -08:00
Reynold Xin	9397c5014e	Let the slave notify the master block removal.	2012-12-20 01:37:09 -08:00
Reynold Xin	68c52d80ec	Moved BlockManager's IdGenerator into BlockManager object. Removed some excessive debug messages.	2012-12-19 15:27:23 -08:00
Tathagata Das	5184141936	Introduced getSpits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.	2012-12-18 13:30:53 -08:00
Patrick Wendell	bfac06e1f6	SPARK-616: Logging dead workers in Web UI. This patch keeps track of which workers have died and marks them as such in the master web UI. It also handles workers which die and re-register using different actor ID's.	2012-12-17 23:09:05 -08:00
Tathagata Das	72eed2b95e	Converted CheckpointState in RDDCheckpointData to use scala Enumeration.	2012-12-17 18:52:43 -08:00
Matei Zaharia	b82a6dd2c7	Merge pull request #332 from JoshRosen/spark-607 Add try-finally to handle MapOutputTracker timeouts	2012-12-14 11:41:16 -08:00
Reynold Xin	06f855c24d	Merge branch 'spark-633' of github.com:rxin/spark into spark-633	2012-12-14 00:27:24 -08:00
Reynold Xin	8c01295b85	Fixed conflicts from merging Charles' and TD's block manager changes.	2012-12-14 00:26:36 -08:00
Charles Reiss	c528932a41	Code review cleanup.	2012-12-13 22:37:16 -08:00
Charles Reiss	0aad42b5e7	Have standalone cluster report exit codes to clients. Addresses SPARK-639.	2012-12-13 22:37:16 -08:00
Reynold Xin	97434f49b8	Merged TD's block manager refactoring.	2012-12-13 22:32:19 -08:00
Reynold Xin	41e58a519a	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 22:06:47 -08:00
Josh Rosen	cf52d9cade	Add try-finally to handle MapOutputTracker timeouts.	2012-12-13 21:53:30 -08:00
Matei Zaharia	05e225f988	Merge pull request #329 from woggling/executor-status-codes Executor exit status codes	2012-12-13 20:14:10 -08:00
Charles Reiss	b054d3b222	ExecutorLostReason -> ExecutorLossReason	2012-12-13 18:44:07 -08:00
Charles Reiss	24d7aa2d15	Extra whitespace in ExecutorExitCode	2012-12-13 18:39:23 -08:00
Reynold Xin	dc7d7fc286	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 16:48:34 -08:00
Reynold Xin	4f076e105e	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream. Incorporated Matei's command.	2012-12-13 16:41:15 -08:00
Charles Reiss	829206f1a7	Explain slaveLost calls made by StandaloneSchedulerBackend	2012-12-13 16:23:36 -08:00
Charles Reiss	a4041dd87f	Log duplicate slaveLost() calls in ClusterScheduler.	2012-12-13 16:23:36 -08:00
Charles Reiss	fa9df4a45d	Normalize executor exit statuses and report them to the user.	2012-12-13 16:23:31 -08:00
Reynold Xin	eacb98e900	SPARK-635: Pass a TaskContext object to compute() interface and use that to close Hadoop input stream.	2012-12-13 15:41:53 -08:00
Josh Rosen	7c9e3d1c21	Return success or failure in BlockStore.remove().	2012-12-13 15:22:27 -08:00
Reynold Xin	1b7a0451ed	Added the ability in block manager to remove blocks.	2012-12-13 00:04:42 -08:00
Charles Reiss	1d8e2e6cff	Call slaveLost on executor death for standalone clusters.	2012-12-12 21:15:34 -08:00
Tathagata Das	8e74fac215	Made checkpoint data in RDDs optional to further reduce serialized size.	2012-12-11 15:36:12 -08:00
Tathagata Das	fa28f25619	Fixed bug in UnionRDD and CoGroupedRDD	2012-12-11 13:59:43 -08:00
Tathagata Das	746afc2e65	Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask.	2012-12-10 23:36:37 -08:00
Reynold Xin	21b271f5bd	Suppress shuffle block updates when a slave node comes back.	2012-12-10 20:36:03 -08:00
Matei Zaharia	a1a2daa7ef	Merge pull request #317 from woggling/block-manager-heartbeat Implement block manager heartbeat	2012-12-10 11:03:55 -08:00
Charles Reiss	b6b62d774f	Decrease BlockManagerMaster logging verbosity	2012-12-10 00:31:55 -08:00
Charles Reiss	5d3e917d09	Use Akka scheduler for BlockManager heart beats. Adds required ActorSystem argument to BlockManager constructors.	2012-12-10 00:31:50 -08:00
Charles Reiss	b53dd28c90	Changed default block manager heartbeat interval to 5 s	2012-12-09 23:03:34 -08:00
Matei Zaharia	e1d7cd2276	Search for a non-loopback address in Utils.getLocalIpAddress	2012-12-08 00:33:11 -08:00
Patrick Wendell	3e796bdd57	Changes in response to TD's review.	2012-12-07 19:34:05 -08:00
Patrick Wendell	c36ca10241	Adding locality aware parallelize	2012-12-07 16:42:36 -08:00
Tathagata Das	1f3a75ae9e	Modified checkpoint testsuite to more comprehensively test checkpointing of various RDDs. Fixed checkpoint bug (splits referring to parent RDDs or parent splits) in UnionRDD and CoalescedRDD. Fixed bug in testing ShuffledRDD. Removed unnecessary and useless map-side combining step for narrow dependencies in CoGroupedRDD. Removed unncessary WeakReference stuff from many other RDDs.	2012-12-07 13:45:52 -08:00
Charles Reiss	714c8d32d5	Don't divide by milliseconds by 1000 more.	2012-12-06 18:38:34 -08:00
Charles Reiss	8f0819520c	map -> foreach	2012-12-06 18:29:50 -08:00
Charles Reiss	7a033fd795	Make LocalSparkCluster use distinct IPs	2012-12-06 00:03:08 -08:00
Charles Reiss	d21ca010ac	Add block manager heart beats. Renames old message called 'HeartBeat' to 'BlockUpdate'. The BlockManager periodically sends a heart beat message to the master. If the manager is currently not registered. The master responds to the heart beat by indicating whether the BlockManager is currently registered with the master. Additionally, the master now also responds to block updates by indicating whether the BlockManager in question is registered. When the BlockManager detects (by heart beat or failed block update) that it stopped being registered, it reregisters and sends block updates for all its blocks.	2012-12-05 23:35:20 -08:00
Charles Reiss	c9e54a6755	Track block managers by hostname; handle manager removal.	2012-12-05 23:35:20 -08:00
Charles Reiss	5afa2ee9e9	Actually put millis in _lastSeenMs	2012-12-05 23:35:20 -08:00
Charles Reiss	813ac71459	Don't use bogus port number in notifyADeadHost().	2012-12-05 23:35:20 -08:00
Tathagata Das	21a0852976	Refactored RDD checkpointing to minimize extra fields in RDD class.	2012-12-04 22:10:25 -08:00
Tathagata Das	a69a82be26	Added metadata cleaner to HttpBroadcast to clean up old broacast files.	2012-12-03 22:37:31 -08:00
Josh Rosen	cdaa0fad51	Use external addresses in standalone WebUI on EC2.	2012-12-01 18:19:13 -08:00
Tathagata Das	b4dba55f78	Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.	2012-12-02 02:03:05 +00:00
Tathagata Das	477de94894	Minor modifications.	2012-12-01 13:15:06 -08:00
Tathagata Das	6fcd09f499	Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker.	2012-11-29 02:06:33 -08:00
Tathagata Das	c9789751bf	Added metadata cleaner to BlockManager to remove old blocks completely.	2012-11-28 23:18:24 -08:00
Tathagata Das	9e9e9e1d89	Renamed CleanupTask to MetadataCleaner.	2012-11-28 18:48:14 -08:00
Tathagata Das	e463ae4920	Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.	2012-11-28 14:05:01 -08:00
Tathagata Das	d5e7aad039	Bug fixes	2012-11-28 08:36:55 +00:00
Matei Zaharia	f86960cba9	Merge pull request #313 from rxin/pde_size_compress Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 22:39:25 -08:00
Matei Zaharia	3ebd8e1885	Added zip to Java API	2012-11-27 22:38:09 -08:00
Matei Zaharia	27e43abd19	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	2012-11-27 22:27:47 -08:00
Matei Zaharia	f410a111ad	Merge branch 'master' of github.com:mesos/spark	2012-11-27 20:51:58 -08:00
Josh Rosen	7d71b9a56a	Fix NullPointerException caused by unregistered map outputs.	2012-11-27 20:51:51 -08:00
Matei Zaharia	935c468b71	Merge pull request #311 from woggling/map-output-npe Fix NullPointerException when map output unregistered from MapOutputTracker twice	2012-11-27 20:50:48 -08:00
Reynold Xin	bd6dd1a3a6	Added a partition preserving flag to MapPartitionsWithSplitRDD.	2012-11-27 19:43:30 -08:00
Reynold Xin	f24bfd2dd1	For size compression, compress non zero values into non zero values.	2012-11-27 19:20:45 -08:00
Charles Reiss	cf79de425d	Fix NullPointerException when unregistering a map output twice.	2012-11-27 16:12:05 -08:00
Tathagata Das	b18d70870a	Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.	2012-11-27 15:08:49 -08:00
Tathagata Das	0fe2fc4d5e	Merged branch mesos/master to branch dev.	2012-11-26 13:16:59 -08:00
Tathagata Das	c97ebf6437	Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.	2012-11-19 23:22:07 +00:00
Matei Zaharia	3ff6f4bdee	Merge pull request #304 from mbautin/configurable_local_ip SPARK-624: make the default local IP customizable	2012-11-19 13:23:39 -08:00
mbautin	00f4e3ff9c	Addressing Matei's comment: SPARK_LOCAL_IP environment variable	2012-11-19 11:52:10 -08:00
Tathagata Das	10c1abcb6a	Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.	2012-11-17 17:27:00 -08:00
Charles Reiss	12c24e786c	Set default uncaught exception handler to exit. Among other things, should prevent OutOfMemoryErrors in some daemon threads (such as the network manager) from causing a spark executor to enter a state where it cannot make progress but does not report an error.	2012-11-16 20:12:31 -08:00
mbautin	1f5a7e0e64	SPARK-624: make the default local IP customizable	2012-11-15 13:57:47 -08:00
Matei Zaharia	c23a74df0a	Use DNS names instead of IP addresses in standalone mode, to allow matching with data locality hints from storage systems.	2012-11-15 00:10:52 -08:00
Tathagata Das	8a25d530ed	Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.	2012-11-13 02:16:28 -08:00
Denny	05e3807354	Merge branch 'master' into blockmanagerUI	2012-11-12 10:56:54 -08:00
Denny	4a1be7e0db	Refactor BlockManager UI and adding worker details.	2012-11-12 10:56:35 -08:00
Matei Zaharia	173e0354c0	Detect correctly when one has disconnected from a standalone cluster. SPARK-617 #resolve	2012-11-11 21:06:57 -08:00
Denny	68e0a88282	Merge branch 'master' into blockmanagerUI	2012-11-11 14:00:02 -08:00
Denny	b829fba749	Merge branch 'master' into blockmanagerUI Conflicts: core/src/main/twirl/spark/deploy/worker/index.scala.html	2012-11-11 13:59:40 -08:00
Tathagata Das	04e9e9d93c	Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations.	2012-11-11 08:54:21 -08:00
root	acf8272324	Fix K-means example a little	2012-11-10 23:07:21 -08:00
Tathagata Das	355c8e4b17	Fixed deadlock in BlockManager.	2012-11-09 16:28:45 -08:00
Tathagata Das	9915989bfa	Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.	2012-11-09 15:46:15 -08:00
Tathagata Das	de00bc63db	Fixed deadlock in BlockManager. 1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks. 2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store) thus making sure putting threads minimally blocks to the getting treads. 3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.	2012-11-09 14:09:37 -08:00
Matei Zaharia	6607f546cc	Added an option to spread out jobs in the standalone mode.	2012-11-08 23:13:12 -08:00
Matei Zaharia	66cbdee941	Fix for connections not being reused (from Josh Rosen)	2012-11-08 09:53:40 -08:00
Imran Rashid	809b2bb1fe	fix bug in getting slave id out of mesos	2012-11-08 00:34:28 -08:00
Matei Zaharia	bb1bce7924	Various fixes to standalone mode and web UI: - Don't report a job as finishing multiple times - Don't show state of workers as LOADING when they're running - Show start and finish times in web UI - Sort web UI tables by ID and time by default	2012-11-07 16:49:53 -08:00
Matei Zaharia	e2b8477487	Made Akka timeout and message frame size configurable, and upped the defaults	2012-11-06 15:58:05 -08:00
Tathagata Das	72b2303f99	Fixed major bugs in checkpointing.	2012-11-05 11:41:36 -08:00
Tathagata Das	d154238789	Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.	2012-11-04 12:12:06 -08:00
Shivaram Venkataraman	a7d967a1ca	Remove unnecessary hash-map put in MemoryStore	2012-11-01 10:46:38 -07:00
Tathagata Das	34e569f40e	Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job.	2012-10-31 00:56:40 -07:00
Tathagata Das	0dcd770fdc	Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.	2012-10-30 16:09:37 -07:00
Denny	eb95212f4d	code Formatting	2012-10-29 14:57:32 -07:00
Denny	531ac136bf	BlockManager UI.	2012-10-29 14:53:47 -07:00
Tathagata Das	ac12abc17f	Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).	2012-10-29 11:55:27 -07:00
Josh Rosen	2ccf3b6652	Fix PySpark hash partitioning bug. A Java array's hashCode is based on its object identify, not its elements, so this was causing serialized keys to be hashed incorrectly. This commit adds a PySpark-specific workaround and adds more tests.	2012-10-28 22:30:28 -07:00
root	e782187b4a	Don't throw an error in the block manager when a block is cached on the master due to a locally computed operation Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2012-10-26 00:33:45 -07:00
Matei Zaharia	863a55ae42	Merge remote-tracking branch 'public/master' into dev Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala core/src/main/scala/spark/KryoSerializer.scala core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/executor/Executor.scala core/src/main/scala/spark/network/Connection.scala core/src/main/scala/spark/network/ConnectionManagerTest.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/NewHadoopRDD.scala core/src/main/scala/spark/scheduler/ShuffleMapTask.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockMessage.scala core/src/main/scala/spark/storage/BlockStore.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/AkkaUtils.scala project/SparkBuild.scala run	2012-10-24 23:21:00 -07:00
Matei Zaharia	f63a40fd99	Strip leading mesos:// in URLs passed to Mesos	2012-10-24 21:52:13 -07:00
Matei Zaharia	d290e964ea	Merge pull request #281 from rxin/memreport Added a method to report slave memory status; force serialize accumulator update in local mode.	2012-10-23 22:04:35 -07:00
Matei Zaharia	0bd20c63e2	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev Conflicts: core/src/main/scala/spark/Dependency.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala	2012-10-23 22:01:45 -07:00
Josh Rosen	d4f2e5b0ef	Remove PYTHONPATH from SparkContext's executorEnvs. It makes more sense to pass it in the dictionary of environment variables that is used to construct PythonRDD.	2012-10-22 10:28:59 -07:00
Josh Rosen	c23bf1aff4	Add PySpark README and run scripts.	2012-10-20 00:22:27 +00:00
Josh Rosen	52989c8a2c	Update Python API for v0.6.0 compatibility.	2012-10-19 10:24:49 -07:00
Josh Rosen	e21eb6e00d	Merge tag 'v0.6.0' into python-api	2012-10-19 09:44:32 -07:00
Thomas Dudziak	d9c2a89c57	Support for Hadoop 2 distributions such as cdh4	2012-10-18 16:08:54 -07:00
Reynold Xin	4a3fb06ac2	Updated Kryo to 2.20.	2012-10-16 01:10:01 -07:00
Reynold Xin	63fae9bc23	Serialize accumulator updates in TaskResult for local mode.	2012-10-15 21:38:28 -07:00
Reynold Xin	42d20fa8da	Added a method to report slave memory status.	2012-10-14 22:30:53 -07:00
Matei Zaharia	64dbf8d372	Made ShuffleDependency automatically find a shuffle ID for itself	2012-10-14 10:00:22 -07:00
Tathagata Das	e95ff45b53	Implemented checkpointing of StreamingContext and DStream graph.	2012-10-13 20:10:49 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an arguemnt to SparkContext	2012-10-13 15:31:11 -07:00
Josh Rosen	33cd3a0c12	Remove map-side combining from ShuffleMapTask. This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	10bcd217d2	Remove mapSideCombine field from Aggregator. Instead, the presence or absense of a ShuffleDependency's aggregator will control whether map-side combining is performed.	2012-10-13 14:59:20 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	110832e88f	Add helper methods to Aggregator.	2012-10-13 14:57:56 -07:00
Denny	0700d1920a	Protect from null env variables in mesos.	2012-10-13 13:57:59 -07:00
Denny	21047d923e	Protect from setting null environment variables.	2012-10-13 13:44:24 -07:00
Denny	fa41d50f7d	Don't use system envs for Mesos.	2012-10-13 13:15:50 -07:00
Denny	67c42a41d0	Let the user specify environment variables to be passed to the Executors. Also removed unused variables in the ExecutorRunner.	2012-10-13 13:08:44 -07:00
Matei Zaharia	b4067cbad4	More doc updates, and moved Serializer to a subpackage.	2012-10-12 18:19:21 -07:00
Matei Zaharia	8d7b77bcb5	Some doc and usability improvements: - Added a StorageLevels class for easy access to StorageLevel constants in Java - Added doc comments on Function classes in Java - Updated Accumulator and HadoopWriter docs slightly	2012-10-12 17:53:20 -07:00
Matei Zaharia	dca496bb77	Document cartesian() operation	2012-10-12 14:46:41 -07:00
Matei Zaharia	23015ccac0	Merge pull request #271 from shivaram/block-manager-npe-fix Change block manager to accept a ArrayBuffer	2012-10-12 14:36:28 -07:00
Patrick Wendell	dc8adbd359	Adding Java documentation	2012-10-11 00:49:03 -07:00
Shivaram Venkataraman	2cf40c5fd5	Change block manager to accept a ArrayBuffer instead of an iterator to ensure that the computation can proceed even if we run out of memory to cache the block. Update CacheTracker to use this new interface	2012-10-11 00:42:46 -07:00
Denny	d3f095f904	Fixed bug when fetching Jar dependencies. Instead of checking currentFiles check currentJars.	2012-10-10 16:09:53 -07:00
Matei Zaharia	ee2fcb2ce6	Added documentation to all the *RDDFunction classes, and moved them into the spark package to make them more visible. Also documented various other miscellaneous things in the API.	2012-10-09 18:38:36 -07:00
Matei Zaharia	bc0bc672d0	Updates to documentation: - Edited quick start and tuning guide to simplify them a little - Simplified top menu bar - Made private a SparkContext constructor parameter that was left as public - Various small fixes	2012-10-09 14:30:23 -07:00
Andy Konwinski	1d79ff6028	Fixes a typo, adds scaladoc comments to SparkContext constructors.	2012-10-08 22:49:17 -07:00
Patrick Wendell	ac310098ef	More docs in RDD class	2012-10-08 22:25:11 -07:00
Andy Konwinski	bd688940a1	A start on scaladoc for the public APIs.	2012-10-08 21:13:29 -07:00
Mosharaf Chowdhury	edc67bfba8	Merge branch 'dev' into bc-fix-dev	2012-10-08 16:19:13 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Matei Zaharia	039cc6228e	Merge pull request #251 from JoshRosen/docs/internals Document Dependency classes and make minor interface improvements	2012-10-07 09:56:53 -07:00
Reynold Xin	f66c0e9561	Changed the println to logInfo in Utils.fetchFile.	2012-10-07 01:53:24 -07:00
Matei Zaharia	d72db3d7dc	Merge pull request #250 from rxin/dev Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created incorrectly for local mode.	2012-10-07 00:56:53 -07:00
Reynold Xin	80f59e17e2	Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created wrong for local mode.	2012-10-07 00:54:38 -07:00
Josh Rosen	e10308f5a0	Make ShuffleDependency.aggregator explicitly optional. It was confusing to be using new Aggregator[K, V, V](null, null, null, false) to represent the absence of an aggregator.	2012-10-07 00:36:04 -07:00
Matei Zaharia	f930fe5d81	Improve error message	2012-10-07 07:34:36 +00:00
Matei Zaharia	a3bf0ce57f	Don't crash on ask timeout exceptions in deploy.Client.stop() (fixes a crash in tests)	2012-10-07 07:25:41 +00:00
Matei Zaharia	eca570f66a	Removed the need to sleep in tests due to waiting for Akka to shut down	2012-10-07 00:17:59 -07:00
Josh Rosen	4f72066a9a	Document the Dependency classes.	2012-10-07 00:05:37 -07:00
Josh Rosen	3f2571fe98	Remove unused isShuffle field from Dependency.	2012-10-07 00:03:55 -07:00
Matei Zaharia	b2fc3dd902	Log message	2012-10-07 06:43:52 +00:00
Matei Zaharia	ea096f7cd5	More logging	2012-10-07 06:35:48 +00:00
root	554b42cb24	Log more info in MapOutputTracker	2012-10-07 05:02:18 +00:00
root	a73b25826b	Made Akka thread pool and message batch sizes configurable	2012-10-07 04:19:54 +00:00

... 8 9 10 11 12 ...

1505 commits