ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Stephen Haberman	8bd0e888f3	Inline mergePair to look more like the narrow dep branch. No functionality changes, I think this is just more consistent given mergePair isn't called multiple times/recursive. Also added a comment to explain the usual case of having two parent RDDs.	2013-02-05 17:50:25 -06:00
Imran Rashid	1704b124d8	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:33:52 -08:00
Imran Rashid	cfab1a3528	add as many fetch requests as we can, subject to maxBytesInFlight	2013-02-05 14:31:46 -08:00
Imran Rashid	696e4b2167	track remoteFetchTime	2013-02-05 14:29:16 -08:00
Imran Rashid	b29f9cc978	BlockManager.getMultiple returns a custom iterator, to enable tracking of shuffle performance	2013-02-05 14:00:44 -08:00
Imran Rashid	e319ac74c1	cogrouped RDD stores the amount of time taken to read shuffle data in each task	2013-02-05 10:18:16 -08:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Imran Rashid	9df7e2ae55	Shuffle Fetchers use a timed iterator	2013-02-05 10:18:16 -08:00
Imran Rashid	1ad77c4766	add TimedIterator	2013-02-05 10:18:15 -08:00
Imran Rashid	843084d69d	track total bytes written by ShuffleMapTasks	2013-02-05 10:18:15 -08:00
haitao.yao	f609182e5b	Merge branch 'mesos'	2013-02-05 14:09:45 +08:00
Imran Rashid	b430d2359d	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-02-04 21:40:44 -08:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Matei Zaharia	f7b4e428be	Merge pull request #445 from JoshRosen/pyspark_fixes Fix exit status in PySpark unit tests; fix/optimize PySpark's RDD.take()	2013-02-03 21:36:36 -08:00
haitao.yao	faa4d9e31f	Merge branch 'mesos'	2013-02-04 11:40:15 +08:00
Patrick Wendell	b14322956c	Starvation check in Standlone scheduler	2013-02-03 12:45:10 -08:00
Patrick Wendell	667860448a	Starvation check in ClusterScheduler	2013-02-03 12:45:04 -08:00
Matei Zaharia	3bfaf3ab1d	Merge pull request #379 from stephenh/sparkmem Add spark.executor.memory to differentiate executor memory from spark-shell	2013-02-02 23:58:23 -08:00
Matei Zaharia	88ee6163a1	Merge pull request #422 from squito/blockmanager_info RDDInfo available from SparkContext	2013-02-02 23:44:13 -08:00
Matei Zaharia	cd4ca93679	Merge pull request #436 from stephenh/removeextraloop Once we find a split with no block, we don't have to look for more.	2013-02-02 23:39:28 -08:00
Matei Zaharia	d5daaab381	Merge pull request #442 from stephenh/fixsystemnames Fix createActorSystem not actually using the systemName parameter.	2013-02-02 23:38:46 -08:00
Matei Zaharia	9163c3705d	Formatting	2013-02-02 23:34:47 -08:00
Josh Rosen	8fbd5380b7	Fetch fewer objects in PySpark's take() method.	2013-02-03 06:44:49 +00:00
Matei Zaharia	34a7bcdb3a	Formatting	2013-02-02 19:40:30 -08:00
Stephen Haberman	7aba123f0c	Further simplify checking for Nil.	2013-02-02 13:53:28 -06:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Stephen Haberman	cae8a6795c	Fix dangling old variable names.	2013-02-02 02:15:39 -06:00
Stephen Haberman	696eec32c9	Move executorMemory up into SchedulerBackend.	2013-02-02 02:03:26 -06:00
Stephen Haberman	103c375ba0	Merge branch 'master' into sparkmem	2013-02-02 01:57:18 -06:00
Stephen Haberman	28e0cb9f31	Fix createActorSystem not actually using the systemName parameter. This meant all system names were "spark", which worked, but didn't lead to the most intuitive log output. This fixes createActorSystem to use the passed system name, and refactors Master/Worker to encapsulate their system/actor names instead of having the clients guess at them. Note that the driver system name, "spark", is left as is, and is still repeated a few times, but that seems like a separate issue.	2013-02-02 01:11:37 -06:00
Charles Reiss	1fd5ee323d	Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.	2013-02-01 22:33:38 -08:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Stephen Haberman	12c1eb4756	Reduce the amount of duplicate logging Akka does to stdout. Given we have Akka logging go through SLF4j to log4j, we don't need all the extra noise of Akka's stdout logger that is supposedly only used during Akka init time but seems to continue logging lots of noisy network events that we either don't care about or are in the log4j logs anyway. See: http://doc.akka.io/docs/akka/2.0/general/configuration.html # Log level for the very basic logger activated during AkkaApplication startup # Options: ERROR, WARNING, INFO, DEBUG # stdout-loglevel = "WARNING"	2013-02-01 21:21:44 -06:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	4529876db0	Merge branch 'master' of github.com:mesos/spark	2013-02-01 14:07:38 -08:00
Matei Zaharia	9970926ede	formatting	2013-02-01 14:07:34 -08:00
Matei Zaharia	79c24abe4c	Merge pull request #432 from stephenh/moreprivacy Add more private declarations.	2013-02-01 14:06:55 -08:00
Matei Zaharia	de340ddf0b	Merge pull request #437 from stephenh/cancelmetacleaner Stop BlockManagers metadataCleaner.	2013-02-01 12:59:25 -08:00
Imran Rashid	c6190067ae	remove unneeded (and unused) filter on block info	2013-02-01 09:55:25 -08:00
Stephen Haberman	59c57e48df	Stop BlockManagers metadataCleaner.	2013-02-01 10:34:02 -06:00
Matei Zaharia	571af31304	Merge pull request #433 from rxin/master Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-02-01 00:32:41 -08:00
Imran Rashid	8a0a5ed533	track total partitions, in addition to cached partitions; use scala string formatting	2013-02-01 00:23:38 -08:00
Imran Rashid	f127f2ae76	fixup merge (master -> driver renaming)	2013-02-01 00:20:49 -08:00
Reynold Xin	f9af9cee6f	Moved PruneDependency into PartitionPruningRDD.scala.	2013-02-01 00:02:46 -08:00
haitao.yao	b57570fd12	Merge branch 'mesos'	2013-02-01 14:06:45 +08:00
Matei Zaharia	7e2e046e37	Merge pull request #434 from pwendell/python-exceptions SPARK-673: Capture and re-throw Python exceptions	2013-01-31 21:58:26 -08:00
Patrick Wendell	39ab83e957	Small fix from last commit	2013-01-31 21:52:52 -08:00
Patrick Wendell	c33f0ef41a	Some style cleanup	2013-01-31 21:50:02 -08:00
Patrick Wendell	3446d5c8d6	SPARK-673: Capture and re-throw Python exceptions This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code.	2013-01-31 18:06:11 -08:00
Reynold Xin	6289d9654e	Removed the TODO comment from PartitionPruningRDD.	2013-01-31 17:49:36 -08:00
Reynold Xin	5b0fc265c2	Changed PartitionPruningRDD's split to make sure it returns the correct split index.	2013-01-31 17:48:39 -08:00
Stephen Haberman	782187c210	Once we find a split with no block, we don't have to look for more.	2013-01-31 18:27:25 -06:00
Stephen Haberman	418e36caa8	Add more private declarations.	2013-01-31 17:18:33 -06:00
Mikhail Bautin	fe3eceab57	Remove activation of profiles by default See the discussion at https://github.com/mesos/spark/pull/355 for why default profile activation is a problem.	2013-01-31 13:30:41 -08:00
haitao.yao	3190483b98	bug fix for javadoc	2013-01-31 14:23:51 +08:00
Imran Rashid	02a6761589	Merge branch 'master' into blockmanager_info Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2013-01-30 18:52:35 -08:00
Imran Rashid	c1df24d085	rename Slaves --> Executor	2013-01-30 18:51:14 -08:00
Matei Zaharia	d12330bd2c	Merge pull request #426 from woggling/conn-manager-ips Remember ConnectionManagerId used to initiate SendingConnections	2013-01-30 15:02:53 -08:00
Matei Zaharia	612a9fee71	Merge pull request #428 from woggling/mesos-exec-id Make ExecutorIDs include SlaveIDs when running Mesos	2013-01-30 15:01:46 -08:00
Stephen Haberman	871476d506	Include message and exitStatus if availalbe.	2013-01-30 16:56:46 -06:00
Charles Reiss	252845d304	Remove remants of attempt to use slaveId-executorId in MesosExecutorBackend	2013-01-30 10:38:06 -08:00
Charles Reiss	f7de6978c1	Use Mesos ExecutorIDs to hold SlaveIDs. Then we can safely use the Mesos ExecutorID as a Spark ExecutorID.	2013-01-30 09:38:57 -08:00
Charles Reiss	7f51458774	Comment at top of DAGSchedulerSuite	2013-01-30 09:34:53 -08:00
Charles Reiss	9c0bae75ad	Change DAGSchedulerSuite to run DAGScheduler in the same Thread.	2013-01-30 09:22:07 -08:00
Charles Reiss	178b89204c	Refactor DAGScheduler more to allow testing without a separate thread.	2013-01-30 09:19:55 -08:00
Charles Reiss	4bf3d7ea12	Clear spark.master.port to cleanup for other tests	2013-01-29 19:05:58 -08:00
Charles Reiss	9eac7d01f0	Add DAGScheduler tests.	2013-01-29 18:55:43 -08:00
Charles Reiss	a3d14c0404	Refactoring to DAGScheduler to aid testing	2013-01-29 18:55:42 -08:00
Charles Reiss	16a0789e10	Remember ConnectionManagerId used to initiate SendingConnections. This prevents ConnectionManager from getting confused if a machine has multiple host names and the one getHostName() finds happens not to be the one that was passed from, e.g., the BlockManagerMaster.	2013-01-29 18:13:59 -08:00
Matei Zaharia	d54b10b6ad	Merge remote-tracking branch 'stephenh/removefailedjob' Conflicts: core/src/main/scala/spark/deploy/master/Master.scala	2013-01-29 18:12:29 -08:00
Matei Zaharia	ccb67ff2ca	Merge pull request #425 from stephenh/toDebugString Add RDD.toDebugString.	2013-01-29 10:44:18 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Charles Reiss	a34096a76d	Add easymock to POMs	2013-01-29 10:04:33 -08:00
Imran Rashid	b92259ba57	Merge branch 'master' into blockmanager_info	2013-01-29 09:45:10 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	cbf72bffa5	Include name, if set, in RDD.toString().	2013-01-29 00:20:36 -06:00
Stephen Haberman	3cda14af3f	Add number of splits.	2013-01-29 00:12:31 -06:00
Matei Zaharia	a1ecec8d79	Merge branch 'master' of github.com:mesos/spark	2013-01-28 22:08:44 -08:00
Stephen Haberman	951cfd9ba2	Add JavaRDDLike.toDebugString().	2013-01-29 00:02:17 -06:00
Matei Zaharia	f6eb1f0825	Merge pull request #413 from pwendell/stage-logging SPARK-658: Adding logging of stage duration	2013-01-28 22:01:52 -08:00
Stephen Haberman	b45857c965	Add RDD.toDebugString. Original idea by Nathan Kronenfeld.	2013-01-28 23:56:56 -06:00
Patrick Wendell	7ee824e42e	Units from ms -> s	2013-01-28 21:48:32 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Matei Zaharia	dda2ce017c	Merge pull request #424 from pwendell/logging-cleanup Some DEBUG-level log cleanup.	2013-01-28 21:18:54 -08:00
Patrick Wendell	1f9b486a8b	Some DEBUG-level log cleanup. A few changes to make the DEBUG-level logs less noisy and more readable. - Moved a few very frequent messages to Trace - Changed some BlockManger log messages to make them more understandable SPARK-666 #resolve	2013-01-28 20:29:35 -08:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Imran Rashid	cec9c768c2	convenient name available in StageInfo	2013-01-28 20:09:41 -08:00
Imran Rashid	01d77f329f	expose stageInfo in SparkContext	2013-01-28 20:09:40 -08:00
Imran Rashid	38b83bc66b	can get task runtime summary from task info	2013-01-28 20:09:40 -08:00
Imran Rashid	b88daee916	simple util to summarize distributions	2013-01-28 20:09:40 -08:00
Imran Rashid	b14841455c	track task completion in DAGScheduler, and send a stageCompleted event with taskInfo to SparkListeners	2013-01-28 20:09:40 -08:00
Imran Rashid	0f22c4207f	better formatting for RDDInfo	2013-01-28 20:07:53 -08:00
Imran Rashid	a423ee546c	expose RDD & storage info directly via SparkContext	2013-01-28 20:07:53 -08:00
Patrick Wendell	501433f1d5	Making submission time a field	2013-01-28 10:45:57 -08:00
Patrick Wendell	c423be7d8e	Renaming stage finished function	2013-01-28 10:45:57 -08:00
Patrick Wendell	07f568e1bf	SPARK-658: Adding logging of stage duration	2013-01-28 10:45:57 -08:00
Matei Zaharia	286f8f876f	Change time unit in MetadataCleaner to seconds	2013-01-28 01:29:27 -08:00
Matei Zaharia	f03d9760fd	Clean up BlockManagerUI a little (make it not be an object, merge with Directives, and bind to a random port)	2013-01-27 23:56:14 -08:00
Matei Zaharia	909850729e	Rename more things from slave to executor	2013-01-27 23:17:20 -08:00
Matei Zaharia	44b4a0f88f	Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.	2013-01-27 19:23:49 -08:00
Matei Zaharia	6ad8540b40	Merge pull request #401 from squito/blockmanager_ui Blockmanager ui	2013-01-27 15:51:08 -08:00
Matei Zaharia	49f6472c0f	Merge pull request #418 from woggling/reregister-deadlock Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously	2013-01-26 18:59:02 -08:00
Charles Reiss	58fc6b2bed	Handle duplicate registrations better.	2013-01-26 18:30:44 -08:00
Charles Reiss	ad4232b4da	Fix deadlock in BlockManager reregistration triggered by failed updates.	2013-01-26 18:30:38 -08:00
Josh Rosen	d49cf0e587	Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668). This workaround is easier than rewriting JavaRDDLike in Java.	2013-01-26 16:13:18 -08:00
Imran Rashid	49c05608f5	add metadatacleaner for persisentRdd map	2013-01-25 17:04:16 -08:00
Stephen Haberman	8efbda0b17	Call executeOnCompleteCallbacks in more finally blocks.	2013-01-25 14:55:33 -06:00
Imran Rashid	a1d9d1767d	fixup `1cadaa1`, changed api of map	2013-01-25 10:05:26 -08:00
Imran Rashid	1cadaa164e	switch to TimeStampedHashMap for storing persistent Rdds	2013-01-25 09:30:21 -08:00
Imran Rashid	539491bbc3	code reformatting	2013-01-25 09:29:59 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Stephen Haberman	ec43a51b38	Merge branch 'master' into localsparkcontext Conflicts: core/src/test/scala/spark/FileServerSuite.scala core/src/test/scala/spark/RDDSuite.scala	2013-01-24 21:17:30 -06:00
Patrick Wendell	b6fc6e6752	SPARK-541: Adding a warning for invalid Master URL Right now Spark silently parses master URL's which do not match any known regex as a Mesos URL. The Mesos error message when an invalid URL gets passed is really confusing, so this warns the user when the implicit conversion is happening.	2013-01-24 14:31:23 -08:00
Stephen Haberman	230bda2047	Add LocalSparkContext to manage common sc variable.	2013-01-24 11:01:01 -06:00
Matei Zaharia	0fe173a3a5	Merge pull request #410 from rxin/splitpruningrdd Added a clearDependencies method in PartitionPruningRDD.	2013-01-23 23:10:15 -08:00
Reynold Xin	67a43bc7e6	Added a clearDependencies method in PartitionPruningRDD.	2013-01-23 23:06:52 -08:00
Matei Zaharia	fe5e4812fc	Merge pull request #409 from rxin/splitpruningrdd Added pruntSplits method to RDD.	2013-01-23 22:23:22 -08:00
Reynold Xin	c109f29c97	Updated PruneDependency to change "split" to "partition".	2013-01-23 22:22:03 -08:00
Reynold Xin	eedc542a02	Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.	2013-01-23 22:14:23 -08:00
Reynold Xin	81004b967e	Marked prev RDD as transient in SplitsPruningRDD.	2013-01-23 21:54:27 -08:00
Reynold Xin	636e912f32	Created a PruneDependency to properly assign dependency for SplitsPruningRDD.	2013-01-23 21:21:55 -08:00
Reynold Xin	45cd50d5fe	Updated assert == to ===.	2013-01-23 16:06:58 -08:00
Matei Zaharia	548856a224	Merge remote-tracking branch 'woggling/remove-machines' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-01-23 15:44:17 -08:00
Reynold Xin	c24b3819dd	Added an extra assert for split size check.	2013-01-23 15:34:59 -08:00
Reynold Xin	eb222b7206	Added pruntSplits method to RDD.	2013-01-23 15:29:02 -08:00
Matei Zaharia	1dd82743e0	Fix compile error due to cherry-pick	2013-01-23 13:07:27 -08:00
Charles Reiss	5c7422292e	Remove more dead code from test.	2013-01-23 12:59:51 -08:00
Imran Rashid	e1985bfa04	be sure to set class loader of kryo instances	2013-01-23 12:51:09 -08:00
Charles Reiss	be4a115a7e	Clarify TODO.	2013-01-23 12:48:45 -08:00
Charles Reiss	88b9d240fd	Remove dead code in test.	2013-01-23 12:40:38 -08:00
Matei Zaharia	1a3aeeca23	Merge pull request #407 from woggling/no-cache-tracker Eliminate CacheTracker	2013-01-23 12:28:48 -08:00
Charles Reiss	e1027ca639	Actually add CacheManager.	2013-01-23 12:22:11 -08:00
Matei Zaharia	4147e1d47b	Merge pull request #406 from tdas/master Changed StorageLevel and BlockManagerId API to prevent duplication in memory	2013-01-23 12:18:31 -08:00
Matei Zaharia	4d77d554e1	Merge pull request #394 from JoshRosen/add_file_fix Add SparkFiles.get() API to access files added through addFile().	2013-01-23 12:16:30 -08:00
Josh Rosen	ae2ed2947d	Allow PySpark's SparkFiles to be used from driver Fix minor documentation formatting issues.	2013-01-23 10:58:50 -08:00
Tathagata Das	79d55700ce	One more fix. Made even default constructor of BlockManagerId private to prevent such problems in the future.	2013-01-23 01:57:09 -08:00
Charles Reiss	0b506dd2ec	Add tests of various node failure scenarios.	2013-01-23 01:38:15 -08:00
Charles Reiss	d209b6b764	Extra debugging from hostLost()	2013-01-23 01:35:14 -08:00
Charles Reiss	9a27062260	Force generation increment after shuffle map stage	2013-01-23 01:34:44 -08:00
Tathagata Das	155f31398d	Made StorageLevel constructor private, and added StorageLevels.create() to the Java API. Updates scala and java programming guides.	2013-01-23 01:10:26 -08:00
Tathagata Das	5e11f1e51f	Modified StorageLevel API to ensure zero duplicate objects.	2013-01-22 23:42:53 -08:00
Tathagata Das	bacade6caf	Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.	2013-01-22 22:55:26 -08:00
Josh Rosen	43e9ff9596	Add test for driver hanging on exit (SPARK-530).	2013-01-22 22:47:26 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Matei Zaharia	ebaa8f6519	Merge remote-tracking branch 'stephenh/cleanup' Conflicts: core/src/main/scala/spark/scheduler/local/LocalScheduler.scala	2013-01-22 21:05:45 -08:00
Matei Zaharia	d2d273868b	Merge pull request #397 from JoshRosen/refactoring/daemon-threads Refactor daemon thread creation	2013-01-22 21:02:53 -08:00
Stephen Haberman	98d0b7747d	Fix Worker logInfo about unknown executor.	2013-01-22 18:11:51 -06:00
Stephen Haberman	8c51322cd0	Don't bother creating an exception.	2013-01-22 18:09:10 -06:00
Stephen Haberman	fdec42385a	Fix SPARK_MEM in ExecutorRunner.	2013-01-22 18:01:12 -06:00
Stephen Haberman	2437f6741b	Restore SPARK_MEM in executorEnvs.	2013-01-22 18:01:03 -06:00
Matei Zaharia	151c47eef5	Merge pull request #399 from NFLabs/master Fix for hanging spark.HttpFileServer on the kind of virtual network	2013-01-22 15:49:24 -08:00
Stephen Haberman	250fe89679	Handle Master telling the Worker to kill an already-dead executor.	2013-01-22 16:29:05 -06:00
Stephen Haberman	6f2194f757	Call removeJob instead of killing the cluster.	2013-01-22 15:38:58 -06:00
Stephen Haberman	27b3f3f0a9	Handle slaveLost before slaveIdToHost knows about it.	2013-01-22 15:30:42 -06:00
Imran Rashid	905c720e5e	Merge branch 'master' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala	2013-01-22 12:02:27 -08:00
Imran Rashid	50e2b23927	Fix up some problems from the merge	2013-01-22 11:46:01 -08:00
Stephen Haberman	588b24197a	Use default arguments instead of constructor overloads.	2013-01-22 10:19:30 -06:00
Leemoonsoo	7e9ee2e833	Fix for hanging spark.HttpFileServer with kind of virtual network	2013-01-22 23:08:34 +09:00
Charles Reiss	e353886a8c	Use generation numbers for fetch failure tracking	2013-01-22 00:23:31 -08:00
Josh Rosen	551a47a620	Refactor daemon thread pool creation.	2013-01-21 23:31:00 -08:00
Stephen Haberman	a8baeb9327	Further simplify getOrElse call.	2013-01-21 21:30:24 -06:00
Stephen Haberman	2d8218b871	Remove unneeded/now-broken saveAsNewAPIHadoopFile overload.	2013-01-21 20:00:27 -06:00
Josh Rosen	7b9e96c992	Add synchronization to Executor.updateDependencies() (SPARK-662)	2013-01-21 17:34:23 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Stephen Haberman	ffd1623595	Minor cleanup.	2013-01-21 15:55:46 -06:00
Matei Zaharia	a88b44ed3b	Only bind to IPv4 addresses when trying to auto-detect external IP	2013-01-21 11:59:21 -08:00
Matei Zaharia	4d34c7fc3e	Fix compile error caused by cherry-pick	2013-01-21 11:33:48 -08:00
Imran Rashid	a3f571b539	more File -> String changes	2013-01-21 11:21:52 -08:00
Imran Rashid	fe26acc482	remove unused imports	2013-01-21 11:21:46 -08:00
Imran Rashid	c73107500e	send sparkHome as String instead of File over network	2013-01-21 11:21:39 -08:00
Imran Rashid	5bf73df7f0	oops, fix stupid compile error	2013-01-21 11:21:33 -08:00
Imran Rashid	aae5a920a4	get sparkHome the correct way	2013-01-21 11:21:28 -08:00
Imran Rashid	f116d6b5c6	executor can use a different sparkHome from Worker	2013-01-21 11:21:22 -08:00
Stephen Haberman	6ded481999	Merge branch 'master' into hadoopconf Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala	2013-01-21 12:56:48 -06:00
Stephen Haberman	69a417858b	Also use hadoopConfiguration in newAPI methods.	2013-01-21 12:42:11 -06:00
Matei Zaharia	c0b9ceb8c3	Log remote lifecycle events in Akka for easier debugging	2013-01-21 00:23:53 -08:00
Matei Zaharia	c7b5e5f1ec	Merge pull request #389 from JoshRosen/python_rdd_checkpointing Add checkpointing to the Python API	2013-01-20 17:10:44 -08:00
Josh Rosen	9f211dd3f0	Fix PythonPartitioner equality; see SPARK-654. PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.	2013-01-20 15:41:42 -08:00
Josh Rosen	5b6ea9e9a0	Update checkpointing API docs in Python/Java.	2013-01-20 15:31:41 -08:00
Josh Rosen	7ed1bf4b48	Add RDD checkpointing to Python API.	2013-01-20 13:19:19 -08:00
Matei Zaharia	86057ec7c8	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala	2013-01-20 12:47:55 -08:00
folone	fd6e51deec	Fixed the failing test.	2013-01-20 17:02:58 +01:00
folone	ad8aff6ca4	Merge remote-tracking branch 'upstream/master'	2013-01-20 14:43:20 +01:00
folone	a5403acd4e	Updated maven build for scala 2.10.	2013-01-20 14:42:16 +01:00
Matei Zaharia	8e7f098a2c	Added accumulators to PySpark	2013-01-20 01:57:44 -08:00
Tathagata Das	4f8fe58b25	Merge branch 'mesos-streaming' into streaming Conflicts: core/src/main/scala/spark/api/java/JavaRDDLike.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala core/src/test/scala/spark/JavaAPISuite.java	2013-01-20 01:13:56 -08:00
Tathagata Das	214345ceac	Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().	2013-01-19 23:50:17 -08:00
Imran Rashid	d98caa0fa0	Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala	2013-01-18 18:11:26 -08:00
Patrick Wendell	ee0314c3b3	Merge branch 'streaming' into streaming-java-api	2013-01-17 18:43:00 -08:00
Patrick Wendell	d5570c7968	Adding checkpointing to Java API	2013-01-17 18:41:58 -08:00
Matei Zaharia	54c0f9f185	Fix code that assumed spark.local.dir is only a single directory	2013-01-17 17:40:55 -08:00
Fernand Pajot	742bc841ad	changed HttpBroadcast server cache to be in spark.local.dir instead of java.io.tmpdir	2013-01-17 16:56:11 -08:00
Matei Zaharia	aff1844155	Merge pull request #381 from squito/remove_threadpool remove unused thread pool	2013-01-16 16:46:42 -08:00
Tathagata Das	f466ee44bc	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/MapOutputTracker.scala	2013-01-16 12:57:11 -08:00
Imran Rashid	eae698f755	remove unused thread pool	2013-01-16 12:21:37 -08:00
Tathagata Das	a805ac4a7c	Disabled checkpoint for PairwiseRDD (pySpark).	2013-01-16 10:55:26 -08:00
Matei Zaharia	4beb084f64	Merge pull request #374 from woggling/null-mapout Generate FetchFailedException even for cached missing map outputs	2013-01-15 14:22:29 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Charles Reiss	4078623b9f	Remove broken attempt to test fetching case.	2013-01-15 12:05:54 -08:00
Stephen Haberman	74d3b23929	Add spark.executor.memory to differentiate executor memory from spark-shell memory.	2013-01-15 14:03:28 -06:00
Stephen Haberman	d228bff440	Add a test.	2013-01-15 11:48:50 -06:00
Stephen Haberman	dd583b7ebf	Call executeOnCompleteCallbacks in a finally block.	2013-01-15 10:52:06 -06:00
Tathagata Das	eded21925a	Merge pull request #375 from tdas/streaming Important bug fixes	2013-01-14 23:06:40 -08:00
Charles Reiss	b038999797	Fix accidental spark.master.host reuse	2013-01-14 17:04:44 -08:00
Charles Reiss	7ba34bc007	Additional tests for MapOutputTracker.	2013-01-14 15:27:02 -08:00
Charles Reiss	273fb5cc10	Throw FetchFailedException for cached missing locs	2013-01-14 15:26:48 -08:00
Tathagata Das	131be5d62e	Fixed bug in RDD checkpointing.	2013-01-14 03:28:25 -08:00
folone	25c0739bad	Moved to scala 2.10.0. Notable changes are: - akka 2.0.3 → 2.1.0 - spray 1.0-M1 → 1.1-M7 For now the repl subproject is commented out, as scala reflection api changed very much since the introduction of macros.	2013-01-14 09:52:11 +01:00
Tathagata Das	82b0cc90ca	Merge pull request #370 from tdas/streaming Added more documentation and minor change in API for NetworkReceiver	2013-01-13 21:28:12 -08:00
Tathagata Das	0dbd411a56	Added documentation for PairDStreamFunctions.	2013-01-13 21:08:35 -08:00
Matei Zaharia	cb867e9ffb	Merge branch 'master' of github.com:mesos/spark	2013-01-13 19:34:32 -08:00
Matei Zaharia	72408e8dfa	Make filter preserve partitioner info, since it can	2013-01-13 19:34:07 -08:00
Matei Zaharia	9a34409810	Merge pull request #360 from rxin/cogroup-java Changed CoGroupRDD's hash map from Scala to Java.	2013-01-13 15:31:08 -08:00
Reynold Xin	be7166146b	Removed the use of getOrElse to avoid Scala wrapper for every call.	2013-01-13 15:27:28 -08:00
Ryan LeCompte	c31931af7e	switch to uppercase constants	2013-01-13 10:39:47 -08:00
Ryan LeCompte	2305a2c1d9	more code cleanup	2013-01-13 10:01:56 -08:00
Mikhail Bautin	88d8f11365	Add missing dependency spray-json to Maven build	2013-01-13 00:46:25 -08:00
Matei Zaharia	fbb3fc4143	Merge pull request #346 from JoshRosen/python-api Python API (PySpark)	2013-01-12 23:49:36 -08:00
Matei Zaharia	01413ca0e7	Merge pull request #364 from tysonjh/master Executor and JobDescription JSON support added	2013-01-12 16:17:07 -08:00
Matei Zaharia	995075bf79	Merge pull request #355 from shivaram/default-hadoop-pom Activate hadoop1 profile by default for maven builds	2013-01-12 15:38:36 -08:00
Shivaram Venkataraman	bbc56d85ed	Rename environment variable for hadoop profiles to hadoopVersion	2013-01-12 15:24:13 -08:00
Ryan LeCompte	addff2c466	add comment	2013-01-12 09:57:29 -08:00
Ryan LeCompte	ea20ae6618	add one extra test	2013-01-12 09:18:00 -08:00
Ryan LeCompte	2c77eeebb6	correct test params	2013-01-12 00:13:45 -08:00
Ryan LeCompte	0cfea7a2ec	add unit test	2013-01-11 23:48:07 -08:00
Ryan LeCompte	ff10b3aa09	add missing return	2013-01-11 21:03:57 -08:00
Ryan LeCompte	22445fbea9	attempt to sleep for more accurate time period, minor cleanup	2013-01-11 13:30:49 -08:00
Tyson	1731f1fed4	Added an optional format parameter for individual job queries and optimized the jobId query	2013-01-11 15:01:43 -05:00
Tyson	c063e8777e	Added implicit json writers for JobDescription and ExecutorRunner	2013-01-11 14:57:38 -05:00
Stephen Haberman	5c7a127219	Pass a new Configuration that wraps the default hadoopConfiguration.	2013-01-11 11:25:11 -06:00
Stephen Haberman	3e6519a36e	Use hadoopConfiguration for default JobConf in PairRDDFunctions.	2013-01-11 11:24:20 -06:00
Shivaram Venkataraman	9262522306	Activate hadoop2 profile in pom.xml with -Dhadoop=2	2013-01-10 22:07:34 -08:00
Matei Zaharia	2e914d9983	Formatting	2013-01-10 19:13:08 -08:00
Matei Zaharia	3548c9c0c8	Merge branch 'master' of github.com:mesos/spark	2013-01-10 19:06:40 -08:00
Matei Zaharia	6d1c230281	Merge pull request #357 from tysonjh/master JSON support added to WebUI	2013-01-10 19:06:07 -08:00
Matei Zaharia	248995c535	Merge pull request #356 from shane-huang/master Fix an issue in ConnectionManager where sendMessage may create too many unnecessary connections	2013-01-10 17:52:23 -08:00
Reynold Xin	bd336f5f40	Changed CoGroupRDD's hash map from Scala to Java.	2013-01-10 17:13:04 -08:00
Stephen Haberman	d1864052c5	Fix invalid asInstanceOf cast.	2013-01-10 12:16:26 -06:00
Stephen Haberman	b15e851279	Check for AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables. For custom properties, use "spark.hadoop." as a prefix instead of just "hadoop.".	2013-01-10 10:55:41 -06:00
shane-huang	9930a95d21	Modified Patch according to comments	2013-01-10 20:09:55 +08:00
Stephen Haberman	e3861ae395	Provide and expose a default Hadoop Configuration. Any "hadoop.*" system properties will be passed along into configuration.	2013-01-09 17:08:14 -06:00
Tyson	549ee388a1	Removed io.spray spray-json dependency as it is not needed.	2013-01-09 15:12:23 -05:00
Tyson	bf9d9946f9	Query parameter reformatted to be more extensible and routing more robust	2013-01-09 11:29:58 -05:00
Tyson	0da2ff102e	Added url query parameter json and handler	2013-01-09 10:40:48 -05:00
Tyson	269fe018c7	JSON object definitions	2013-01-09 10:40:43 -05:00
Matei Zaharia	9cc764f523	Code style	2013-01-08 22:29:57 -08:00
Matei Zaharia	14972141f9	Merge pull request #344 from mbautin/log_preferred_hosts Log preferred hosts	2013-01-08 22:26:34 -08:00
Josh Rosen	b57dd0f160	Add mapPartitionsWithSplit() to PySpark.	2013-01-08 16:05:02 -08:00

1270 commits