ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Patrick Wendell	f0b68c623c	Initial cut at replacing K, V in Java files	2013-02-11 10:03:37 -08:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Matei Zaharia	a4611d66f0	Merge pull request #449 from stephenh/longerdriversuite Increase DriverSuite timeout.	2013-02-05 17:58:22 -08:00
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Charles Reiss	1fd5ee323d	Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.	2013-02-01 22:33:38 -08:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations. These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Charles Reiss	7f51458774	Comment at top of DAGSchedulerSuite	2013-01-30 09:34:53 -08:00
Charles Reiss	9c0bae75ad	Change DAGSchedulerSuite to run DAGScheduler in the same Thread.	2013-01-30 09:22:07 -08:00
Charles Reiss	4bf3d7ea12	Clear spark.master.port to cleanup for other tests	2013-01-29 19:05:58 -08:00
Charles Reiss	9eac7d01f0	Add DAGScheduler tests.	2013-01-29 18:55:43 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Matei Zaharia	44b4a0f88f	Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.	2013-01-27 19:23:49 -08:00
Matei Zaharia	49f6472c0f	Merge pull request #418 from woggling/reregister-deadlock Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously	2013-01-26 18:59:02 -08:00
Charles Reiss	ad4232b4da	Fix deadlock in BlockManager reregistration triggered by failed updates.	2013-01-26 18:30:38 -08:00
Josh Rosen	d49cf0e587	Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668). This workaround is easier than rewriting JavaRDDLike in Java.	2013-01-26 16:13:18 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Stephen Haberman	ec43a51b38	Merge branch 'master' into localsparkcontext Conflicts: core/src/test/scala/spark/FileServerSuite.scala core/src/test/scala/spark/RDDSuite.scala	2013-01-24 21:17:30 -06:00
Stephen Haberman	230bda2047	Add LocalSparkContext to manage common sc variable.	2013-01-24 11:01:01 -06:00
Matei Zaharia	fe5e4812fc	Merge pull request #409 from rxin/splitpruningrdd Added pruntSplits method to RDD.	2013-01-23 22:23:22 -08:00
Reynold Xin	eedc542a02	Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.	2013-01-23 22:14:23 -08:00
Reynold Xin	45cd50d5fe	Updated assert == to ===.	2013-01-23 16:06:58 -08:00
Matei Zaharia	548856a224	Merge remote-tracking branch 'woggling/remove-machines' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-01-23 15:44:17 -08:00
Reynold Xin	c24b3819dd	Added an extra assert for split size check.	2013-01-23 15:34:59 -08:00
Reynold Xin	eb222b7206	Added pruntSplits method to RDD.	2013-01-23 15:29:02 -08:00
Charles Reiss	5c7422292e	Remove more dead code from test.	2013-01-23 12:59:51 -08:00
Charles Reiss	88b9d240fd	Remove dead code in test.	2013-01-23 12:40:38 -08:00
Matei Zaharia	1a3aeeca23	Merge pull request #407 from woggling/no-cache-tracker Eliminate CacheTracker	2013-01-23 12:28:48 -08:00
Matei Zaharia	4147e1d47b	Merge pull request #406 from tdas/master Changed StorageLevel and BlockManagerId API to prevent duplication in memory	2013-01-23 12:18:31 -08:00
Matei Zaharia	4d77d554e1	Merge pull request #394 from JoshRosen/add_file_fix Add SparkFiles.get() API to access files added through addFile().	2013-01-23 12:16:30 -08:00
Charles Reiss	0b506dd2ec	Add tests of various node failure scenarios.	2013-01-23 01:38:15 -08:00
Tathagata Das	5e11f1e51f	Modified StorageLevel API to ensure zero duplicate objects.	2013-01-22 23:42:53 -08:00
Tathagata Das	bacade6caf	Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.	2013-01-22 22:55:26 -08:00
Josh Rosen	43e9ff9596	Add test for driver hanging on exit (SPARK-530).	2013-01-22 22:47:26 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Stephen Haberman	ffd1623595	Minor cleanup.	2013-01-21 15:55:46 -06:00
Tathagata Das	4f8fe58b25	Merge branch 'mesos-streaming' into streaming Conflicts: core/src/main/scala/spark/api/java/JavaRDDLike.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala core/src/test/scala/spark/JavaAPISuite.java	2013-01-20 01:13:56 -08:00

249 commits