ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations. These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Matei Zaharia	44b4a0f88f	Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.	2013-01-27 19:23:49 -08:00
Matei Zaharia	49f6472c0f	Merge pull request #418 from woggling/reregister-deadlock Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously	2013-01-26 18:59:02 -08:00
Charles Reiss	ad4232b4da	Fix deadlock in BlockManager reregistration triggered by failed updates.	2013-01-26 18:30:38 -08:00
Josh Rosen	d49cf0e587	Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668). This workaround is easier than rewriting JavaRDDLike in Java.	2013-01-26 16:13:18 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Stephen Haberman	ec43a51b38	Merge branch 'master' into localsparkcontext Conflicts: core/src/test/scala/spark/FileServerSuite.scala core/src/test/scala/spark/RDDSuite.scala	2013-01-24 21:17:30 -06:00
Stephen Haberman	230bda2047	Add LocalSparkContext to manage common sc variable.	2013-01-24 11:01:01 -06:00
Matei Zaharia	fe5e4812fc	Merge pull request #409 from rxin/splitpruningrdd Added pruntSplits method to RDD.	2013-01-23 22:23:22 -08:00
Reynold Xin	eedc542a02	Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.	2013-01-23 22:14:23 -08:00
Reynold Xin	45cd50d5fe	Updated assert == to ===.	2013-01-23 16:06:58 -08:00
Matei Zaharia	548856a224	Merge remote-tracking branch 'woggling/remove-machines' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-01-23 15:44:17 -08:00
Reynold Xin	c24b3819dd	Added an extra assert for split size check.	2013-01-23 15:34:59 -08:00
Reynold Xin	eb222b7206	Added pruntSplits method to RDD.	2013-01-23 15:29:02 -08:00
Charles Reiss	5c7422292e	Remove more dead code from test.	2013-01-23 12:59:51 -08:00
Charles Reiss	88b9d240fd	Remove dead code in test.	2013-01-23 12:40:38 -08:00
Matei Zaharia	1a3aeeca23	Merge pull request #407 from woggling/no-cache-tracker Eliminate CacheTracker	2013-01-23 12:28:48 -08:00
Matei Zaharia	4147e1d47b	Merge pull request #406 from tdas/master Changed StorageLevel and BlockManagerId API to prevent duplication in memory	2013-01-23 12:18:31 -08:00
Matei Zaharia	4d77d554e1	Merge pull request #394 from JoshRosen/add_file_fix Add SparkFiles.get() API to access files added through addFile().	2013-01-23 12:16:30 -08:00
Charles Reiss	0b506dd2ec	Add tests of various node failure scenarios.	2013-01-23 01:38:15 -08:00
Tathagata Das	5e11f1e51f	Modified StorageLevel API to ensure zero duplicate objects.	2013-01-22 23:42:53 -08:00
Tathagata Das	bacade6caf	Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.	2013-01-22 22:55:26 -08:00
Josh Rosen	43e9ff9596	Add test for driver hanging on exit (SPARK-530).	2013-01-22 22:47:26 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Stephen Haberman	ffd1623595	Minor cleanup.	2013-01-21 15:55:46 -06:00
Tathagata Das	4f8fe58b25	Merge branch 'mesos-streaming' into streaming Conflicts: core/src/main/scala/spark/api/java/JavaRDDLike.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala core/src/test/scala/spark/JavaAPISuite.java	2013-01-20 01:13:56 -08:00
Patrick Wendell	d5570c7968	Adding checkpointing to Java API	2013-01-17 18:41:58 -08:00
Tathagata Das	f466ee44bc	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/MapOutputTracker.scala	2013-01-16 12:57:11 -08:00
Matei Zaharia	4beb084f64	Merge pull request #374 from woggling/null-mapout Generate FetchFailedException even for cached missing map outputs	2013-01-15 14:22:29 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Charles Reiss	4078623b9f	Remove broken attempt to test fetching case.	2013-01-15 12:05:54 -08:00
Stephen Haberman	d228bff440	Add a test.	2013-01-15 11:48:50 -06:00
Charles Reiss	b038999797	Fix accidental spark.master.host reuse	2013-01-14 17:04:44 -08:00
Charles Reiss	7ba34bc007	Additional tests for MapOutputTracker.	2013-01-14 15:27:02 -08:00
Matei Zaharia	72408e8dfa	Make filter preserve partitioner info, since it can	2013-01-13 19:34:07 -08:00
Ryan LeCompte	ea20ae6618	add one extra test	2013-01-12 09:18:00 -08:00
Ryan LeCompte	2c77eeebb6	correct test params	2013-01-12 00:13:45 -08:00
Ryan LeCompte	0cfea7a2ec	add unit test	2013-01-11 23:48:07 -08:00
Stephen Haberman	8ac0f35be4	Add JavaRDDLike.keyBy.	2013-01-08 09:57:45 -06:00
Stephen Haberman	4ee6b22775	Merge branch 'master' into tupleBy Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-01-08 09:10:10 -06:00
Matei Zaharia	f7cf035b9b	Merge pull request #350 from tdas/streaming Spark Streaming	2013-01-07 17:40:11 -08:00
Shivaram Venkataraman	b1336e2fe4	Update expected size of strings to match our dummy string class	2013-01-07 17:00:32 -08:00
Tathagata Das	4719e6d8fe	Changed locations for unit test logs.	2013-01-07 16:06:07 -08:00
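
Note on commit 8b3041c723 (incremental merging in reduce): the idea described in that message can be illustrated with a small sketch. This is not Spark's code. It is a minimal Python model that assumes per-partition results are produced by worker threads and folded into a single running value as each one finishes, instead of being collected in an array first. The names IncrementalMerger, incremental_reduce, and run_task are hypothetical and exist only for this illustration. Because results arrive in arbitrary order, the operator is assumed to be associative and commutative.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class IncrementalMerger:
    """Holds one running value and folds each new result into it (hypothetical helper)."""

    def __init__(self, op):
        self._op = op
        self._lock = threading.Lock()
        self._has_value = False
        self._value = None

    def add(self, result):
        # Merge under a lock so concurrent tasks cannot interleave updates.
        with self._lock:
            if self._has_value:
                self._value = self._op(self._value, result)
            else:
                self._value, self._has_value = result, True

    def get(self):
        with self._lock:
            if not self._has_value:
                raise ValueError("cannot reduce an empty collection")
            return self._value


def incremental_reduce(partitions, op):
    """Reduce each partition locally, merging its result as soon as it is ready."""
    merger = IncrementalMerger(op)

    def run_task(partition):
        if partition:  # empty partitions contribute nothing
            merger.add(reduce(op, partition))

    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() drains the iterator so any task exception is re-raised here.
        list(pool.map(run_task, partitions))
    return merger.get()
```

For example, incremental_reduce([[1, 2, 3], [4, 5], [], [6]], lambda a, b: a + b) sums the four partitions while keeping only one merged value on the driver side at any time.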


232 commits