ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Gavin Li	e179ff8a32	update according to comments	2013-06-05 22:41:05 +00:00
Gavin Li	9f84315c05	enhance pipe to support what we can do in hadoop streaming	2013-06-01 00:26:10 +00:00
Mark Hamstra	6e6b3e0d7e	Actually use the cleaned closure in foreachPartition	2013-05-10 13:02:34 -07:00
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	6962d40b44	Fix deprecated warning	2013-04-07 20:27:33 -04:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Stephen Haberman	00170eb0b9	Fix are/our typo.	2013-03-22 12:59:08 -05:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still up the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Mark Hamstra	ab33e27cc9	constructorOfA -> constructA in doc comments	2013-03-16 15:29:15 -07:00
Mark Hamstra	9784fc1fcd	fix wayward comma in doc comment	2013-03-16 15:25:02 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation from subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00
Mark Hamstra	b8863a79d3	Merge branch 'master' of https://github.com/mesos/spark into commutative Conflicts: core/src/main/scala/spark/RDD.scala	2013-02-08 18:26:00 -08:00
Mark Hamstra	934a53c8b6	Change docs on 'reduce' since the merging of local reduces no longer preserves ordering, so the reduce function must also be commutative.	2013-02-05 22:19:58 -08:00
Stephen Haberman	a9c8d53cfa	Clean up RDDs, mainly to use getSplits. Also made sure clearDependencies() was calling super, to ensure the getSplits/getDependencies vars in the RDD base class get cleaned up.	2013-02-05 22:16:59 -06:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Matei Zaharia	9970926ede	formatting	2013-02-01 14:07:34 -08:00
Matei Zaharia	ccb67ff2ca	Merge pull request #425 from stephenh/toDebugString Add RDD.toDebugString.	2013-01-29 10:44:18 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	cbf72bffa5	Include name, if set, in RDD.toString().	2013-01-29 00:20:36 -06:00
Stephen Haberman	3cda14af3f	Add number of splits.	2013-01-29 00:12:31 -06:00
Stephen Haberman	b45857c965	Add RDD.toDebugString. Original idea by Nathan Kronenfeld.	2013-01-28 23:56:56 -06:00
Matei Zaharia	6ad8540b40	Merge pull request #401 from squito/blockmanager_ui Blockmanager ui	2013-01-27 15:51:08 -08:00
Imran Rashid	539491bbc3	code reformatting	2013-01-25 09:29:59 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Imran Rashid	905c720e5e	Merge branch 'master' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala	2013-01-22 12:02:27 -08:00
Tathagata Das	214345ceac	Fixed issue https://spark-project.atlassian.net/browse/STREAMING-29 , along with updates to doc comments in SparkContext.checkpoint().	2013-01-19 23:50:17 -08:00
Imran Rashid	d98caa0fa0	Merge remote-tracking branch 'dennybritz/blockmanagerUI' into blockmanager_ui Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala	2013-01-18 18:11:26 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00

1 2 3 4

171 commits