Matei Zaharia
1ef5d0d2c9
Merge pull request #644 from shimingfei/joblogger
...
add JobLogger to Spark (on new Spark code)
2013-06-22 09:35:57 -07:00
Mingfei
5240795154
edit according to comments
2013-06-21 17:38:23 +08:00
Gavin Li
0a2a9bce1e
fix typo and coding style
2013-06-18 21:30:13 +00:00
Gavin Li
4508089fc3
refine comments and add sc.clean
2013-06-17 05:23:46 +00:00
Gavin Li
e6ae049283
Merge remote-tracking branch 'upstream1/master' into enhance_pipe
2013-06-16 22:53:39 +00:00
Gavin Li
fb6d733fa8
update according to comments
2013-06-16 22:32:55 +00:00
Matei Zaharia
f961aac8b2
Merge pull request #649 from ryanlecompte/master
...
Add top K method to RDD using a bounded priority queue
2013-06-15 00:53:41 -07:00
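The bounded-priority-queue idea behind this PR can be sketched in plain Scala (a hypothetical illustration, not Spark's actual `BoundedPriorityQueue`): keep at most `k` elements in a min-heap, evicting the smallest whenever a larger element arrives, so each partition only ever holds `k` candidates.

```scala
import java.util.PriorityQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of a bounded priority queue: a min-heap capped at
// maxSize, so the smallest of the retained elements is always at the head.
class BoundedPQ[A](maxSize: Int)(implicit ord: Ordering[A]) {
  private val underlying = new PriorityQueue[A](maxSize, ord)

  def +=(elem: A): this.type = {
    if (underlying.size < maxSize) underlying.offer(elem)
    else if (ord.gt(elem, underlying.peek())) {
      underlying.poll()      // evict the current smallest
      underlying.offer(elem)
    }
    this
  }

  def toSortedSeq: Seq[A] = {
    val buf = ArrayBuffer.empty[A]
    val it = underlying.iterator()
    while (it.hasNext) buf += it.next()
    buf.sorted(ord.reverse).toSeq   // largest first
  }
}

// Each partition would build its own bounded queue; the driver merges them.
def topK[A: Ordering](partitions: Seq[Seq[A]], k: Int): Seq[A] = {
  val merged = new BoundedPQ[A](k)
  for (part <- partitions; elem <- part) merged += elem
  merged.toSortedSeq
}
```

Because every queue is capped at `k`, the driver never materializes more than `k` elements per partition, which is what makes top-K cheap compared to a full `sortBy`.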
ryanlecompte
e8801d4490
use delegation for BoundedPriorityQueue, add Java API
2013-06-14 23:39:05 -07:00
ryanlecompte
44b8dbaede
use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs
2013-06-13 16:23:15 -07:00
ryanlecompte
db5bca08ff
add a new top K method to RDD using a bounded priority queue
2013-06-12 10:54:16 -07:00
Patrick Wendell
d1bbcebae5
Adding compression to Hadoop save functions
2013-06-09 11:39:35 -07:00
Mingfei
1a4d93c025
modify to pass job annotation via localProperties and use a daemon thread to do the JobLogger's work
2013-06-08 14:23:39 +08:00
Gavin Li
e179ff8a32
update according to comments
2013-06-05 22:41:05 +00:00
Gavin Li
9f84315c05
enhance pipe to support what we can do in Hadoop streaming
2013-06-01 00:26:10 +00:00
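The pipe mechanism referenced here streams a partition's elements through an external command, one per line, Hadoop-Streaming style. A minimal plain-Scala sketch of that shape (hypothetical names; not Spark's implementation) using `scala.sys.process`:

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

// Sketch of the pipe idea: write each element of a partition to the
// external command's stdin (one per line), and treat each line of the
// command's stdout as an element of the resulting partition.
def pipePartition(elems: Seq[String], command: String): Seq[String] = {
  val input = new ByteArrayInputStream(
    elems.mkString("", "\n", "\n").getBytes("UTF-8"))
  (Process(command) #< input).!!.split("\n").toSeq
}
```

For example, `pipePartition(Seq("x", "y"), "cat")` round-trips the elements through `cat` unchanged, assuming a Unix `cat` binary is on the PATH.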
Reynold Xin
ba5e544461
More block manager cleanup.
...
Implemented a removeRdd method in BlockManager, and use that to
implement RDD.unpersist. Previously, unpersist needed to send B Akka
messages, where B = number of blocks. Now unpersist only needs to send W
Akka messages, where W = the number of workers.
2013-05-31 01:48:16 -07:00
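The batching this commit describes amounts to grouping an RDD's blocks by the worker holding them, so one message per worker carries all of that worker's block removals. A hypothetical sketch (the `BlockId` shape and names here are illustrative, not Spark's):

```scala
// Illustrative block identifier: an RDD id plus a split index.
case class BlockId(rddId: Int, splitIndex: Int)

// Given a map from block to the worker that stores it, compute one
// message payload per worker: the list of that worker's blocks for rddId.
// Sending these is W messages (one per worker) instead of B (one per block).
def messagesToSend(blockLocations: Map[BlockId, String],
                   rddId: Int): Map[String, Seq[BlockId]] =
  blockLocations.toSeq
    .collect { case (block, worker) if block.rddId == rddId => (worker, block) }
    .groupBy { case (worker, _) => worker }
    .map { case (worker, pairs) => worker -> pairs.map(_._2) }
```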
Mark Hamstra
6e6b3e0d7e
Actually use the cleaned closure in foreachPartition
2013-05-10 13:02:34 -07:00
Reynold Xin
98df9d2853
Added removeRdd function in BlockManager.
2013-05-01 20:17:09 -07:00
Reynold Xin
3227ec8edd
Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist.
...
Also updated unit tests to make sure they are properly testing for
concurrency.
2013-05-01 16:07:44 -07:00
Shivaram Venkataraman
604d3bf56c
Rename partition class and add scala doc
2013-04-28 16:31:07 -07:00
Shivaram Venkataraman
15acd49f07
Actually rename classes to ZippedPartitions*
...
(the previous commit only renamed the file)
2013-04-28 16:03:22 -07:00
Shivaram Venkataraman
0cc6642b7c
Rename to zipPartitions and style changes
2013-04-28 05:11:03 -07:00
Shivaram Venkataraman
c9c4954d99
Add an interface to zip iterators of multiple RDDs
...
The current code supports 2, 3 or 4 arguments but can be extended
to more arguments if required.
2013-04-26 16:57:46 -07:00
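The core of the interface added here is zipping the element iterators of corresponding partitions. A minimal plain-Scala sketch of that shape (hypothetical helper, without the Spark RDD machinery):

```scala
// Sketch of zipping two partitions' iterators element-by-element, as
// zipPartitions does across corresponding partitions of two RDDs.
// The result is lazy: nothing is consumed until the caller pulls elements.
def zipIterators[A, B, C](a: Iterator[A], b: Iterator[B])
                         (f: (A, B) => C): Iterator[C] =
  new Iterator[C] {
    def hasNext: Boolean = a.hasNext && b.hasNext
    def next(): C = f(a.next(), b.next())
  }
```

The 3- and 4-argument variants mentioned in the commit follow the same pattern with extra iterator parameters.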
Matei Zaharia
6962d40b44
Fix deprecated warning
2013-04-07 20:27:33 -04:00
Stephen Haberman
4ca273edc4
Merge branch 'master' into shufflecoalesce
...
Conflicts:
core/src/test/scala/spark/RDDSuite.scala
2013-03-23 11:45:45 -05:00
Stephen Haberman
00170eb0b9
Fix are/our typo.
2013-03-22 12:59:08 -05:00
Stephen Haberman
1c67c7dfd1
Add a shuffle parameter to coalesce.
...
This is useful for when you want just 1 output file (part-00000) but
still want the upstream RDD to be computed in parallel.
2013-03-22 08:54:44 -05:00
Mark Hamstra
ab33e27cc9
constructorOfA -> constructA in doc comments
2013-03-16 15:29:15 -07:00
Mark Hamstra
9784fc1fcd
fix wayward comma in doc comment
2013-03-16 15:25:02 -07:00
Mark Hamstra
80fc8c82ed
_With[Matei]
2013-03-16 12:16:29 -07:00
Mark Hamstra
38454c4aed
Merge branch 'master' of https://github.com/mesos/spark into WithThing
2013-03-16 11:54:44 -07:00
Stephen Haberman
4632c45af1
Finished subtractByKey.
2013-03-14 10:35:34 -05:00
Stephen Haberman
63fe225587
Simplify SubtractedRDD in preparation for subtractByKey.
2013-03-13 17:17:34 -05:00
Mark Hamstra
1289e7176b
refactored _With API and added foreachPartition
2013-03-10 22:27:13 -07:00
Mark Hamstra
5ff0810b11
refactor mapWith, flatMapWith and filterWith to each use two parameter lists
2013-03-05 12:25:44 -08:00
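The `mapWith` family builds one helper object per partition from its index, then maps each element with that helper; splitting the two function arguments into separate parameter lists lets the compiler infer the helper's type from `constructA` before type-checking `f`. A plain-Scala sketch over a `Seq` of partitions (hypothetical, not Spark's code):

```scala
// Sketch of the mapWith shape: constructA builds one helper value per
// partition from its index, and f maps each element with that helper.
def mapWith[T, A, U](partitions: Seq[Seq[T]])
                    (constructA: Int => A)
                    (f: (T, A) => U): Seq[Seq[U]] =
  partitions.zipWithIndex.map { case (part, index) =>
    val a = constructA(index)          // one helper object per partition
    part.map(elem => f(elem, a))
  }
```

A typical use is seeding one random number generator per partition, e.g. `mapWith(parts)(i => new scala.util.Random(i))((x, rng) => x + rng.nextInt(10))`.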
Mark Hamstra
d046d8ad32
whitespace formatting
2013-03-05 00:48:13 -08:00
Mark Hamstra
9148b968cf
mapWith, flatMapWith and filterWith
2013-03-04 15:48:47 -08:00
Stephen Haberman
44032bc476
Merge branch 'master' into bettersplits
...
Conflicts:
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/test/scala/spark/ShuffleSuite.scala
2013-02-24 22:08:14 -06:00
Stephen Haberman
f442e7d83c
Update for split->partition rename.
2013-02-24 00:27:14 -06:00
Stephen Haberman
cec87a0653
Merge branch 'master' into subtract
2013-02-23 23:27:55 -06:00
Matei Zaharia
06e5e6627f
Renamed "splits" to "partitions"
2013-02-17 22:13:26 -08:00
Stephen Haberman
924f47dd11
Add RDD.subtract.
...
Instead of reusing the cogroup primitive, this adds a SubtractedRDD
that knows it only needs to keep rdd1's values (per split) in memory.
2013-02-16 13:38:42 -06:00
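The per-split strategy described above can be sketched in plain Scala (hypothetical code, not SubtractedRDD itself): buffer only rdd1's values for the split, then stream rdd2's values through and delete matches, so rdd2 never has to fit in memory.

```scala
import scala.collection.mutable

// Sketch of subtract on one split: keep only the left side's values
// (with multiplicities) in memory; stream the right side to remove them.
def subtractSplit[T](left: Iterator[T], right: Iterator[T]): Iterator[T] = {
  val counts = mutable.LinkedHashMap.empty[T, Int]   // preserves left order
  left.foreach(t => counts(t) = counts.getOrElse(t, 0) + 1)
  right.foreach(counts.remove)
  counts.iterator.flatMap { case (t, n) => Iterator.fill(n)(t) }
}
```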
Stephen Haberman
6cd68c31cb
Update default.parallelism docs, have StandaloneSchedulerBackend use it.
...
Only brand new RDDs (e.g. parallelize and makeRDD) now use default
parallelism; everything else uses its largest parent's partitioner
or partition size.
2013-02-16 00:29:11 -06:00
Matei Zaharia
ea08537143
Fixed an exponential recursion that could happen with doCheckpoint due
...
to lack of memoization
2013-02-11 13:23:50 -08:00
Matei Zaharia
f750daa510
Merge pull request #452 from stephenh/misc
...
Add RDD.coalesce, clean up some RDDs, other misc.
2013-02-09 18:12:56 -08:00
Stephen Haberman
da52b16b38
Remove RDD.coalesce default arguments.
2013-02-09 10:11:54 -06:00
Mark Hamstra
b8863a79d3
Merge branch 'master' of https://github.com/mesos/spark into commutative
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2013-02-08 18:26:00 -08:00
Mark Hamstra
934a53c8b6
Change docs on 'reduce' since the merging of local reduces no longer preserves
...
ordering, so the reduce function must also be commutative.
2013-02-05 22:19:58 -08:00
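The requirement can be illustrated directly: once local reduce results are merged in arrival order rather than partition order, a function that is associative but not commutative gives order-dependent answers. A small sketch (hypothetical helper names):

```scala
// Simulate merging per-partition reduce results in a given arrival order.
def mergeInOrder[T](localResults: Seq[T], arrivalOrder: Seq[Int])
                   (f: (T, T) => T): T =
  arrivalOrder.map(localResults).reduceLeft(f)

// String concatenation is associative but NOT commutative, so two
// different arrival orders of the same local results disagree.
val locals = Seq("ab", "cd")                              // per-partition results
val inOrder    = mergeInOrder(locals, Seq(0, 1))(_ + _)   // "abcd"
val outOfOrder = mergeInOrder(locals, Seq(1, 0))(_ + _)   // "cdab"
```

A commutative function such as `_ + _` on numbers produces the same value under every arrival order, which is why the docs now require commutativity.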
Stephen Haberman
a9c8d53cfa
Clean up RDDs, mainly to use getSplits.
...
Also made sure clearDependencies() was calling super, to ensure
the getSplits/getDependencies vars in the RDD base class get
cleaned up.
2013-02-05 22:16:59 -06:00
Stephen Haberman
f2bc748013
Add RDD.coalesce.
2013-02-05 21:23:36 -06:00
Matei Zaharia
8b3041c723
Reduced the memory usage of reduce and similar operations
...
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
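The memory saving described above comes from folding each task result into a single accumulator as it arrives, instead of buffering all results in an array and reducing at the end. A minimal sketch of that change (hypothetical code, not the driver's actual implementation):

```scala
// Merge task results incrementally as they arrive: peak driver-side state
// is one partial value instead of an array of all task results.
def mergeIncrementally[T](arrivingResults: Iterator[T])
                         (f: (T, T) => T): Option[T] = {
  var acc: Option[T] = None
  arrivingResults.foreach { r =>
    acc = Some(acc.fold(r)(prev => f(prev, r)))
  }
  acc                                  // None if no task produced a result
}
```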