ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
seanm	a1662326e9	comment adjustment to takeOrdered	2013-07-12 08:38:19 -07:00
seanm	a2c915fba8	giving order to top and making tests more clear	2013-07-11 18:55:00 -07:00
seanm	24705d0f46	adding takeOrdered() to RDD	2013-07-10 10:33:11 -07:00
Matei Zaharia	652ea0f1d8	Allow RDD.takeSample to give samples bigger than the RDD. Before, when withReplacement was set to true, we would not get a sample bigger than the RDD's count(). Conflicts: core/src/main/scala/spark/RDD.scala, core/src/test/scala/spark/RDDSuite.scala	2013-07-05 11:15:13 -07:00
Matei Zaharia	6586c5e28b	Added a SparkContext accessor to RDD	2013-07-05 11:13:46 -07:00
Matei Zaharia	1ef5d0d2c9	Merge pull request #644 from shimingfei/joblogger: add Joblogger to Spark (on new Spark code)	2013-06-22 09:35:57 -07:00
Mingfei	5240795154	edit according to comments	2013-06-21 17:38:23 +08:00
Gavin Li	0a2a9bce1e	fix typo and coding style	2013-06-18 21:30:13 +00:00
Gavin Li	4508089fc3	refine comments and add sc.clean	2013-06-17 05:23:46 +00:00
Gavin Li	e6ae049283	Merge remote-tracking branch 'upstream1/master' into enhance_pipe	2013-06-16 22:53:39 +00:00
Gavin Li	fb6d733fa8	update according to comments	2013-06-16 22:32:55 +00:00
Matei Zaharia	f961aac8b2	Merge pull request #649 from ryanlecompte/master: Add top K method to RDD using a bounded priority queue	2013-06-15 00:53:41 -07:00
ryanlecompte	e8801d4490	use delegation for BoundedPriorityQueue, add Java API	2013-06-14 23:39:05 -07:00
ryanlecompte	44b8dbaede	use Iterator.single(elem) instead of Iterator(elem) for improved performance based on scaladocs	2013-06-13 16:23:15 -07:00
ryanlecompte	db5bca08ff	add a new top K method to RDD using a bounded priority queue	2013-06-12 10:54:16 -07:00
Patrick Wendell	d1bbcebae5	Adding compression to Hadoop save functions	2013-06-09 11:39:35 -07:00
Mingfei	1a4d93c025	modify to pass job annotation by localProperties and use daeamon thread to do joblogger's work	2013-06-08 14:23:39 +08:00
Gavin Li	e179ff8a32	update according to comments	2013-06-05 22:41:05 +00:00
Gavin Li	9f84315c05	enhance pipe to support what we can do in hadoop streaming	2013-06-01 00:26:10 +00:00
Reynold Xin	ba5e544461	More block manager cleanup. Implemented a removeRdd method in BlockManager, and use that to implement RDD.unpersist. Previously, unpersist needs to send B akka messages, where B = number of blocks. Now unpersist only needs to send W akka messages, where W = the number of workers.	2013-05-31 01:48:16 -07:00
Mark Hamstra	6e6b3e0d7e	Actually use the cleaned closure in foreachPartition	2013-05-10 13:02:34 -07:00
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
Shivaram Venkataraman	604d3bf56c	Rename partition class and add scala doc	2013-04-28 16:31:07 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs. The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Matei Zaharia	6962d40b44	Fix deprecated warning	2013-04-07 20:27:33 -04:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce. Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Stephen Haberman	00170eb0b9	Fix are/our typo.	2013-03-22 12:59:08 -05:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still up the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Mark Hamstra	ab33e27cc9	constructorOfA -> constructA in doc comments	2013-03-16 15:29:15 -07:00
Mark Hamstra	9784fc1fcd	fix wayward comma in doc comment	2013-03-16 15:25:02 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Stephen Haberman	63fe225587	Simplify SubtractedRDD in preparation from subtractByKey.	2013-03-13 17:17:34 -05:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits. Conflicts: core/src/main/scala/spark/RDD.scala, core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala, core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	6cd68c31cb	Update default.parallelism docs, have StandaloneSchedulerBackend use it. Only brand new RDDs (e.g. parallelize and makeRDD) now use default parallelism, everything else uses their largest parent's partitioner or partition size.	2013-02-16 00:29:11 -06:00
Matei Zaharia	ea08537143	Fixed an exponential recursion that could happen with doCheckpoint due to lack of memoization	2013-02-11 13:23:50 -08:00
Matei Zaharia	f750daa510	Merge pull request #452 from stephenh/misc: Add RDD.coalesce, clean up some RDDs, other misc.	2013-02-09 18:12:56 -08:00
Stephen Haberman	da52b16b38	Remove RDD.coalesce default arguments.	2013-02-09 10:11:54 -06:00

189 commits