ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	348bcbca1f	Added a method to RDD to expose the ClassManifest.	2012-09-24 16:56:27 -07:00
Matei Zaharia	2d761e3353	Ported performance and FT improvements from latest streaming work	2012-09-12 14:54:40 -07:00
Matei Zaharia	6601a6212b	Added a unit test for cross-partition balancing in sort, and changes to RangePartitioner to make it pass. It turns out that the first partition was always kind of small due to how we picked partition boundaries.	2012-08-03 16:40:45 -04:00
Matei Zaharia	400221f851	Merge branch 'dev' of git://github.com/tdas/spark into dev	2012-07-30 13:54:57 -07:00
Tathagata Das	cf429699e1	Updated the new checkpoint RDD to remember partitioning of the original RDD.	2012-07-27 23:16:37 +00:00
Tathagata Das	024905f682	Added BlockRDD and a first-cut version of checkpoint() to RDD class.	2012-07-27 12:00:49 -07:00
Josh Rosen	e23938c3be	Use mapValues() in JavaPairRDD.cogroupResultToJava().	2012-07-22 15:10:01 -07:00
Josh Rosen	01dce3f569	Add Java API. Add distinct() method to RDD. Fix bug in DoubleRDDFunctions.	2012-07-18 17:34:29 -07:00
Matei Zaharia	c53670b9bf	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
Matei Zaharia	f58da6164e	Merge branch 'master' into dev	2012-06-15 23:47:11 -07:00
Matei Zaharia	a96558caa3	Performance improvements to shuffle operations: in particular, preserve RDD partitioning in more cases where it's possible, and use iterators instead of materializing collections when doing joins.	2012-06-09 14:44:18 -07:00
Matei Zaharia	63051dd2bc	Merge in engine improvements from the Spark Streaming project, developed jointly with Tathagata Das and Haoyuan Li. This commit imports the changes and ports them to Mesos 0.9, but does not yet pass unit tests due to various classes not supporting a graceful stop() yet.	2012-06-07 12:45:38 -07:00
Reynold Xin	d0c6e9f639	Made some RDD dependencies transient to reduce the amount of data needed to be serialized in closure serialization. This can significantly reduce the task setup time in Shark when the query involves a large number of (Hive) partitions.	2012-05-16 14:16:55 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	335a6036ad	Converted some tabs to spaces	2012-04-05 11:58:01 -07:00
haoyuan	194c42ab79	Code format.	2012-02-10 08:19:53 -08:00
haoyuan	445e0bb1b5	Format the code a bit more.	2012-02-09 15:50:26 -08:00
haoyuan	651932e703	Format the code as coding style agreed by Matei/TD/Haoyuan	2012-02-09 13:26:23 -08:00
Matei Zaharia	fabcc82528	Merge pull request #103 from edisontung/master Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans	2012-01-13 19:20:03 -08:00
Edison Tung	1ecc221f84	Fixed bugs I've fixed the bugs detailed in the diff. One of the bugs was already fixed on the local file (forgot to commit).	2012-01-09 11:59:52 -08:00
Edison Tung	42f8847a21	Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD	2011-12-01 13:43:25 -08:00
Edison Tung	de01b6deaa	Fixed bug in RDD: Math.min takes 2 args, not 1. This was not committed earlier for some reason.	2011-12-01 13:34:37 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Edison Tung	a3bc012af8	Added takeSamples method. The takeSamples method takes a specified number of samples from the RDD and outputs them in an array.	2011-11-21 16:38:44 -08:00
Ismael Juma	0fba22b3d2	Fix issue #65: Change @serializable to extends Serializable in 2.9 branch. Note that we use scala.Serializable introduced in Scala 2.9 instead of java.io.Serializable. Also, case classes inherit from scala.Serializable by default.	2011-08-02 10:16:33 +01:00
Matei Zaharia	8ea67307b9	Merge branch 'master' into scala-2.9	2011-07-14 14:47:12 -04:00
Matei Zaharia	9ac461d85d	Remove RDD.toString because it looked confusing	2011-07-14 14:39:32 -04:00
Matei Zaharia	38f38dda5b	Merge branch 'master' into scala-2.9	2011-07-14 12:42:02 -04:00
Matei Zaharia	969644df8e	Cleaned up a few issues to do with default parallelism levels. Also renamed HadoopFileWriter to HadoopWriter (since it's not only for files) and fixed a bug for lookup().	2011-07-14 12:40:56 -04:00
Matei Zaharia	d0c7958364	Merge branch 'master' into scala-2.9. Conflicts: core/src/main/scala/spark/HadoopFileWriter.scala	2011-07-13 23:09:33 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	080869c6ef	Merge branch 'master' into scala-2.9	2011-07-13 00:20:08 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Matei Zaharia	9b568d37f7	Merge branch 'master' into scala-2.9. Conflicts: core/src/main/scala/spark/RDD.scala	2011-07-11 22:25:53 -04:00
Matei Zaharia	25c3a7781c	Moved PairRDD and SequenceFileRDD functions to separate source files	2011-07-10 00:06:15 -04:00
Matei Zaharia	393607d5ef	Merge branch 'master' into scala-2.9	2011-06-27 18:08:25 -07:00
Matei Zaharia	2f652f1656	Fix a compile error	2011-06-27 18:07:16 -07:00
Tathagata Das	3f08e1129f	Merge branch 'master' into td-rdd-save. Conflicts: core/src/main/scala/spark/SparkContext.scala	2011-06-27 13:43:44 -07:00
Tathagata Das	ad842ac823	Merge branch 'master' into td-rdd-save. Conflicts: core/src/main/scala/spark/RDD.scala	2011-06-27 13:39:11 -07:00
Matei Zaharia	bae8a97968	Merge branch 'master' into scala-2.9. Conflicts: repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-06-26 19:22:27 -07:00
Tathagata Das	38f2ba99cc	Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles. (1) HadoopFileWriter changed to take class types as constructor parameters (no more generic type); (2) multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options; (3) RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile; (4) RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile; (5) SparkContext.objectFile() opens the saved ObjectFiles.	2011-06-24 19:51:21 -07:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Matei Zaharia	214250016a	Added simple version of lookup	2011-06-20 11:59:16 -07:00
Matei Zaharia	23b42af70a	Merge branch 'master' into scala-2.9	2011-06-19 23:06:21 -07:00
Matei Zaharia	23b1c309fb	Added pipe() operation on RDDs for mapping through a shell command.	2011-06-19 23:05:19 -07:00
Tathagata Das	b5e6645505	Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext. (1) HadoopFileWriter works correctly with task failures; (2) it can also take a user-specified JobConf object for configuration settings; (3) a Task can now get information like stage ID, split ID, and attempt ID using the TaskContext class; (4) minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter.	2011-06-16 20:57:57 -07:00
Tathagata Das	869836a2fa	Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task	2011-06-10 19:47:28 -07:00
Tathagata Das	389e56156f	HadoopFileWriter changed to use Hadoop's OutputCommitter	2011-06-09 15:29:22 -07:00
Tathagata Das	24d845833c	First-cut implementation of RDD.SaveAsText	2011-06-05 04:14:43 -07:00
Ismael Juma	82f10bd794	Remove unnecessary toStream calls.	2011-06-01 16:12:42 +01:00

65 commits