ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	0ccfe20755	Forgot to add a file	2011-07-14 12:42:50 -04:00
Matei Zaharia	38f38dda5b	Merge branch 'master' into scala-2.9	2011-07-14 12:42:02 -04:00
Matei Zaharia	969644df8e	Cleaned up a few issues to do with default parallelism levels. Also renamed HadoopFileWriter to HadoopWriter (since it's not only for files) and fixed a bug for lookup().	2011-07-14 12:40:56 -04:00
Matei Zaharia	2fb906e8e5	Merge branch 'master' into scala-2.9	2011-07-14 00:20:14 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	2439e51a03	Merge branch 'master' into implicit-sequencefile	2011-07-13 23:20:22 -04:00
Matei Zaharia	d0c7958364	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/HadoopFileWriter.scala	2011-07-13 23:09:33 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	da8a3b8926	Increase default value of spark.locality.wait a little	2011-07-13 20:07:24 -04:00
Matei Zaharia	080869c6ef	Merge branch 'master' into scala-2.9	2011-07-13 00:20:08 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Matei Zaharia	9b568d37f7	Merge branch 'master' into scala-2.9 Conflicts: core/src/main/scala/spark/RDD.scala	2011-07-11 22:25:53 -04:00
Matei Zaharia	d05fea24f3	Simplified parallel shuffle fetcher to use URLConnection	2011-07-11 22:12:36 -04:00
Matei Zaharia	25c3a7781c	Moved PairRDD and SequenceFileRDD functions to separate source files	2011-07-10 00:06:15 -04:00
Matei Zaharia	b7f1f62ff5	bug fix	2011-07-09 18:53:02 -04:00
Matei Zaharia	003480f374	Register byte[] with Kryo serializer	2011-07-09 18:08:07 -04:00
Matei Zaharia	aea5cb4413	Added parallel shuffle fetcher	2011-07-09 17:25:56 -04:00
Matei Zaharia	4b1646a25f	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:55 -04:00
Matei Zaharia	07a97d47c2	Support for non-filesystem-based Hadoop data sources	2011-07-06 20:37:34 -04:00
Matei Zaharia	3488c386a9	Initial work to make stuff like sequenceFile[Int, Int] work without requiring the user to provide a Writable type. The approach here might not be the best but it seems to work correctly.	2011-06-28 17:07:04 -07:00
Matei Zaharia	5633299ec6	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-27 22:50:59 -07:00
Matei Zaharia	b0ecf1ee41	Don't pass a null context when running tasks locally	2011-06-27 22:50:43 -07:00
Matei Zaharia	85cad5d9dd	Fixed HadoopFileWriter to compile for Scala 2.9	2011-06-27 22:44:14 -07:00
Matei Zaharia	393607d5ef	Merge branch 'master' into scala-2.9	2011-06-27 18:08:25 -07:00
Matei Zaharia	2f652f1656	Fix a compile error	2011-06-27 18:07:16 -07:00
tdas	ae63972a89	Merge pull request #64 from mesos/td-rdd-save Functionality to save RDDs to Hadoop files	2011-06-27 13:44:55 -07:00
Tathagata Das	3f08e1129f	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/SparkContext.scala	2011-06-27 13:43:44 -07:00
Tathagata Das	ad842ac823	Merge branch 'master' into td-rdd-save Conflicts: core/src/main/scala/spark/RDD.scala	2011-06-27 13:39:11 -07:00
Matei Zaharia	bae8a97968	Merge branch 'master' into scala-2.9 Conflicts: repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala	2011-06-26 19:22:27 -07:00
Matei Zaharia	b187675b68	Print version number 0.3 in REPL	2011-06-26 18:27:01 -07:00
Matei Zaharia	c4dd68ae21	Merge branch 'mos-bt' This merge keeps only the broadcast work in mos-bt because the structure of shuffle has changed with the new RDD design. We still need some kind of parallel shuffle but that will be added later. Conflicts: core/src/main/scala/spark/BitTorrentBroadcast.scala core/src/main/scala/spark/ChainedBroadcast.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/Utils.scala core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala core/src/main/scala/spark/shuffle/DfsShuffle.scala	2011-06-26 18:22:12 -07:00
Tathagata Das	38f2ba99cc	Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles. 1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type) 2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options 3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile 4> RDD.saveAsObjectFile() serializes objects and saves them to a ObjectFile 5> SparkContext.objectFile() opens the saved ObjectFiles	2011-06-24 19:51:21 -07:00
Matei Zaharia	b626562d54	Merge pull request #63 from ogrisel/outer-join Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 12:22:15 -07:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Matei Zaharia	095dd9c444	Merge pull request #62 from ogrisel/cogroup-test Add missing test for RDD.groupWith	2011-06-23 10:12:24 -07:00
Matei Zaharia	e8e35d5fb5	Merge pull request #61 from ogrisel/better-readme Better readme	2011-06-23 10:11:14 -07:00
Tathagata Das	3d2befe831	Improved HadoopFileWriter (saves key and value classes to jobconf)	2011-06-23 08:11:22 -07:00
Olivier Grisel	7ef48a4df0	typo	2011-06-23 02:28:17 +02:00
Olivier Grisel	5b9e0a126d	format	2011-06-23 02:27:14 +02:00
Olivier Grisel	236bcd0d9b	Markdown rendering for the toplevel README.md to improve readability on github	2011-06-23 02:24:04 +02:00
Olivier Grisel	005d1605a4	add missing test for RDD.groupWith	2011-06-23 02:10:52 +02:00
Matei Zaharia	214250016a	Added simple version of lookup	2011-06-20 11:59:16 -07:00
Matei Zaharia	23b42af70a	Merge branch 'master' into scala-2.9	2011-06-19 23:06:21 -07:00
Matei Zaharia	23b1c309fb	Added pipe() operation on RDDs for mapping through a shell command.	2011-06-19 23:05:19 -07:00
Tathagata Das	b5e6645505	Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext. 1> HadoopFileWriter works correctly with task failures 2> It can also take an user specified JobConf object for configuration settings 3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class 4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter	2011-06-16 20:57:57 -07:00
Tathagata Das	869836a2fa	Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task	2011-06-10 19:47:28 -07:00
Tathagata Das	389e56156f	HadoopFileWriter changed to use Hadoop's OutputCommitter	2011-06-09 15:29:22 -07:00
Matei Zaharia	c62bb4091b	Merge remote-tracking branch 'origin/master' into scala-2.9	2011-06-07 00:42:23 -07:00
Matei Zaharia	a413b8e59d	Merge pull request #59 from ijuma/master Move managedStyle to SparkProject	2011-06-07 00:41:50 -07:00
Tathagata Das	24d845833c	First-cut implementation of RDD.SaveAsText	2011-06-05 04:14:43 -07:00

3636 commits