Commit graph

41 commits

Author SHA1 Message Date
Ismael Juma 0fba22b3d2 Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia 8ea67307b9 Merge branch 'master' into scala-2.9 2011-07-14 14:47:12 -04:00
Matei Zaharia 9ac461d85d Remove RDD.toString because it looked confusing 2011-07-14 14:39:32 -04:00
Matei Zaharia 38f38dda5b Merge branch 'master' into scala-2.9 2011-07-14 12:42:02 -04:00
Matei Zaharia 969644df8e Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia d0c7958364 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia 9c0069188b Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia 080869c6ef Merge branch 'master' into scala-2.9 2011-07-13 00:20:08 -04:00
Matei Zaharia 842e14d567 Added mapPartitions operation and a bunch of tests for RDD ops 2011-07-13 00:19:52 -04:00
Matei Zaharia 9b568d37f7 Merge branch 'master' into scala-2.9
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia 25c3a7781c Moved PairRDD and SequenceFileRDD functions to separate source files 2011-07-10 00:06:15 -04:00
Matei Zaharia 393607d5ef Merge branch 'master' into scala-2.9 2011-06-27 18:08:25 -07:00
Matei Zaharia 2f652f1656 Fix a compile error 2011-06-27 18:07:16 -07:00
Tathagata Das 3f08e1129f Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das ad842ac823 Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia bae8a97968 Merge branch 'master' into scala-2.9
Conflicts:
	repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Tathagata Das 38f2ba99cc Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel 2e3531d8bf Implemented RDD.leftOuterJoin and RDD.rightOuterJoin 2011-06-24 11:00:51 +02:00
Matei Zaharia 214250016a Added simple version of lookup 2011-06-20 11:59:16 -07:00
Matei Zaharia 23b42af70a Merge branch 'master' into scala-2.9 2011-06-19 23:06:21 -07:00
Matei Zaharia 23b1c309fb Added pipe() operation on RDDs for mapping through a shell command. 2011-06-19 23:05:19 -07:00
Tathagata Das b5e6645505 Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das 869836a2fa Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task 2011-06-10 19:47:28 -07:00
Tathagata Das 389e56156f HadoopFileWriter changed to use Hadoop's OutputCommitter 2011-06-09 15:29:22 -07:00
Tathagata Das 24d845833c First-cut implementation of RDD.SaveAsText 2011-06-05 04:14:43 -07:00
Ismael Juma 82f10bd794 Remove unnecessary toStream calls. 2011-06-01 16:12:42 +01:00
Ismael Juma 1f27d94c48 Use Array.iterator instead of Iterator.fromArray as the latter is deprecated. 2011-05-26 22:04:42 +01:00
Matei Zaharia cec427e777 Fixed a bug with preferred locations having changed meaning in new RDDs 2011-05-22 17:12:29 -07:00
Matei Zaharia 82329b0b28 Updated scheduler to support running on just some partitions of final RDD 2011-05-19 12:47:09 -07:00
Matei Zaharia fd1d255821 Stop objectifying various trackers, caches, etc. 2011-05-17 12:41:13 -07:00
Matei Zaharia 16c886a581 Optimization for count() 2011-05-13 10:41:34 -07:00
Matei Zaharia 94ba95bcb2 Added flatMapValues 2011-04-12 19:51:58 -07:00
Matei Zaharia 467f056e29 Remove commented code 2011-03-06 23:38:41 -08:00
Matei Zaharia bce95b8458 Finished cogroup stuff 2011-03-06 23:38:16 -08:00
Matei Zaharia 04c2d6a60c stuff 2011-03-06 19:27:03 -08:00
Matei Zaharia 9e59afd710 More work on new RDD design 2011-02-27 19:15:52 -08:00
Matei Zaharia f38f86d59e More stuff 2011-02-27 14:27:12 -08:00
Matei Zaharia 2e6023f2bf stuff 2011-02-26 23:41:44 -08:00
Matei Zaharia 309367c477 Initial work towards new RDD design 2011-02-26 23:15:33 -08:00
Matei Zaharia 99f3f23efa Changed default shuffle to LocalFileShuffle because it's way faster for small files 2011-02-08 17:03:03 -08:00
Matei Zaharia e5c4cd8a5e Made examples and core subprojects 2011-02-01 15:11:08 -08:00
Renamed from src/main/scala/spark/RDD.scala