2211 commits

Author SHA1 Message Date
tdas ae63972a89 Merge pull request #64 from mesos/td-rdd-save
Functionality to save RDDs to Hadoop files
2011-06-27 13:44:55 -07:00
Tathagata Das 3f08e1129f Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das ad842ac823 Merge branch 'master' into td-rdd-save
Conflicts:
	core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia bae8a97968 Merge branch 'master' into scala-2.9
Conflicts:
	repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia b187675b68 Print version number 0.3 in REPL 2011-06-26 18:27:01 -07:00
Matei Zaharia c4dd68ae21 Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.

Conflicts:
	core/src/main/scala/spark/BitTorrentBroadcast.scala
	core/src/main/scala/spark/ChainedBroadcast.scala
	core/src/main/scala/spark/RDD.scala
	core/src/main/scala/spark/SparkContext.scala
	core/src/main/scala/spark/Utils.scala
	core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
	core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das 38f2ba99cc Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Matei Zaharia b626562d54 Merge pull request #63 from ogrisel/outer-join
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 12:22:15 -07:00
Olivier Grisel 2e3531d8bf Implemented RDD.leftOuterJoin and RDD.rightOuterJoin 2011-06-24 11:00:51 +02:00
Matei Zaharia 095dd9c444 Merge pull request #62 from ogrisel/cogroup-test
Add missing test for RDD.groupWith
2011-06-23 10:12:24 -07:00
Matei Zaharia e8e35d5fb5 Merge pull request #61 from ogrisel/better-readme
Better readme
2011-06-23 10:11:14 -07:00
Tathagata Das 3d2befe831 Improved HadoopFileWriter (saves key and value classes to jobconf) 2011-06-23 08:11:22 -07:00
Olivier Grisel 7ef48a4df0 typo 2011-06-23 02:28:17 +02:00
Olivier Grisel 5b9e0a126d format 2011-06-23 02:27:14 +02:00
Olivier Grisel 236bcd0d9b Markdown rendering for the toplevel README.md to improve readability on github 2011-06-23 02:24:04 +02:00
Olivier Grisel 005d1605a4 add missing test for RDD.groupWith 2011-06-23 02:10:52 +02:00
Matei Zaharia 214250016a Added simple version of lookup 2011-06-20 11:59:16 -07:00
Matei Zaharia 23b42af70a Merge branch 'master' into scala-2.9 2011-06-19 23:06:21 -07:00
Matei Zaharia 23b1c309fb Added pipe() operation on RDDs for mapping through a shell command. 2011-06-19 23:05:19 -07:00
Tathagata Das b5e6645505 Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das 869836a2fa Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task 2011-06-10 19:47:28 -07:00
Tathagata Das 389e56156f HadoopFileWriter changed to use Hadoop's OutputCommitter 2011-06-09 15:29:22 -07:00
Matei Zaharia c62bb4091b Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-07 00:42:23 -07:00
Matei Zaharia a413b8e59d Merge pull request #59 from ijuma/master
Move managedStyle to SparkProject
2011-06-07 00:41:50 -07:00
Tathagata Das 24d845833c First-cut implementation of RDD.SaveAsText 2011-06-05 04:14:43 -07:00
Ismael Juma 1ad4dcd3de Move managedStyle to SparkProject.
I had added it to DepJar by mistake.
2011-06-02 14:06:54 +01:00
Matei Zaharia 3297706ab2 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-06-01 11:46:31 -07:00
Matei Zaharia 9bb448a151 Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57. 2011-06-01 11:45:47 -07:00
Matei Zaharia 850fe3274e Make the runJob API public. Fixes #56. 2011-06-01 11:38:44 -07:00
Matei Zaharia 0e5dbf2abd Merge pull request #58 from ijuma/scala-2.9
Remove unnecessary toStream calls
2011-06-01 11:32:07 -07:00
Ismael Juma 82f10bd794 Remove unnecessary toStream calls. 2011-06-01 16:12:42 +01:00
Matei Zaharia b49d1be65b Ensure logging is initialized before any Spark threads run in the REPL 2011-05-31 23:54:48 -07:00
Matei Zaharia 10fe324845 Merge remote-tracking branch 'origin/master' into scala-2.9 2011-05-31 23:48:11 -07:00
Matei Zaharia 5166d76843 Ensure logging is initialized before spawning any threads to fix issue #45 2011-05-31 23:47:32 -07:00
Matei Zaharia 96daa31a01 Pass quoted arguments properly to run 2011-05-31 23:34:16 -07:00
Matei Zaharia 7862995569 Give SBT a bit more memory so it can do an update / compile / test in one JVM 2011-05-31 23:33:47 -07:00
Matei Zaharia 90f924202b Another fix ported forward for the REPL 2011-05-31 23:11:49 -07:00
Matei Zaharia 3854d23dd4 Pass quoted arguments properly to run 2011-05-31 22:17:48 -07:00
Matei Zaharia 8012e388a8 Give SBT a bit more memory so it can do an update / compile / test in one JVM 2011-05-31 22:17:33 -07:00
Ismael Juma 3def9fdb96 Upgrade to scalacheck 1.9. 2011-05-31 22:11:33 -07:00
Matei Zaharia 0afd35a8dd Some docs in ClosureCleaner 2011-05-31 22:06:30 -07:00
Matei Zaharia 73975d7491 Further fixes to interpreter (adding in some code generation changes I
missed before and setting SparkEnv properly on the threads that execute
each line in the 2.9 interpreter).
2011-05-31 22:05:24 -07:00
Matei Zaharia d52660c969 Ported code generation changes from 2.8 interpreter (to use a class for
each line's object rather than a singleton object so that we can ship
these classes to worker nodes). This is pretty hairy stuff, which would
be nice to avoid in the future by integrating with the interpreter some
other way.
2011-05-31 19:23:15 -07:00
Matei Zaharia beb9c117f0 Merge branch 'master' into scala-2.9
Conflicts:
	project/build/SparkProject.scala
2011-05-31 19:23:07 -07:00
Matei Zaharia bcce6e8d01 Various work to use the 2.9 interpreter 2011-05-31 17:31:51 -07:00
Matei Zaharia 8b0390d344 Instantiate NullWritable properly in HadoopFile 2011-05-30 23:54:14 -07:00
Matei Zaharia 4096c2287e Various fixes 2011-05-29 18:46:01 -07:00
Matei Zaharia ef706ae959 Merge branch 'master' into new-rdds-protobuf
Conflicts:
	run
2011-05-29 16:20:23 -07:00
Matei Zaharia c501cff924 Executor was looking for the wrong constructor for ExecutorClassLoader 2011-05-29 16:15:59 -07:00
Matei Zaharia 50ac1d2a40 Merge remote-tracking branch 'ijuma/issue51' 2011-05-29 15:41:12 -07:00