Matei Zaharia
0ccfe20755
Forgot to add a file
2011-07-14 12:42:50 -04:00
Matei Zaharia
38f38dda5b
Merge branch 'master' into scala-2.9
2011-07-14 12:42:02 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
2fb906e8e5
Merge branch 'master' into scala-2.9
2011-07-14 00:20:14 -04:00
Matei Zaharia
2604939f64
Simplified and documented code a little and added test
2011-07-14 00:19:00 -04:00
Matei Zaharia
2439e51a03
Merge branch 'master' into implicit-sequencefile
2011-07-13 23:20:22 -04:00
Matei Zaharia
d0c7958364
Merge branch 'master' into scala-2.9
Conflicts:
core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
da8a3b8926
Increase default value of spark.locality.wait a little
2011-07-13 20:07:24 -04:00
Matei Zaharia
080869c6ef
Merge branch 'master' into scala-2.9
2011-07-13 00:20:08 -04:00
Matei Zaharia
842e14d567
Added mapPartitions operation and a bunch of tests for RDD ops
2011-07-13 00:19:52 -04:00
Matei Zaharia
9b568d37f7
Merge branch 'master' into scala-2.9
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia
d05fea24f3
Simplified parallel shuffle fetcher to use URLConnection
2011-07-11 22:12:36 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00
Matei Zaharia
b7f1f62ff5
bug fix
2011-07-09 18:53:02 -04:00
Matei Zaharia
003480f374
Register byte[] with Kryo serializer
2011-07-09 18:08:07 -04:00
Matei Zaharia
aea5cb4413
Added parallel shuffle fetcher
2011-07-09 17:25:56 -04:00
Matei Zaharia
4b1646a25f
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:55 -04:00
Matei Zaharia
07a97d47c2
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:34 -04:00
Matei Zaharia
3488c386a9
Initial work to make stuff like sequenceFile[Int, Int] work without
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia
5633299ec6
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-27 22:50:59 -07:00
Matei Zaharia
b0ecf1ee41
Don't pass a null context when running tasks locally
2011-06-27 22:50:43 -07:00
Matei Zaharia
85cad5d9dd
Fixed HadoopFileWriter to compile for Scala 2.9
2011-06-27 22:44:14 -07:00
Matei Zaharia
393607d5ef
Merge branch 'master' into scala-2.9
2011-06-27 18:08:25 -07:00
Matei Zaharia
2f652f1656
Fix a compile error
2011-06-27 18:07:16 -07:00
tdas
ae63972a89
Merge pull request #64 from mesos/td-rdd-save
Functionality to save RDDs to Hadoop files
2011-06-27 13:44:55 -07:00
Tathagata Das
3f08e1129f
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das
ad842ac823
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia
bae8a97968
Merge branch 'master' into scala-2.9
Conflicts:
repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Matei Zaharia
b187675b68
Print version number 0.3 in REPL
2011-06-26 18:27:01 -07:00
Matei Zaharia
c4dd68ae21
Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das
38f2ba99cc
Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Matei Zaharia
b626562d54
Merge pull request #63 from ogrisel/outer-join
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 12:22:15 -07:00
Olivier Grisel
2e3531d8bf
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 11:00:51 +02:00
Matei Zaharia
095dd9c444
Merge pull request #62 from ogrisel/cogroup-test
Add missing test for RDD.groupWith
2011-06-23 10:12:24 -07:00
Matei Zaharia
e8e35d5fb5
Merge pull request #61 from ogrisel/better-readme
Better readme
2011-06-23 10:11:14 -07:00
Tathagata Das
3d2befe831
Improved HadoopFileWriter (saves key and value classes to jobconf)
2011-06-23 08:11:22 -07:00
Olivier Grisel
7ef48a4df0
typo
2011-06-23 02:28:17 +02:00
Olivier Grisel
5b9e0a126d
format
2011-06-23 02:27:14 +02:00
Olivier Grisel
236bcd0d9b
Markdown rendering for the toplevel README.md to improve readability on github
2011-06-23 02:24:04 +02:00
Olivier Grisel
005d1605a4
add missing test for RDD.groupWith
2011-06-23 02:10:52 +02:00
Matei Zaharia
214250016a
Added simple version of lookup
2011-06-20 11:59:16 -07:00
Matei Zaharia
23b42af70a
Merge branch 'master' into scala-2.9
2011-06-19 23:06:21 -07:00
Matei Zaharia
23b1c309fb
Added pipe() operation on RDDs for mapping through a shell command.
2011-06-19 23:05:19 -07:00
Tathagata Das
b5e6645505
Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using the TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das
869836a2fa
Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task
2011-06-10 19:47:28 -07:00
Tathagata Das
389e56156f
HadoopFileWriter changed to use Hadoop's OutputCommitter
2011-06-09 15:29:22 -07:00
Matei Zaharia
c62bb4091b
Merge remote-tracking branch 'origin/master' into scala-2.9
2011-06-07 00:42:23 -07:00
Matei Zaharia
a413b8e59d
Merge pull request #59 from ijuma/master
Move managedStyle to SparkProject
2011-06-07 00:41:50 -07:00
Tathagata Das
24d845833c
First-cut implementation of RDD.SaveAsText
2011-06-05 04:14:43 -07:00