Matei Zaharia
c8eb8b2b90
Set class loader for remote actors to fix a bug that happens in 2.9
2011-07-14 17:29:11 -04:00
Matei Zaharia
e4c3402d2d
Renamed ParallelArray to ParallelCollection
2011-07-14 14:47:01 -04:00
Matei Zaharia
9ac461d85d
Remove RDD.toString because it looked confusing
2011-07-14 14:39:32 -04:00
Matei Zaharia
797b4547c3
Fix tracking of updates in accumulators to solve an issue that would manifest in the 2.9 interpreter
2011-07-14 14:08:34 -04:00
Matei Zaharia
0ccfe20755
Forgot to add a file
2011-07-14 12:42:50 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
2604939f64
Simplified and documented code a little and added test
2011-07-14 00:19:00 -04:00
Matei Zaharia
2439e51a03
Merge branch 'master' into implicit-sequencefile
2011-07-13 23:20:22 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
da8a3b8926
Increase default value of spark.locality.wait a little
2011-07-13 20:07:24 -04:00
Matei Zaharia
842e14d567
Added mapPartitions operation and a bunch of tests for RDD ops
2011-07-13 00:19:52 -04:00
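The commit above adds mapPartitions but does not show its code. Below is a minimal sketch of the operation's semantics on plain Scala collections, assuming an RDD can be modeled as a sequence of partitions. `MapPartitionsSketch` and `mapPartitionsLocal` are hypothetical names, not Spark's implementation. The key point is that the user function is called once per partition on an iterator, rather than once per element as with map.

```scala
object MapPartitionsSketch {
  // Hypothetical local model: an "RDD" is just a sequence of partitions.
  // The function f is applied once per partition, receiving that partition's
  // elements as an Iterator and producing an Iterator of outputs.
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[Seq[U]] =
    partitions.map(p => f(p.iterator).toList)

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5))
    // Sum each partition: exactly one output element per partition.
    val sums = mapPartitionsLocal(parts)(it => Iterator(it.sum))
    println(sums) // List(List(6), List(9))
  }
}
```

Per-partition functions are useful when some setup cost (opening a connection, building a buffer) should be paid once per partition instead of once per element.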
Matei Zaharia
d05fea24f3
Simplified parallel shuffle fetcher to use URLConnection
2011-07-11 22:12:36 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00
Matei Zaharia
b7f1f62ff5
bug fix
2011-07-09 18:53:02 -04:00
Matei Zaharia
003480f374
Register byte[] with Kryo serializer
2011-07-09 18:08:07 -04:00
Matei Zaharia
aea5cb4413
Added parallel shuffle fetcher
2011-07-09 17:25:56 -04:00
Matei Zaharia
4b1646a25f
Support for non-filesystem-based Hadoop data sources
2011-07-06 20:37:55 -04:00
Matei Zaharia
3488c386a9
Initial work to make stuff like sequenceFile[Int, Int] work without
requiring the user to provide a Writable type. The approach here might
not be the best but it seems to work correctly.
2011-06-28 17:07:04 -07:00
Matei Zaharia
b0ecf1ee41
Don't pass a null context when running tasks locally
2011-06-27 22:50:43 -07:00
Matei Zaharia
2f652f1656
Fix a compile error
2011-06-27 18:07:16 -07:00
Tathagata Das
3f08e1129f
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das
ad842ac823
Merge branch 'master' into td-rdd-save
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia
c4dd68ae21
Merge branch 'mos-bt'
This merge keeps only the broadcast work in mos-bt because the structure
of shuffle has changed with the new RDD design. We still need some kind
of parallel shuffle but that will be added later.
Conflicts:
core/src/main/scala/spark/BitTorrentBroadcast.scala
core/src/main/scala/spark/ChainedBroadcast.scala
core/src/main/scala/spark/RDD.scala
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/Utils.scala
core/src/main/scala/spark/shuffle/BasicLocalFileShuffle.scala
core/src/main/scala/spark/shuffle/DfsShuffle.scala
2011-06-26 18:22:12 -07:00
Tathagata Das
38f2ba99cc
Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel
2e3531d8bf
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 11:00:51 +02:00
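The commit message only names the new operations. Below is a minimal sketch of outer-join semantics on plain Scala sequences of key/value pairs; it is not Spark's implementation, and `OuterJoinSketch` is a hypothetical name. The result shapes `(V, Option[W])` and `(Option[V], W)` follow later Spark's pair-RDD API; whether this 2011 commit used exactly those types is an assumption.

```scala
object OuterJoinSketch {
  // Left outer join: every (k, v) on the left is kept. Each matching right
  // value yields one output wrapped in Some; a key absent on the right yields None.
  def leftOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    // Group right-hand values by key so each left key is matched in one lookup.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    left.flatMap { case (k, v) =>
      rightByKey.get(k) match {
        case Some(ws) => ws.map(w => (k, (v, Some(w): Option[W])))
        case None     => Seq((k, (v, None: Option[W])))
      }
    }
  }

  // Right outer join is the mirror image: join with the sides swapped,
  // then swap each value pair back into (Option[V], W) order.
  def rightOuterJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (Option[V], W))] =
    leftOuterJoin(right, left).map { case (k, (w, vOpt)) => (k, (vOpt, w)) }
}
```

For example, joining `Seq(1 -> "a", 2 -> "b")` with `Seq(1 -> 'x')` keeps key 2 on the left side as `(2, ("b", None))`, while an inner join would drop it.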
Tathagata Das
3d2befe831
Improved HadoopFileWriter (saves key and value classes to jobconf)
2011-06-23 08:11:22 -07:00
Matei Zaharia
214250016a
Added simple version of lookup
2011-06-20 11:59:16 -07:00
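The commit describes only a "simple version" of lookup. A plausible reading is a full scan: visit every partition and keep the values whose key matches. The sketch below shows that idea on a hypothetical local model of a partitioned pair dataset; `LookupSketch` is an illustrative name, not Spark code. A smarter version could use a known partitioner to scan only the partition that can contain the key, which is what later Spark releases do when a partitioner is set.

```scala
object LookupSketch {
  // Hypothetical local model of a pair RDD split into partitions.
  // Full-scan lookup: every partition is visited and all values whose key
  // equals the requested key are returned, in partition order.
  def lookup[K, V](partitions: Seq[Seq[(K, V)]], key: K): Seq[V] =
    partitions.flatMap(_.collect { case (k, v) if k == key => v })
}
```

Because keys need not be unique, the result is a sequence of values rather than a single value.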
Matei Zaharia
23b1c309fb
Added pipe() operation on RDDs for mapping through a shell command.
2011-06-19 23:05:19 -07:00
Tathagata Das
b5e6645505
Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das
869836a2fa
Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task
2011-06-10 19:47:28 -07:00
Tathagata Das
389e56156f
HadoopFileWriter changed to use Hadoop's OutputCommitter
2011-06-09 15:29:22 -07:00
Tathagata Das
24d845833c
First-cut implementation of RDD.SaveAsText
2011-06-05 04:14:43 -07:00
Matei Zaharia
9bb448a151
Catch Throwable instead of Exception in LocalScheduler and Executor. Fixes #57.
2011-06-01 11:45:47 -07:00
Matei Zaharia
850fe3274e
Make the runJob API public. Fixes #56.
2011-06-01 11:38:44 -07:00
Matei Zaharia
5166d76843
Ensure logging is initialized before spawning any threads to fix issue #45
2011-05-31 23:47:32 -07:00
Matei Zaharia
8b0390d344
Instantiate NullWritable properly in HadoopFile
2011-05-30 23:54:14 -07:00
Matei Zaharia
c501cff924
Executor was looking for the wrong constructor for ExecutorClassLoader
2011-05-29 16:15:59 -07:00
Ismael Juma
1396678baa
Move REPL classes to separate module.
2011-05-27 11:22:50 +01:00
Ismael Juma
164ef4c751
Use explicit asInstanceOf instead of misleading unchecked pattern matching.
Also enable -unchecked warnings in SBT build file.
2011-05-27 07:57:10 +01:00
Ismael Juma
89c8ea2bb2
Replace deprecated - and -- with suggested filterNot (which is uglier).
2011-05-26 22:22:37 +01:00
Ismael Juma
94f05683bd
Replace deprecated first with head.
2011-05-26 22:13:41 +01:00
Ismael Juma
0b6a862b68
Use math instead of Math as the latter is deprecated.
2011-05-26 22:06:36 +01:00
Ismael Juma
1f27d94c48
Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.
2011-05-26 22:04:42 +01:00
Ismael Juma
1993a8e556
Use += instead of + for mutable sequences as the latter is deprecated.
2011-05-26 21:59:48 +01:00
Matei Zaharia
cec427e777
Fixed a bug with preferred locations having changed meaning in new RDDs
2011-05-22 17:12:29 -07:00
Matei Zaharia
4c888b2933
Fix queue type for executor
2011-05-22 16:42:05 -07:00
Matei Zaharia
bea3a33012
doc tweak
2011-05-22 16:03:41 -07:00
Matei Zaharia
9bde5a54cb
class loader fix
2011-05-22 16:00:41 -07:00
Matei Zaharia
91c07a33d9
Various fixes to serialization
2011-05-21 22:50:08 -07:00
Matei Zaharia
f61b61c4ac
Merge branch 'master' into new-rdds
2011-05-21 21:25:58 -07:00