Matei Zaharia
335a6036ad
Converted some tabs to spaces
2012-04-05 11:58:01 -07:00
haoyuan
194c42ab79
Code format.
2012-02-10 08:19:53 -08:00
haoyuan
445e0bb1b5
Format the code a bit more.
2012-02-09 15:50:26 -08:00
haoyuan
651932e703
Format the code per the coding style agreed on by Matei/TD/Haoyuan
2012-02-09 13:26:23 -08:00
Matei Zaharia
fabcc82528
Merge pull request #103 from edisontung/master
...
Made improvements to takeSample. Also changed SparkLocalKMeans to SparkKMeans
2012-01-13 19:20:03 -08:00
Edison Tung
1ecc221f84
Fixed bugs
...
I've fixed the bugs detailed in the diff. One of the bugs was already
fixed on the local file (forgot to commit).
2012-01-09 11:59:52 -08:00
Edison Tung
42f8847a21
Revert de01b6deaaee1b43321e0aac330f4a98c0ea61c6^..HEAD
2011-12-01 13:43:25 -08:00
Edison Tung
de01b6deaa
Fixed bug in RDD
...
Math.min takes 2 args, not 1. This was not committed earlier for some
reason
2011-12-01 13:34:37 -08:00
Matei Zaharia
22b8fcf632
Added fold() and aggregate() operations that reuse an object to
merge results into rather than requiring a new object allocation
for each element merged. Fixes #95.
2011-11-30 11:37:47 -08:00
Edison Tung
a3bc012af8
added takeSamples method
...
The takeSamples method takes a specified number of samples from the RDD and
returns them in an array.
2011-11-21 16:38:44 -08:00
Ismael Juma
0fba22b3d2
Fix issue #65: Change @serializable to extends Serializable in 2.9 branch
...
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia
8ea67307b9
Merge branch 'master' into scala-2.9
2011-07-14 14:47:12 -04:00
Matei Zaharia
9ac461d85d
Remove RDD.toString because it looked confusing
2011-07-14 14:39:32 -04:00
Matei Zaharia
38f38dda5b
Merge branch 'master' into scala-2.9
2011-07-14 12:42:02 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
d0c7958364
Merge branch 'master' into scala-2.9
...
Conflicts:
core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
080869c6ef
Merge branch 'master' into scala-2.9
2011-07-13 00:20:08 -04:00
Matei Zaharia
842e14d567
Added mapPartitions operation and a bunch of tests for RDD ops
2011-07-13 00:19:52 -04:00
Matei Zaharia
9b568d37f7
Merge branch 'master' into scala-2.9
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-07-11 22:25:53 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00
Matei Zaharia
393607d5ef
Merge branch 'master' into scala-2.9
2011-06-27 18:08:25 -07:00
Matei Zaharia
2f652f1656
Fix a compile error
2011-06-27 18:07:16 -07:00
Tathagata Das
3f08e1129f
Merge branch 'master' into td-rdd-save
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
2011-06-27 13:43:44 -07:00
Tathagata Das
ad842ac823
Merge branch 'master' into td-rdd-save
...
Conflicts:
core/src/main/scala/spark/RDD.scala
2011-06-27 13:39:11 -07:00
Matei Zaharia
bae8a97968
Merge branch 'master' into scala-2.9
...
Conflicts:
repl/src/main/scala/spark/repl/SparkInterpreterLoop.scala
2011-06-26 19:22:27 -07:00
Tathagata Das
38f2ba99cc
Further changes to HadoopFileWriter. Implemented ability to save RDDs as SequenceFiles and ObjectFiles.
...
1> HadoopFileWriter changed to take class types as constructor parameters (no more generic type)
2> Multiple types of RDD.saveAsHadoopFile() implemented to provide more saving options
3> RDD.saveAsSequenceFile() automatically converts basic types to Writable types before saving as SequenceFile
4> RDD.saveAsObjectFile() serializes objects and saves them to an ObjectFile
5> SparkContext.objectFile() opens the saved ObjectFiles
2011-06-24 19:51:21 -07:00
Olivier Grisel
2e3531d8bf
Implemented RDD.leftOuterJoin and RDD.rightOuterJoin
2011-06-24 11:00:51 +02:00
Matei Zaharia
214250016a
Added simple version of lookup
2011-06-20 11:59:16 -07:00
Matei Zaharia
23b42af70a
Merge branch 'master' into scala-2.9
2011-06-19 23:06:21 -07:00
Matei Zaharia
23b1c309fb
Added pipe() operation on RDDs for mapping through a shell command.
2011-06-19 23:05:19 -07:00
Tathagata Das
b5e6645505
Cleaner reimplementation of HadoopFileWriter. Introduced TaskContext.
...
1> HadoopFileWriter works correctly with task failures
2> It can also take a user-specified JobConf object for configuration settings
3> A Task can now get information like stage ID, split ID, and attempt ID using TaskContext class
4> Minor changes in SparkContext, DAGScheduler and subclasses to allow specification of TaskContext as a parameter
2011-06-16 20:57:57 -07:00
Tathagata Das
869836a2fa
Implemented TaskContext to hold contextual information (jobID, taskID, attemptID) of a task
2011-06-10 19:47:28 -07:00
Tathagata Das
389e56156f
HadoopFileWriter changed to use Hadoop's OutputCommitter
2011-06-09 15:29:22 -07:00
Tathagata Das
24d845833c
First-cut implementation of RDD.SaveAsText
2011-06-05 04:14:43 -07:00
Ismael Juma
82f10bd794
Remove unnecessary toStream calls.
2011-06-01 16:12:42 +01:00
Ismael Juma
1f27d94c48
Use Array.iterator instead of Iterator.fromArray as the latter is deprecated.
2011-05-26 22:04:42 +01:00
Matei Zaharia
cec427e777
Fixed a bug with preferred locations having changed meaning in new RDDs
2011-05-22 17:12:29 -07:00
Matei Zaharia
82329b0b28
Updated scheduler to support running on just some partitions of final RDD
2011-05-19 12:47:09 -07:00
Matei Zaharia
fd1d255821
Stop objectifying various trackers, caches, etc.
2011-05-17 12:41:13 -07:00
Matei Zaharia
16c886a581
Optimization for count()
2011-05-13 10:41:34 -07:00
Matei Zaharia
94ba95bcb2
Added flatMapValues
2011-04-12 19:51:58 -07:00
Matei Zaharia
467f056e29
Remove commented code
2011-03-06 23:38:41 -08:00
Matei Zaharia
bce95b8458
Finished cogroup stuff
2011-03-06 23:38:16 -08:00
Matei Zaharia
04c2d6a60c
stuff
2011-03-06 19:27:03 -08:00
Matei Zaharia
9e59afd710
More work on new RDD design
2011-02-27 19:15:52 -08:00
Matei Zaharia
f38f86d59e
More stuff
2011-02-27 14:27:12 -08:00
Matei Zaharia
2e6023f2bf
stuff
2011-02-26 23:41:44 -08:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Matei Zaharia
99f3f23efa
Changed default shuffle to LocalFileShuffle because it's way faster for small files
2011-02-08 17:03:03 -08:00