Matei Zaharia
be622cf867
Formatting
2012-07-11 17:31:44 -07:00
Matei Zaharia
e8ae77df24
Added more methods for loading/saving with new Hadoop API
2012-07-11 17:31:33 -07:00
Matei Zaharia
e72afdb817
Some refactoring to make cluster scheduler pluggable.
2012-07-06 15:23:26 -07:00
Matei Zaharia
f58da6164e
Merge branch 'master' into dev
2012-06-15 23:47:11 -07:00
Matei Zaharia
a96558caa3
Performance improvements to shuffle operations: in particular, preserve
...
RDD partitioning in more cases where it's possible, and use iterators
instead of materializing collections when doing joins.
2012-06-09 14:44:18 -07:00
Matei Zaharia
63051dd2bc
Merge in engine improvements from the Spark Streaming project, developed
...
jointly with Tathagata Das and Haoyuan Li. This commit imports the changes
and ports them to Mesos 0.9, but does not yet pass unit tests due to
various classes not supporting a graceful stop() yet.
2012-06-07 12:45:38 -07:00
Matei Zaharia
048276799a
Commit task outputs to Hadoop-supported storage systems in parallel on the
...
cluster instead of on the master. Fixes #110 .
2012-06-06 16:46:53 -07:00
Reynold Xin
42dcdbcb2f
Removed the extra spaces in OrderedRDDFunctions and SortedRDD.
2012-03-29 15:21:57 -07:00
Antonio
620798161b
Added fixes to sorting
2012-02-13 00:07:39 -08:00
Antonio
e93f622665
Added sorting by key for pair RDDs
2012-02-11 00:56:28 -08:00
haoyuan
194c42ab79
Code format.
2012-02-10 08:19:53 -08:00
Charles Reiss
02d43e6986
Add new Hadoop API writing support.
2011-12-01 14:01:28 -08:00
root
3a0e6c4363
Miscellaneous fixes:
...
- Executor should initialize logging properly
- groupByKey should allow custom partitioner
2011-10-17 18:07:35 +00:00
Ankur Dave
2d7057bf5d
Implement PairRDDFunctions.partitionBy
2011-10-09 15:52:09 -07:00
Ankur Dave
06637cb69e
Fix PairRDDFunctions.groupWith partitioning
...
This commit fixes a bug in groupWith that was causing it to destroy
partitioning information. It replaces a call to map with a call to
mapValues, which preserves partitioning.
2011-10-09 15:48:46 -07:00
Ankur Dave
2911a783d6
Add custom partitioner support to PairRDDFunctions.combineByKey
2011-10-09 15:47:20 -07:00
Ismael Juma
0fba22b3d2
Fix issue #65 : Change @serializable to extends Serializable in 2.9 branch
...
Note that we use scala.Serializable introduced in Scala 2.9 instead of
java.io.Serializable. Also, case classes inherit from scala.Serializable by
default.
2011-08-02 10:16:33 +01:00
Matei Zaharia
38f38dda5b
Merge branch 'master' into scala-2.9
2011-07-14 12:42:02 -04:00
Matei Zaharia
969644df8e
Cleaned up a few issues to do with default parallelism levels. Also
...
renamed HadoopFileWriter to HadoopWriter (since it's not only for files)
and fixed a bug for lookup().
2011-07-14 12:40:56 -04:00
Matei Zaharia
d0c7958364
Merge branch 'master' into scala-2.9
...
Conflicts:
core/src/main/scala/spark/HadoopFileWriter.scala
2011-07-13 23:09:33 -04:00
Matei Zaharia
9c0069188b
Updated save code to allow non-file-based OutputFormats and added a test
...
for file-related stuff
2011-07-13 23:04:06 -04:00
Matei Zaharia
25c3a7781c
Moved PairRDD and SequenceFileRDD functions to separate source files
2011-07-10 00:06:15 -04:00