Justin Ma
b3517614d8
Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to
...
ensure that the proper keys will be generated when they cached.
2010-10-07 14:38:25 -07:00
Matei Zaharia
9f20b6b433
Added reduceByKey operation for RDDs containing pairs
2010-10-03 20:28:20 -07:00
root
34eccedbf5
Fixed a rather bad bug in HDFS files that has been in for a while:
...
caching was not working because Split objects did not have a
consistent toString value
2010-10-03 05:06:06 +00:00
Matei Zaharia
7090dea44b
Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster
2010-09-28 23:54:29 -07:00
Justin Ma
7a9ff1cc9a
- Got rid of 'Split' type parameter in RDD
...
- Added SampledRDD, SplitRDD and CartesianRDD
- Made Split a class rather than a type parameter
- Added numCores() to Scheduler to help set default level of parallelism
2010-08-31 12:08:09 -07:00
Justin Ma
ea8c2785dd
now we have sampling with replacement (at least on a per-split basis)
2010-08-18 15:59:35 -07:00
Justin Ma
156bccbe23
HdfsFile.scala: added a try/catch block to exit gracefully for correupted gzip files
...
MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size
RDD.scala: added support for aggregating RDDs on a per-split basis
(aggregateSplit()) as well as for sampling without replacement (sample())
2010-08-18 15:25:57 -07:00
Matei Zaharia
b56ed67553
Updated code to work with Nexus->Mesos name change
2010-07-25 23:53:46 -04:00
Matei Zaharia
7d0eae17e3
Merge branch 'dev'
...
Conflicts:
src/scala/spark/HdfsFile.scala
src/scala/spark/NexusScheduler.scala
src/test/spark/repl/ReplSuite.scala
2010-06-27 15:21:54 -07:00
Matei Zaharia
323571a177
Initial work on union operation.
2010-06-18 12:54:33 -07:00
Matei Zaharia
cd247b7d86
Created common RDD superclass for distributed files and parallel arrays.
...
This also means that parallel arrays now get all the functionality files
used to have (filter, map, reduce, cache, etc).
2010-06-17 12:49:42 -07:00