ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
root	34eccedbf5	Fixed a rather bad bug in HDFS files that has been in for a while: caching was not working because Split objects did not have a consistent toString value	2010-10-03 05:06:06 +00:00
Matei Zaharia	7090dea44b	Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster	2010-09-28 23:54:29 -07:00
Justin Ma	7a9ff1cc9a	Got rid of 'Split' type parameter in RDD; Added SampledRDD, SplitRDD and CartesianRDD; Made Split a class rather than a type parameter; Added numCores() to Scheduler to help set default level of parallelism	2010-08-31 12:08:09 -07:00
Justin Ma	ea8c2785dd	now we have sampling with replacement (at least on a per-split basis)	2010-08-18 15:59:35 -07:00
Justin Ma	156bccbe23	HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files; MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size; RDD.scala: added support for aggregating RDDs on a per-split basis (aggregateSplit()) as well as for sampling without replacement (sample())	2010-08-18 15:25:57 -07:00
Matei Zaharia	b56ed67553	Updated code to work with Nexus->Mesos name change	2010-07-25 23:53:46 -04:00
Matei Zaharia	7d0eae17e3	Merge branch 'dev'. Conflicts: src/scala/spark/HdfsFile.scala, src/scala/spark/NexusScheduler.scala, src/test/spark/repl/ReplSuite.scala	2010-06-27 15:21:54 -07:00
Matei Zaharia	323571a177	Initial work on union operation.	2010-06-18 12:54:33 -07:00
Matei Zaharia	cd247b7d86	Created common RDD superclass for distributed files and parallel arrays. This also means that parallel arrays now get all the functionality files used to have (filter, map, reduce, cache, etc).	2010-06-17 12:49:42 -07:00

9 commits