ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	93a200bc7e	Renamed aggregateSplit() to splitRdd(), plus some style fixes	2010-10-23 15:34:03 -07:00
Matei Zaharia	023ed194b4	Fixed some whitespace	2010-10-16 21:21:16 -07:00
Matei Zaharia	0e2adecdab	Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs	2010-10-16 17:13:52 -07:00
Matei Zaharia	630a982b88	Added a getId method to split to force classes to specify a unique ID for each split. This replaces the previous method of calling split.toString, which would produce different results for the same split each time it is deserialized (because the default implementation returns the Java object's address).	2010-10-07 17:17:07 -07:00
Justin Ma	b3517614d8	Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to ensure that the proper keys will be generated when they are cached.	2010-10-07 14:38:25 -07:00
Matei Zaharia	9f20b6b433	Added reduceByKey operation for RDDs containing pairs	2010-10-03 20:28:20 -07:00
root	34eccedbf5	Fixed a rather bad bug in HDFS files that has been in for a while: caching was not working because Split objects did not have a consistent toString value	2010-10-03 05:06:06 +00:00
Matei Zaharia	7090dea44b	Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster	2010-09-28 23:54:29 -07:00
Justin Ma	7a9ff1cc9a	Got rid of 'Split' type parameter in RDD; added SampledRDD, SplitRDD and CartesianRDD; made Split a class rather than a type parameter; added numCores() to Scheduler to help set default level of parallelism	2010-08-31 12:08:09 -07:00
Justin Ma	ea8c2785dd	now we have sampling with replacement (at least on a per-split basis)	2010-08-18 15:59:35 -07:00
Justin Ma	156bccbe23	HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files; MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size; RDD.scala: added support for aggregating RDDs on a per-split basis (aggregateSplit()) as well as for sampling without replacement (sample())	2010-08-18 15:25:57 -07:00
Matei Zaharia	b56ed67553	Updated code to work with Nexus->Mesos name change	2010-07-25 23:53:46 -04:00
Matei Zaharia	7d0eae17e3	Merge branch 'dev'. Conflicts: src/scala/spark/HdfsFile.scala, src/scala/spark/NexusScheduler.scala, src/test/spark/repl/ReplSuite.scala	2010-06-27 15:21:54 -07:00
Matei Zaharia	323571a177	Initial work on union operation.	2010-06-18 12:54:33 -07:00
Matei Zaharia	cd247b7d86	Created common RDD superclass for distributed files and parallel arrays. This also means that parallel arrays now get all the functionality files used to have (filter, map, reduce, cache, etc).	2010-06-17 12:49:42 -07:00

15 commits