ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Justin Ma	4cbca25f49	Merge branch 'master' into jtma-accumulator	2010-10-07 14:39:54 -07:00
Justin Ma	b3517614d8	Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to ensure that the proper keys will be generated when they are cached.	2010-10-07 14:38:25 -07:00
Matei Zaharia	0195ee5ed8	Merge branch 'master' into matei-scheduling	2010-10-05 14:26:20 -07:00
Matei Zaharia	a41ca20375	Added splitWords function in Utils	2010-10-04 12:01:05 -07:00
Matei Zaharia	9f20b6b433	Added reduceByKey operation for RDDs containing pairs	2010-10-03 20:28:20 -07:00
Matei Zaharia	a826294c3a	Merge branch 'master' into matei-scheduling	2010-10-03 13:28:06 -07:00
Matei Zaharia	aef9e5b98c	Renamed ParallelOperation to Job	2010-10-03 13:28:01 -07:00
root	34eccedbf5	Fixed a rather bad bug in HDFS files that has been in for a while: caching was not working because Split objects did not have a consistent toString value	2010-10-03 05:06:06 +00:00
Matei Zaharia	b6debf5da1	Merge branch 'matei-logging'	2010-09-29 10:59:01 -07:00
Matei Zaharia	f50b23b825	Increase default locality wait to 3s. Fixes #20.	2010-09-29 10:04:00 -07:00
Matei Zaharia	a7c0e2a7c3	Made task-finished log messages slightly nicer	2010-09-29 00:22:11 -07:00
Matei Zaharia	40f69140b6	Made spark-executor output slightly nicer	2010-09-29 00:22:09 -07:00
Matei Zaharia	0d28bdcefd	A couple of minor fixes: - Don't include trailing $'s in class names of Scala objects - Report errors using logError instead of printStackTrace	2010-09-29 00:10:46 -07:00
Matei Zaharia	0fa70a6770	Updated log4j.properties to ignore jetty messages below WARN level	2010-09-28 23:58:19 -07:00
Matei Zaharia	7090dea44b	Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster	2010-09-28 23:54:29 -07:00
Matei Zaharia	516248aa66	Added log4j.properties	2010-09-28 23:22:39 -07:00
Matei Zaharia	332c8b8c22	Removed Hadoop's SLF4J jars	2010-09-28 23:16:28 -07:00
Matei Zaharia	db623defbe	Added Logging trait	2010-09-28 23:12:23 -07:00
Matei Zaharia	c7d233b911	Added log4j jars and paths	2010-09-28 23:08:01 -07:00
Matei Zaharia	e5e9edeeb3	Merge branch 'http-repl-class-serving'	2010-09-28 22:43:04 -07:00
Matei Zaharia	e068f21e01	More work on HTTP class loading	2010-09-28 22:32:38 -07:00
Matei Zaharia	7ef3a20a0c	Modified the interpreter to serve classes to the executors using a Jetty HTTP server instead of a shared (NFS) file system.	2010-09-28 17:55:11 -07:00
Justin Ma	b749f0e209	fixed typo in printing which task is already finished	2010-09-28 17:28:54 -07:00
Justin Ma	b7ce592bec	changes to accumulator to add objects in-place.	2010-09-25 14:37:25 -07:00
Justin Ma	366c09c47b	Let's use future instead of actors	2010-09-13 15:30:22 -07:00
Justin Ma	0896fd6219	Added fork()/join() operations for SparkContext, as well as corresponding changes to MesosScheduler to support multiple ParallelOperations.	2010-09-12 09:01:44 -07:00
Justin Ma	6f0d2c1cbc	round robin scheduling of tasks has been added	2010-09-07 14:03:59 -07:00
Justin Ma	e9ffe6caab	now adding the Split object.	2010-09-01 13:31:06 -07:00
Justin Ma	7a9ff1cc9a	- Got rid of 'Split' type parameter in RDD - Added SampledRDD, SplitRDD and CartesianRDD - Made Split a class rather than a type parameter - Added numCores() to Scheduler to help set default level of parallelism	2010-08-31 12:08:09 -07:00
Justin Ma	ea8c2785dd	now we have sampling with replacement (at least on a per-split basis)	2010-08-18 15:59:35 -07:00
Justin Ma	156bccbe23	HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files; MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size; RDD.scala: added support for aggregating RDDs on a per-split basis (aggregateSplit()) as well as for sampling without replacement (sample())	2010-08-18 15:25:57 -07:00
Matei Zaharia	75b2ca10c3	Removed HOD from included Hadoop because it was making the project count as Python on GitHub :|.	2010-08-16 23:16:35 -07:00
Matei Zaharia	1cbffaae6f	Modified Scala interpreter to have it avoid computing string versions of all results when :silent is enabled, so that it is easier to work with large arrays in Spark. (The string version of an array of numbers might not fit in memory even though the array itself does.)	2010-08-15 18:33:27 -07:00
Matei Zaharia	1600c31554	Added latest mesos.jar	2010-08-13 19:03:46 -07:00
Matei Zaharia	0b195927b6	Improved README and added blank templates for config files.	2010-08-13 18:54:32 -07:00
Matei Zaharia	3d8d7fd557	Bug fix from Justin	2010-08-13 11:29:19 -07:00
root	a9481c3514	Update to work with latest Mesos API changes	2010-08-13 07:39:36 +00:00
Matei Zaharia	4488b3bc8a	Fixed a bug where we would incorrectly decide we've finished a parallel operation if Mesos tells us a task is finished twice	2010-08-09 16:46:14 -07:00
Matei Zaharia	f415b071af	Change shell framework's name to "Spark shell"	2010-08-06 12:07:26 -07:00
Matei Zaharia	0e6e577fdf	Add Mesos native library to .gitignore	2010-07-25 23:54:56 -04:00
Matei Zaharia	b56ed67553	Updated code to work with Nexus->Mesos name change	2010-07-25 23:53:46 -04:00
Matei Zaharia	4239f76997	Removed Matei's old start on broadcast code	2010-07-25 23:46:44 -04:00
Matei Zaharia	e240e38ee9	Updated a bunch of libraries, and increased the default memory in run so that unit tests can run successfully.	2010-07-25 21:10:03 -04:00
Matei Zaharia	0435de9e87	Made it possible to set various Spark options and environment variables in general through a conf/spark-env.sh script.	2010-07-19 18:00:30 -07:00
Justin Ma	edad598684	Updated Spark to run with latest Mesos build and Scala-2.8.0.final.	2010-07-19 15:03:49 -07:00
Matei Zaharia	0da5b00d6e	Merge branch 'master' into multi-tracker Conflicts: Makefile run src/scala/spark/Broadcast.scala src/scala/spark/HdfsFile.scala src/scala/spark/NexusScheduler.scala src/scala/spark/SparkContext.scala src/test/spark/repl/ReplSuite.scala third_party/nexus.jar	2010-06-27 22:25:56 -07:00
Matei Zaharia	7d0eae17e3	Merge branch 'dev' Conflicts: src/scala/spark/HdfsFile.scala src/scala/spark/NexusScheduler.scala src/test/spark/repl/ReplSuite.scala	2010-06-27 15:21:54 -07:00
root	6aacaa6870	Made Spark shell class directory configurable.	2010-06-18 23:24:18 +00:00
Matei Zaharia	323571a177	Initial work on union operation.	2010-06-18 12:54:33 -07:00
Matei Zaharia	b54198819e	Added appropriate hashCode, equals and toString to ParallelArraySplit.	2010-06-17 13:19:02 -07:00


8380 commits