Commit graph

8380 commits

Author SHA1 Message Date
Justin Ma 4cbca25f49 Merge branch 'master' into jtma-accumulator 2010-10-07 14:39:54 -07:00
Justin Ma b3517614d8 Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to
ensure that the proper keys will be generated when they are cached.
2010-10-07 14:38:25 -07:00
Matei Zaharia 0195ee5ed8 Merge branch 'master' into matei-scheduling 2010-10-05 14:26:20 -07:00
Matei Zaharia a41ca20375 Added splitWords function in Utils 2010-10-04 12:01:05 -07:00
Matei Zaharia 9f20b6b433 Added reduceByKey operation for RDDs containing pairs 2010-10-03 20:28:20 -07:00
Matei Zaharia a826294c3a Merge branch 'master' into matei-scheduling 2010-10-03 13:28:06 -07:00
Matei Zaharia aef9e5b98c Renamed ParallelOperation to Job 2010-10-03 13:28:01 -07:00
root 34eccedbf5 Fixed a rather bad bug in HDFS files that has been in for a while:
caching was not working because Split objects did not have a
consistent toString value
2010-10-03 05:06:06 +00:00
Matei Zaharia b6debf5da1 Merge branch 'matei-logging' 2010-09-29 10:59:01 -07:00
Matei Zaharia f50b23b825 Increase default locality wait to 3s. Fixes #20. 2010-09-29 10:04:00 -07:00
Matei Zaharia a7c0e2a7c3 Made task-finished log messages slightly nicer 2010-09-29 00:22:11 -07:00
Matei Zaharia 40f69140b6 Made spark-executor output slightly nicer 2010-09-29 00:22:09 -07:00
Matei Zaharia 0d28bdcefd A couple of minor fixes:
- Don't include trailing $'s in class names of Scala objects
- Report errors using logError instead of printStackTrace
2010-09-29 00:10:46 -07:00
Matei Zaharia 0fa70a6770 Updated log4j.properties to ignore jetty messages below WARN level 2010-09-28 23:58:19 -07:00
Matei Zaharia 7090dea44b Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster 2010-09-28 23:54:29 -07:00
Matei Zaharia 516248aa66 Added log4j.properties 2010-09-28 23:22:39 -07:00
Matei Zaharia 332c8b8c22 Removed Hadoop's SLF4J jars 2010-09-28 23:16:28 -07:00
Matei Zaharia db623defbe Added Logging trait 2010-09-28 23:12:23 -07:00
Matei Zaharia c7d233b911 Added log4j jars and paths 2010-09-28 23:08:01 -07:00
Matei Zaharia e5e9edeeb3 Merge branch 'http-repl-class-serving' 2010-09-28 22:43:04 -07:00
Matei Zaharia e068f21e01 More work on HTTP class loading 2010-09-28 22:32:38 -07:00
Matei Zaharia 7ef3a20a0c Modified the interpreter to serve classes to the executors using a Jetty
HTTP server instead of a shared (NFS) file system.
2010-09-28 17:55:11 -07:00
Justin Ma b749f0e209 fixed typo in printing which task is already finished 2010-09-28 17:28:54 -07:00
Justin Ma b7ce592bec changes to accumulator to add objects in-place. 2010-09-25 14:37:25 -07:00
Justin Ma 366c09c47b Let's use future instead of actors 2010-09-13 15:30:22 -07:00
Justin Ma 0896fd6219 Added fork()/join() operations for SparkContext, as well as corresponding changes to MesosScheduler to support multiple ParallelOperations. 2010-09-12 09:01:44 -07:00
Justin Ma 6f0d2c1cbc round robin scheduling of tasks has been added 2010-09-07 14:03:59 -07:00
Justin Ma e9ffe6caab now adding the Split object. 2010-09-01 13:31:06 -07:00
Justin Ma 7a9ff1cc9a - Got rid of 'Split' type parameter in RDD
- Added SampledRDD, SplitRDD and CartesianRDD
- Made Split a class rather than a type parameter
- Added numCores() to Scheduler to help set default level of parallelism
2010-08-31 12:08:09 -07:00
Justin Ma ea8c2785dd now we have sampling with replacement (at least on a per-split basis) 2010-08-18 15:59:35 -07:00
Justin Ma 156bccbe23 HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files
MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size
RDD.scala: added support for aggregating RDDs on a per-split basis
(aggregateSplit()) as well as for sampling without replacement (sample())
2010-08-18 15:25:57 -07:00
Matei Zaharia 75b2ca10c3 Removed HOD from included Hadoop because it was making the project count
as Python on GitHub :|.
2010-08-16 23:16:35 -07:00
Matei Zaharia 1cbffaae6f Modified Scala interpreter to have it avoid computing string versions of
all results when :silent is enabled, so that it is easier to work with
large arrays in Spark. (The string version of an array of numbers might
not fit in memory even though the array itself does.)
2010-08-15 18:33:27 -07:00
Matei Zaharia 1600c31554 Added latest mesos.jar 2010-08-13 19:03:46 -07:00
Matei Zaharia 0b195927b6 Improved README and added blank templates for config files. 2010-08-13 18:54:32 -07:00
Matei Zaharia 3d8d7fd557 Bug fix from Justin 2010-08-13 11:29:19 -07:00
root a9481c3514 Update to work with latest Mesos API changes 2010-08-13 07:39:36 +00:00
Matei Zaharia 4488b3bc8a Fixed a bug where we would incorrectly decide we've finished a parallel operation if Mesos tells us a task is finished twice 2010-08-09 16:46:14 -07:00
Matei Zaharia f415b071af Change shell framework's name to "Spark shell" 2010-08-06 12:07:26 -07:00
Matei Zaharia 0e6e577fdf Add Mesos native library to .gitignore 2010-07-25 23:54:56 -04:00
Matei Zaharia b56ed67553 Updated code to work with Nexus->Mesos name change 2010-07-25 23:53:46 -04:00
Matei Zaharia 4239f76997 Removed Matei's old start on broadcast code 2010-07-25 23:46:44 -04:00
Matei Zaharia e240e38ee9 Updated a bunch of libraries, and increased the default memory in run so
that unit tests can run successfully.
2010-07-25 21:10:03 -04:00
Matei Zaharia 0435de9e87 Made it possible to set various Spark options and environment variables
in general through a conf/spark-env.sh script.
2010-07-19 18:00:30 -07:00
Justin Ma edad598684 Updated Spark to run with latest Mesos build and Scala-2.8.0.final. 2010-07-19 15:03:49 -07:00
Matei Zaharia 0da5b00d6e Merge branch 'master' into multi-tracker
Conflicts:
	Makefile
	run
	src/scala/spark/Broadcast.scala
	src/scala/spark/HdfsFile.scala
	src/scala/spark/NexusScheduler.scala
	src/scala/spark/SparkContext.scala
	src/test/spark/repl/ReplSuite.scala
	third_party/nexus.jar
2010-06-27 22:25:56 -07:00
Matei Zaharia 7d0eae17e3 Merge branch 'dev'
Conflicts:
	src/scala/spark/HdfsFile.scala
	src/scala/spark/NexusScheduler.scala
	src/test/spark/repl/ReplSuite.scala
2010-06-27 15:21:54 -07:00
root 6aacaa6870 Made Spark shell class directory configurable. 2010-06-18 23:24:18 +00:00
Matei Zaharia 323571a177 Initial work on union operation. 2010-06-18 12:54:33 -07:00
Matei Zaharia b54198819e Added appropriate hashCode, equals and toString to ParallelArraySplit. 2010-06-17 13:19:02 -07:00