Commit graph

247 commits

Author SHA1 Message Date
Matei Zaharia aa8ccec315 Abort jobs if a task fails more than a limited number of times 2010-10-15 15:57:26 -07:00
Matei Zaharia 57a778426c Updated guava to version r07 2010-10-15 15:55:58 -07:00
Matei Zaharia 31b5b8b4a6 A couple of improvements to ReplSuite:
- Use collect instead of toArray
- Disable the "running on Mesos" test when MESOS_HOME is not set
2010-10-15 15:37:14 -07:00
Matei Zaharia 28d6f23196 Made locality scheduling constant-time and added support for changing
CPU and memory requested per task.
2010-10-15 15:36:40 -07:00
Mosharaf Chowdhury a4c0281902 sendObject now takes parameters instead of relying on class
variables.
2010-10-14 15:36:23 -07:00
Mosharaf Chowdhury a137ca75da Got rid of dualMode. 2010-10-13 17:01:00 -07:00
Mosharaf Chowdhury 38194e5731 - Changed guidePort to GuideInfo that now contains the hostAddress
as well as the port. This will allow anyone other than the master
to be a guide.
- The GuideInfo object now contains the constants related to
tracker response.
2010-10-13 16:26:18 -07:00
Mosharaf Chowdhury 8690be8f5a Cleared up some formatting.
Branching out from here to work on BT.
2010-10-13 11:40:03 -07:00
Mosharaf Chowdhury 0d67bc1cee multi-tracker branch now compiles and runs; but it crashes right before the
end. The same problem is seen also in the master branch (in the
ChainedStreaming implementation)
2010-10-12 15:39:53 -07:00
Mosharaf Chowdhury 4fdd48295b Added mesos.jar. Still not working. Major changes required. 2010-10-12 13:10:31 -07:00
Mosharaf Chowdhury e73a5f3491 Now compiles with Scala 2.8.0, but doesn't run with nexus.jar
Must update it to use mesos.jar
2010-10-12 13:05:32 -07:00
Mosharaf Chowdhury ad7a9c5a36 Minor cleanup in Broadcast.scala.
Changed BroadcastTest.scala to have multiple broadcasts.
2010-10-12 12:55:43 -07:00
Matei Zaharia a9098ad5d4 Moved Job and SimpleJob to new files 2010-10-07 18:27:26 -07:00
Matei Zaharia a5155206a1 Merge branch 'master' into matei-scheduling 2010-10-07 17:18:32 -07:00
Matei Zaharia 630a982b88 Added a getId method to split to force classes to specify a unique ID
for each split. This replaces the previous method of calling
split.toString, which would produce different results for the same split
each time it is deserialized (because the default implementation returns
the Java object's address).
2010-10-07 17:17:07 -07:00
Matei Zaharia 4d9c2aee98 Merge branch 'master' into matei-scheduling 2010-10-07 16:19:53 -07:00
Justin Ma f9671b086b got rid of unnecessary line 2010-10-07 14:41:10 -07:00
Justin Ma 4cbca25f49 Merge branch 'master' into jtma-accumulator 2010-10-07 14:39:54 -07:00
Justin Ma b3517614d8 Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to
ensure that the proper keys will be generated when they are cached.
2010-10-07 14:38:25 -07:00
Matei Zaharia 0195ee5ed8 Merge branch 'master' into matei-scheduling 2010-10-05 14:26:20 -07:00
Matei Zaharia a41ca20375 Added splitWords function in Utils 2010-10-04 12:01:05 -07:00
Matei Zaharia 9f20b6b433 Added reduceByKey operation for RDDs containing pairs 2010-10-03 20:28:20 -07:00
Matei Zaharia a826294c3a Merge branch 'master' into matei-scheduling 2010-10-03 13:28:06 -07:00
Matei Zaharia aef9e5b98c Renamed ParallelOperation to Job 2010-10-03 13:28:01 -07:00
root 34eccedbf5 Fixed a rather bad bug in HDFS files that has been in for a while:
caching was not working because Split objects did not have a
consistent toString value
2010-10-03 05:06:06 +00:00
Matei Zaharia b6debf5da1 Merge branch 'matei-logging' 2010-09-29 10:59:01 -07:00
Matei Zaharia f50b23b825 Increase default locality wait to 3s. Fixes #20. 2010-09-29 10:04:00 -07:00
Matei Zaharia a7c0e2a7c3 Made task-finished log messages slightly nicer 2010-09-29 00:22:11 -07:00
Matei Zaharia 40f69140b6 Made spark-executor output slightly nicer 2010-09-29 00:22:09 -07:00
Matei Zaharia 0d28bdcefd A couple of minor fixes:
- Don't include trailing $'s in class names of Scala objects
- Report errors using logError instead of printStackTrace
2010-09-29 00:10:46 -07:00
Matei Zaharia 0fa70a6770 Updated log4j.properties to ignore jetty messages below WARN level 2010-09-28 23:58:19 -07:00
Matei Zaharia 7090dea44b Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster 2010-09-28 23:54:29 -07:00
Matei Zaharia 516248aa66 Added log4j.properties 2010-09-28 23:22:39 -07:00
Matei Zaharia 332c8b8c22 Removed Hadoop's SLF4J jars 2010-09-28 23:16:28 -07:00
Matei Zaharia db623defbe Added Logging trait 2010-09-28 23:12:23 -07:00
Matei Zaharia c7d233b911 Added log4j jars and paths 2010-09-28 23:08:01 -07:00
Matei Zaharia e5e9edeeb3 Merge branch 'http-repl-class-serving' 2010-09-28 22:43:04 -07:00
Matei Zaharia e068f21e01 More work on HTTP class loading 2010-09-28 22:32:38 -07:00
Matei Zaharia 7ef3a20a0c Modified the interpreter to serve classes to the executors using a Jetty
HTTP server instead of a shared (NFS) file system.
2010-09-28 17:55:11 -07:00
Justin Ma b749f0e209 fixed typo in printing which task is already finished 2010-09-28 17:28:54 -07:00
Justin Ma b7ce592bec changes to accumulator to add objects in-place. 2010-09-25 14:37:25 -07:00
Justin Ma 366c09c47b Let's use future instead of actors 2010-09-13 15:30:22 -07:00
Justin Ma 0896fd6219 Added fork()/join() operations for SparkContext, as well as corresponding changes to MesosScheduler to support multiple ParallelOperations. 2010-09-12 09:01:44 -07:00
Justin Ma 6f0d2c1cbc round robin scheduling of tasks has been added 2010-09-07 14:03:59 -07:00
Justin Ma e9ffe6caab now adding the Split object. 2010-09-01 13:31:06 -07:00
Justin Ma 7a9ff1cc9a - Got rid of 'Split' type parameter in RDD
- Added SampledRDD, SplitRDD and CartesianRDD
- Made Split a class rather than a type parameter
- Added numCores() to Scheduler to help set default level of parallelism
2010-08-31 12:08:09 -07:00
Justin Ma ea8c2785dd now we have sampling with replacement (at least on a per-split basis) 2010-08-18 15:59:35 -07:00
Justin Ma 156bccbe23 HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files
MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size
RDD.scala: added support for aggregating RDDs on a per-split basis
(aggregateSplit()) as well as for sampling without replacement (sample())
2010-08-18 15:25:57 -07:00
Matei Zaharia 75b2ca10c3 Removed HOD from included Hadoop because it was making the project count
as Python on GitHub :|.
2010-08-16 23:16:35 -07:00
Matei Zaharia 1cbffaae6f Modified Scala interpreter to have it avoid computing string versions of
all results when :silent is enabled, so that it is easier to work with
large arrays in Spark. (The string version of an array of numbers might
not fit in memory even though the array itself does.)
2010-08-15 18:33:27 -07:00