Matei Zaharia
0195ee5ed8
Merge branch 'master' into matei-scheduling
2010-10-05 14:26:20 -07:00
Matei Zaharia
a41ca20375
Added splitWords function in Utils
2010-10-04 12:01:05 -07:00
Matei Zaharia
9f20b6b433
Added reduceByKey operation for RDDs containing pairs
2010-10-03 20:28:20 -07:00
Matei Zaharia
a826294c3a
Merge branch 'master' into matei-scheduling
2010-10-03 13:28:06 -07:00
Matei Zaharia
aef9e5b98c
Renamed ParallelOperation to Job
2010-10-03 13:28:01 -07:00
root
34eccedbf5
Fixed a rather bad bug in HDFS files that has been in for a while:
caching was not working because Split objects did not have a
consistent toString value
2010-10-03 05:06:06 +00:00
Matei Zaharia
b6debf5da1
Merge branch 'matei-logging'
2010-09-29 10:59:01 -07:00
Matei Zaharia
f50b23b825
Increase default locality wait to 3s. Fixes #20.
2010-09-29 10:04:00 -07:00
Matei Zaharia
a7c0e2a7c3
Made task-finished log messages slightly nicer
2010-09-29 00:22:11 -07:00
Matei Zaharia
40f69140b6
Made spark-executor output slightly nicer
2010-09-29 00:22:09 -07:00
Matei Zaharia
0d28bdcefd
A couple of minor fixes:
- Don't include trailing $'s in class names of Scala objects
- Report errors using logError instead of printStackTrace
2010-09-29 00:10:46 -07:00
Matei Zaharia
0fa70a6770
Updated log4j.properties to ignore jetty messages below WARN level
2010-09-28 23:58:19 -07:00
Matei Zaharia
7090dea44b
Changed printlns to log statements and fixed a bug in run that was causing it to fail on a Mesos cluster
2010-09-28 23:54:29 -07:00
Matei Zaharia
516248aa66
Added log4j.properties
2010-09-28 23:22:39 -07:00
Matei Zaharia
332c8b8c22
Removed Hadoop's SLF4J jars
2010-09-28 23:16:28 -07:00
Matei Zaharia
db623defbe
Added Logging trait
2010-09-28 23:12:23 -07:00
Matei Zaharia
c7d233b911
Added log4j jars and paths
2010-09-28 23:08:01 -07:00
Matei Zaharia
e5e9edeeb3
Merge branch 'http-repl-class-serving'
2010-09-28 22:43:04 -07:00
Matei Zaharia
e068f21e01
More work on HTTP class loading
2010-09-28 22:32:38 -07:00
Matei Zaharia
7ef3a20a0c
Modified the interpreter to serve classes to the executors using a Jetty
HTTP server instead of a shared (NFS) file system.
2010-09-28 17:55:11 -07:00
Justin Ma
b749f0e209
fixed typo in printing which task is already finished
2010-09-28 17:28:54 -07:00
Justin Ma
b7ce592bec
changes to accumulator to add objects in-place.
2010-09-25 14:37:25 -07:00
Justin Ma
366c09c47b
Let's use future instead of actors
2010-09-13 15:30:22 -07:00
Justin Ma
0896fd6219
Added fork()/join() operations for SparkContext, as well as corresponding changes to MesosScheduler to support multiple ParallelOperations.
2010-09-12 09:01:44 -07:00
Justin Ma
6f0d2c1cbc
round robin scheduling of tasks has been added
2010-09-07 14:03:59 -07:00
Justin Ma
e9ffe6caab
now adding the Split object.
2010-09-01 13:31:06 -07:00
Justin Ma
7a9ff1cc9a
- Got rid of 'Split' type parameter in RDD
- Added SampledRDD, SplitRDD and CartesianRDD
- Made Split a class rather than a type parameter
- Added numCores() to Scheduler to help set default level of parallelism
2010-08-31 12:08:09 -07:00
Justin Ma
ea8c2785dd
now we have sampling with replacement (at least on a per-split basis)
2010-08-18 15:59:35 -07:00
Justin Ma
156bccbe23
HdfsFile.scala: added a try/catch block to exit gracefully for corrupted gzip files
MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size
RDD.scala: added support for aggregating RDDs on a per-split basis
(aggregateSplit()) as well as for sampling without replacement (sample())
2010-08-18 15:25:57 -07:00
Matei Zaharia
75b2ca10c3
Removed HOD from included Hadoop because it was making the project count
as Python on GitHub :|.
2010-08-16 23:16:35 -07:00
Matei Zaharia
1cbffaae6f
Modified Scala interpreter to have it avoid computing string versions of
all results when :silent is enabled, so that it is easier to work with
large arrays in Spark. (The string version of an array of numbers might
not fit in memory even though the array itself does.)
2010-08-15 18:33:27 -07:00
Matei Zaharia
1600c31554
Added latest mesos.jar
2010-08-13 19:03:46 -07:00
Matei Zaharia
0b195927b6
Improved README and added blank templates for config files.
2010-08-13 18:54:32 -07:00
Matei Zaharia
3d8d7fd557
Bug fix from Justin
2010-08-13 11:29:19 -07:00
root
a9481c3514
Update to work with latest Mesos API changes
2010-08-13 07:39:36 +00:00
Matei Zaharia
4488b3bc8a
Fixed a bug where we would incorrectly decide we've finished a parallel operation if Mesos tells us a task is finished twice
2010-08-09 16:46:14 -07:00
Matei Zaharia
f415b071af
Change shell framework's name to "Spark shell"
2010-08-06 12:07:26 -07:00
Matei Zaharia
0e6e577fdf
Add Mesos native library to .gitignore
2010-07-25 23:54:56 -04:00
Matei Zaharia
b56ed67553
Updated code to work with Nexus->Mesos name change
2010-07-25 23:53:46 -04:00
Matei Zaharia
4239f76997
Removed Matei's old start on broadcast code
2010-07-25 23:46:44 -04:00
Matei Zaharia
e240e38ee9
Updated a bunch of libraries, and increased the default memory in run so
that unit tests can run successfully.
2010-07-25 21:10:03 -04:00
Matei Zaharia
0435de9e87
Made it possible to set various Spark options and environment variables
in general through a conf/spark-env.sh script.
2010-07-19 18:00:30 -07:00
Justin Ma
edad598684
Updated Spark to run with latest Mesos build and Scala-2.8.0.final.
2010-07-19 15:03:49 -07:00
Matei Zaharia
0da5b00d6e
Merge branch 'master' into multi-tracker
Conflicts:
Makefile
run
src/scala/spark/Broadcast.scala
src/scala/spark/HdfsFile.scala
src/scala/spark/NexusScheduler.scala
src/scala/spark/SparkContext.scala
src/test/spark/repl/ReplSuite.scala
third_party/nexus.jar
2010-06-27 22:25:56 -07:00
Matei Zaharia
7d0eae17e3
Merge branch 'dev'
Conflicts:
src/scala/spark/HdfsFile.scala
src/scala/spark/NexusScheduler.scala
src/test/spark/repl/ReplSuite.scala
2010-06-27 15:21:54 -07:00
root
6aacaa6870
Made Spark shell class directory configurable.
2010-06-18 23:24:18 +00:00
Matei Zaharia
323571a177
Initial work on union operation.
2010-06-18 12:54:33 -07:00
Matei Zaharia
b54198819e
Added appropriate hashCode, equals and toString to ParallelArraySplit.
2010-06-17 13:19:02 -07:00
Matei Zaharia
cd247b7d86
Created common RDD superclass for distributed files and parallel arrays.
This also means that parallel arrays now get all the functionality files
used to have (filter, map, reduce, cache, etc).
2010-06-17 12:49:42 -07:00
Matei Zaharia
77103eab2a
Fixed README
2010-06-11 14:55:23 -07:00