Matei Zaharia
504f839c65
Removed unnecessary collectAsMap
2010-11-08 08:49:42 -08:00
Matei Zaharia
9d3f05a990
Made shuffle algorithm pluggable and added LocalFileShuffle.
2010-11-08 00:46:12 -08:00
Matei Zaharia
d9ea6d69a5
Create output files one by one instead of at the same time in the map
phase of DfsShuffle.
2010-11-06 10:53:57 -07:00
Matei Zaharia
16ff4dc0be
Merge branch 'matei-shuffle' of github.com:mesos/spark into matei-shuffle
2010-11-04 14:40:36 -07:00
Matei Zaharia
d984b8ab23
Properly set the number of output splits in DFS shuffle
2010-11-04 14:39:55 -07:00
root
4cc0984b43
Fixed a small bug in DFS shuffle -- the number of reduce tasks was not being set based on numOutputSplits
2010-11-04 21:34:55 +00:00
Matei Zaharia
96f0be935a
Added groupBy function in RDD
2010-11-03 23:58:53 -07:00
Matei Zaharia
72ec298cd4
Added reduceByKey, groupByKey and join operations based on combine, as
well as versions of the shuffle operations that set the number of splits
automatically.
2010-11-03 23:51:11 -07:00
Matei Zaharia
d947cb9778
Fixed a bug with negative hashcodes
2010-11-03 22:52:41 -07:00
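The message above does not say where the bug was. One common way negative hash codes cause trouble on the JVM is when a key's `hashCode()` is reduced to a bucket index with `%`, which can return a negative number. The sketch below illustrates that pitfall and one possible fix; `BucketIndex` and `forHash` are hypothetical names, not taken from the Spark source.

```java
// Hypothetical helper for illustration only; not the code changed by this commit.
final class BucketIndex {
    // Maps any hash code, including negative ones, to an index in [0, numBuckets).
    static int forHash(int hash, int numBuckets) {
        // In Java, % takes the sign of the dividend, so this can be negative.
        int mod = hash % numBuckets;
        return mod < 0 ? mod + numBuckets : mod;
    }
}
```

Calling `Math.abs(hash) % numBuckets` instead would still fail for `Integer.MIN_VALUE`, whose absolute value overflows back to a negative number, which is why the sketch adjusts the remainder rather than the hash.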
Matei Zaharia
44530c310b
Made DFS shuffle's "reduce tasks" fetch inputs in a random order so they
don't all hit the same nodes at the same time.
2010-11-03 22:45:44 -07:00
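As a rough illustration of the idea in the message above, each reduce task could visit its inputs in an independently shuffled order, so that concurrent tasks are unlikely to start on the same input. This is a minimal sketch under that assumption; `FetchOrder` and `shuffled` are hypothetical names, not taken from the Spark source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not the code changed by this commit.
final class FetchOrder {
    // Returns the input indices 0..numInputs-1 in a random order.
    static List<Integer> shuffled(int numInputs, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInputs; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);
        return order;
    }
}
```

Every input is still fetched exactly once; only the order differs from task to task.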
Matei Zaharia
820dac5afe
Initial work towards a simple HDFS-based shuffle.
2010-11-03 21:27:24 -07:00
Matei Zaharia
648f42933a
Made alltests write test output as XML in build/test_results
2010-11-02 12:53:38 -07:00
Matei Zaharia
6f93baa463
'Running on Mesos' test is now only run when MESOS_HOME is set
2010-11-02 12:51:22 -07:00
Matei Zaharia
dd7c5d8e34
Added initial attempt at a BoundedMemoryCache
2010-10-24 19:14:35 -07:00
Matei Zaharia
edf86fdb27
Added SizeEstimator class for use by caches
2010-10-24 18:03:49 -07:00
Matei Zaharia
a481e23761
Made caching pluggable and added soft reference and weak reference caches.
2010-10-23 17:54:25 -07:00
Matei Zaharia
93a200bc7e
Renamed aggregateSplit() to splitRdd(), plus some style fixes
2010-10-23 15:34:03 -07:00
Matei Zaharia
787faf0d0e
Fixed a bug with scheduling of tasks that have no locality preferences.
These tasks were being subjected to delay scheduling but then counted as
having been launched on a preferred node. The solution is to have a
separate queue for them and treat them as preferred during scheduling.
2010-10-19 16:07:58 -07:00
Matei Zaharia
0e0ec83570
Undid some changes that Mosharaf inadvertently committed to master.
2010-10-19 13:58:52 -07:00
Mosharaf Chowdhury
bf7055decf
Merge branch 'master' of git@github.com:mesos/spark
Conflicts:
src/scala/spark/SparkContext.scala
Using the latest one from Matei.
2010-10-18 11:08:45 -07:00
Matei Zaharia
b940164db3
Less hacky way of preventing config files from being overwritten when a template file changes
2010-10-16 22:01:05 -07:00
Matei Zaharia
e5fb280ec8
Changed the config files that were included in git to templates which
are used to create an initial copy of each config file if the user does
not have one. This way, users won't accidentally commit their changes to
config files to git.
2010-10-16 21:51:25 -07:00
Matei Zaharia
023ed194b4
Fixed some whitespace
2010-10-16 21:21:16 -07:00
Matei Zaharia
74bbfa91c2
Added support for generic Hadoop InputFormats and refactored textFile to
use this. Closes #12.
2010-10-16 19:03:33 -07:00
Matei Zaharia
03238cb7c1
Renamed HdfsFile to HadoopFile
2010-10-16 17:25:09 -07:00
Matei Zaharia
0e2adecdab
Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs
2010-10-16 17:13:52 -07:00
Matei Zaharia
166d9f9125
Removed setSparkHome method on SparkContext in favor of having an
optional constructor parameter, so that the scheduler is guaranteed that
a Spark home has been set when it first builds its executor arg.
2010-10-16 16:19:47 -07:00
Matei Zaharia
1c082ad5fb
Added the ability to specify a list of JAR files when creating a
SparkContext and have the master node serve those to workers.
2010-10-16 16:14:13 -07:00
Matei Zaharia
c0b856a056
Set absolute path for SPARK_HOME
2010-10-16 12:18:02 -07:00
Matei Zaharia
7da569e8a5
Keep track of tasks in each job so that they can be removed when the job exits
2010-10-16 12:11:19 -07:00
Matei Zaharia
bf21bb28f3
Further clarified some code
2010-10-16 11:57:36 -07:00
Matei Zaharia
c21f840a80
Fixed some log messages
2010-10-16 10:40:42 -07:00
Matei Zaharia
dbdd7682eb
Bug fixes and improvements for MesosScheduler and SimpleJob
2010-10-16 10:38:56 -07:00
Matei Zaharia
a4953c5051
Moved Spark home detection to SparkContext and added a setSparkHome
method for setting it programmatically.
2010-10-16 10:02:22 -07:00
Matei Zaharia
47b38fd207
Bug fix in passing env vars to executors
2010-10-16 09:21:43 -07:00
Matei Zaharia
6c1dee2e42
Added code so that Spark jobs can be launched from outside the Spark
directory by setting SPARK_HOME and locating the executor relative to
that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed
along to worker nodes.
2010-10-15 19:42:26 -07:00
Matei Zaharia
ecb1af576e
Moved ClassServer out of the repl package and renamed it to HttpServer.
2010-10-15 19:04:18 -07:00
Matei Zaharia
a768cf417b
Increased default memory for alltests
2010-10-15 16:17:43 -07:00
Matei Zaharia
aa8ccec315
Abort jobs if a task fails more than a limited number of times
2010-10-15 15:57:26 -07:00
Matei Zaharia
57a778426c
Updated guava to version r07
2010-10-15 15:55:58 -07:00
Matei Zaharia
31b5b8b4a6
A couple of improvements to ReplSuite:
- Use collect instead of toArray
- Disable the "running on Mesos" test when MESOS_HOME is not set
2010-10-15 15:37:14 -07:00
Matei Zaharia
28d6f23196
Made locality scheduling constant-time and added support for changing
CPU and memory requested per task.
2010-10-15 15:36:40 -07:00
Mosharaf Chowdhury
ad7a9c5a36
Minor cleanup in Broadcast.scala.
Changed BroadcastTest.scala to have multiple broadcasts.
2010-10-12 12:55:43 -07:00
Matei Zaharia
a9098ad5d4
Moved Job and SimpleJob to new files
2010-10-07 18:27:26 -07:00
Matei Zaharia
a5155206a1
Merge branch 'master' into matei-scheduling
2010-10-07 17:18:32 -07:00
Matei Zaharia
630a982b88
Added a getId method to split to force classes to specify a unique ID
for each split. This replaces the previous method of calling
split.toString, which would produce different results for the same split
each time it is deserialized (because the default implementation returns
the Java object's address).
2010-10-07 17:17:07 -07:00
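The problem described above comes from `Object.toString()`, which by default returns the class name followed by `@` and the hexadecimal hash code. For a class that does not override `hashCode()`, that value is an identity hash that generally differs once the object is recreated by deserialization. A minimal sketch of the alternative, an ID built only from the split's own fields, might look like this; `RangeSplit` and its fields are hypothetical and not taken from the Spark source.

```java
// Hypothetical split class for illustration only; not the code changed by this commit.
final class RangeSplit {
    private final int rddId;
    private final int index;

    RangeSplit(int rddId, int index) {
        this.rddId = rddId;
        this.index = index;
    }

    // Built from field values only, so two copies of the same logical split
    // (for example, before and after deserialization) return the same ID.
    String getId() {
        return "RangeSplit(" + rddId + ", " + index + ")";
    }
}
```

Because the ID depends only on the split's contents, it can be used as a stable key wherever the same split has to be recognized across copies.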
Matei Zaharia
4d9c2aee98
Merge branch 'master' into matei-scheduling
2010-10-07 16:19:53 -07:00
Justin Ma
f9671b086b
got rid of unnecessary line
2010-10-07 14:41:10 -07:00
Justin Ma
4cbca25f49
Merge branch 'master' into jtma-accumulator
2010-10-07 14:39:54 -07:00
Justin Ma
b3517614d8
Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to
ensure that the proper keys will be generated when they are cached.
2010-10-07 14:38:25 -07:00