99 commits

Author SHA1 Message Date
Matei Zaharia 648f42933a Made alltests write test output as XML in build/test_results 2010-11-02 12:53:38 -07:00
Matei Zaharia 6f93baa463 'Running on Mesos' test is now only run when MESOS_HOME is set 2010-11-02 12:51:22 -07:00
Matei Zaharia dd7c5d8e34 Added initial attempt at a BoundedMemoryCache 2010-10-24 19:14:35 -07:00
Matei Zaharia edf86fdb27 Added SizeEstimator class for use by caches 2010-10-24 18:03:49 -07:00
Matei Zaharia a481e23761 Made caching pluggable and added soft reference and weak reference caches. 2010-10-23 17:54:25 -07:00
Matei Zaharia 93a200bc7e Renamed aggregateSplit() to splitRdd(), plus some style fixes 2010-10-23 15:34:03 -07:00
Matei Zaharia 787faf0d0e Fixed a bug with scheduling of tasks that have no locality preferences.
These tasks were being subjected to delay scheduling but then counted as
having been launched on a preferred node. The solution is to have a
separate queue for them and treat them as preferred during scheduling.
2010-10-19 16:07:58 -07:00
Matei Zaharia 0e0ec83570 Undid some changes that Mosharaf inadvertently committed to master. 2010-10-19 13:58:52 -07:00
Mosharaf Chowdhury bf7055decf Merge branch 'master' of git@github.com:mesos/spark
Conflicts:
	src/scala/spark/SparkContext.scala

Using the latest one from Matei.
2010-10-18 11:08:45 -07:00
Matei Zaharia b940164db3 Less hacky way of preventing config files from being overwritten when a template file changes 2010-10-16 22:01:05 -07:00
Matei Zaharia e5fb280ec8 Changed the config files that were included in git to templates which
are used to create an initial copy of each config file if the user does
not have one. This way, users won't accidentally commit their changes to
config files to git.
2010-10-16 21:51:25 -07:00
Matei Zaharia 023ed194b4 Fixed some whitespace 2010-10-16 21:21:16 -07:00
Matei Zaharia 74bbfa91c2 Added support for generic Hadoop InputFormats and refactored textFile to
use this. Closes #12.
2010-10-16 19:03:33 -07:00
Matei Zaharia 03238cb7c1 Renamed HdfsFile to HadoopFile 2010-10-16 17:25:09 -07:00
Matei Zaharia 0e2adecdab Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs 2010-10-16 17:13:52 -07:00
Matei Zaharia 166d9f9125 Removed setSparkHome method on SparkContext in favor of having an
optional constructor parameter, so that the scheduler is guaranteed that
a Spark home has been set when it first builds its executor arg.
2010-10-16 16:19:47 -07:00
Matei Zaharia 1c082ad5fb Added the ability to specify a list of JAR files when creating a
SparkContext and have the master node serve those to workers.
2010-10-16 16:14:13 -07:00
Matei Zaharia c0b856a056 Set absolute path for SPARK_HOME 2010-10-16 12:18:02 -07:00
Matei Zaharia 7da569e8a5 Keep track of tasks in each job so that they can be removed when the job exits 2010-10-16 12:11:19 -07:00
Matei Zaharia bf21bb28f3 Further clarified some code 2010-10-16 11:57:36 -07:00
Matei Zaharia c21f840a80 Fixed some log messages 2010-10-16 10:40:42 -07:00
Matei Zaharia dbdd7682eb Bug fixes and improvements for MesosScheduler and SimpleJob 2010-10-16 10:38:56 -07:00
Matei Zaharia a4953c5051 Moved Spark home detection to SparkContext and added a setSparkHome
method for setting it programmatically.
2010-10-16 10:02:22 -07:00
Matei Zaharia 47b38fd207 Bug fix in passing env vars to executors 2010-10-16 09:21:43 -07:00
Matei Zaharia 6c1dee2e42 Added code so that Spark jobs can be launched from outside the Spark
directory by setting SPARK_HOME and locating the executor relative to
that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed
along to worker nodes.
2010-10-15 19:42:26 -07:00
Matei Zaharia ecb1af576e Moved ClassServer out of the repl package and renamed it to HttpServer. 2010-10-15 19:04:18 -07:00
Matei Zaharia a768cf417b Increased default memory for alltests 2010-10-15 16:17:43 -07:00
Matei Zaharia aa8ccec315 Abort jobs if a task fails more than a limited number of times 2010-10-15 15:57:26 -07:00
Matei Zaharia 57a778426c Updated guava to version r07 2010-10-15 15:55:58 -07:00
Matei Zaharia 31b5b8b4a6 A couple of improvements to ReplSuite:
- Use collect instead of toArray
- Disable the "running on Mesos" test when MESOS_HOME is not set
2010-10-15 15:37:14 -07:00
Matei Zaharia 28d6f23196 Made locality scheduling constant-time and added support for changing
CPU and memory requested per task.
2010-10-15 15:36:40 -07:00
Mosharaf Chowdhury ad7a9c5a36 Minor cleanup in Broadcast.scala.
Changed BroadcastTest.scala to have multiple broadcasts.
2010-10-12 12:55:43 -07:00
Matei Zaharia a9098ad5d4 Moved Job and SimpleJob to new files 2010-10-07 18:27:26 -07:00
Matei Zaharia a5155206a1 Merge branch 'master' into matei-scheduling 2010-10-07 17:18:32 -07:00
Matei Zaharia 630a982b88 Added a getId method to split to force classes to specify a unique ID
for each split. This replaces the previous method of calling
split.toString, which would produce different results for the same split
each time it is deserialized (because the default implementation returns
the Java object's address).
2010-10-07 17:17:07 -07:00
Matei Zaharia 4d9c2aee98 Merge branch 'master' into matei-scheduling 2010-10-07 16:19:53 -07:00
Justin Ma f9671b086b got rid of unnecessary line 2010-10-07 14:41:10 -07:00
Justin Ma 4cbca25f49 Merge branch 'master' into jtma-accumulator 2010-10-07 14:39:54 -07:00
Justin Ma b3517614d8 Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to
ensure that the proper keys will be generated when they are cached.
2010-10-07 14:38:25 -07:00
Matei Zaharia 0195ee5ed8 Merge branch 'master' into matei-scheduling 2010-10-05 14:26:20 -07:00
Matei Zaharia a41ca20375 Added splitWords function in Utils 2010-10-04 12:01:05 -07:00
Matei Zaharia 9f20b6b433 Added reduceByKey operation for RDDs containing pairs 2010-10-03 20:28:20 -07:00
Matei Zaharia a826294c3a Merge branch 'master' into matei-scheduling 2010-10-03 13:28:06 -07:00
Matei Zaharia aef9e5b98c Renamed ParallelOperation to Job 2010-10-03 13:28:01 -07:00
root 34eccedbf5 Fixed a rather bad bug in HDFS files that has been in for a while:
caching was not working because Split objects did not have a
consistent toString value
2010-10-03 05:06:06 +00:00
Matei Zaharia b6debf5da1 Merge branch 'matei-logging' 2010-09-29 10:59:01 -07:00
Matei Zaharia f50b23b825 Increase default locality wait to 3s. Fixes #20. 2010-09-29 10:04:00 -07:00
Matei Zaharia a7c0e2a7c3 Made task-finished log messages slightly nicer 2010-09-29 00:22:11 -07:00
Matei Zaharia 40f69140b6 Made spark-executor output slightly nicer 2010-09-29 00:22:09 -07:00
Matei Zaharia 0d28bdcefd A couple of minor fixes:
- Don't include trailing $'s in class names of Scala objects
- Report errors using logError instead of printStackTrace
2010-09-29 00:10:46 -07:00