Commit graph

124 commits

Author SHA1 Message Date
Mosharaf Chowdhury fddcdf87c9 Added a small description of how ParallelLFS works. 2010-12-16 11:58:00 -08:00
Mosharaf Chowdhury 77a4017585 Fixed config param naming in ParallelLocalFileShuffle 2010-12-16 11:42:37 -08:00
Mosharaf Chowdhury c5483e39f9 - ParallelLocalFileShuffle does NOT use HttpPipelining at all.
- Config option related to pipelining has been removed.
 - Summary: Basic -> Pipelining / Parallel -> NO pipelining
2010-12-15 22:08:34 -08:00
Mosharaf Chowdhury 56d8a2afa1 - Updated java-opts file of this branch.
- Renamed some ParallelLocalFileShuffle config options for clarity.
2010-12-15 20:56:22 -08:00
Mosharaf Chowdhury 25fb3c4cf6 - Brought back Matei's LocalFileShuffle implementation as BasicLocalFileShuffle
- Renamed parallel-pull version to ParallelLocalFileShuffle
 - Note that setting max-concurrent connections to 1 in ParallelLocalFileShuffle should essentially be the same as BasicLocalFileShuffle
2010-12-15 20:33:28 -08:00
Mosharaf Chowdhury f82cc17bc5 UseHttpPipelining option is brought back in. It works! 2010-12-07 10:07:30 -08:00
Mosharaf Chowdhury 7e2d72c328 Multiple connections created at a time. No upper limit on the server side though. 2010-12-04 18:55:55 -08:00
Mosharaf Chowdhury 540a41163f UseHttpPipelining is 'true' by default. 2010-12-02 19:56:17 -08:00
Mosharaf Chowdhury 0de859fbe2 Enabling/disabling HTTP pipelining is a config option now. Performance tradeoffs are not obvious yet. 2010-12-02 02:32:44 -08:00
Mosharaf Chowdhury 8494b3a4f9 - Added log messages for benchmarking.
- Added GroupByTest.scala for benchmarking.
2010-11-27 23:51:43 -08:00
Matei Zaharia f8ea98d989 Remove -unchecked compiler parameter 2010-11-13 18:39:07 -08:00
Matei Zaharia f8966ffc11 Added a shuffle test with negative hash codes for some keys (this was a bug earlier) 2010-11-12 16:18:45 -08:00
Matei Zaharia d0a9966555 Unit tests for shuffle operations. Fixes #33. 2010-11-12 16:12:14 -08:00
Matei Zaharia 7b25ab87af Added options for using an external HTTP server with LocalFileShuffle 2010-11-09 13:46:30 -08:00
Matei Zaharia 504f839c65 Removed unnecessary collectAsMap 2010-11-08 08:49:42 -08:00
Matei Zaharia 9d3f05a990 Made shuffle algorithm pluggable and added LocalFileShuffle. 2010-11-08 00:46:12 -08:00
Matei Zaharia d9ea6d69a5 Create output files one by one instead of at the same time in the map
phase of DfsShuffle.
2010-11-06 10:53:57 -07:00
Matei Zaharia 16ff4dc0be Merge branch 'matei-shuffle' of github.com:mesos/spark into matei-shuffle 2010-11-04 14:40:36 -07:00
Matei Zaharia d984b8ab23 Properly set the number of output splits in DFS shuffle 2010-11-04 14:39:55 -07:00
root 4cc0984b43 Fixed a small bug in DFS shuffle -- the number of reduce tasks was not being set based on numOutputSplits 2010-11-04 21:34:55 +00:00
Matei Zaharia 96f0be935a Added groupBy function in RDD 2010-11-03 23:58:53 -07:00
Matei Zaharia 72ec298cd4 Added reduceByKey, groupByKey and join operations based on combine, as
well as versions of the shuffle operations that set the number of splits
automatically.
2010-11-03 23:51:11 -07:00
Matei Zaharia d947cb9778 Fixed a bug with negative hashcodes 2010-11-03 22:52:41 -07:00
Matei Zaharia 44530c310b Made DFS shuffle's "reduce tasks" fetch inputs in a random order so they
don't all hit the same nodes at the same time.
2010-11-03 22:45:44 -07:00
Matei Zaharia 820dac5afe Initial work towards a simple HDFS-based shuffle. 2010-11-03 21:27:24 -07:00
Matei Zaharia 648f42933a Made alltests write test output as XML in build/test_results 2010-11-02 12:53:38 -07:00
Matei Zaharia 6f93baa463 'Running on Mesos' test is now only run when MESOS_HOME is set 2010-11-02 12:51:22 -07:00
Matei Zaharia dd7c5d8e34 Added initial attempt at a BoundedMemoryCache 2010-10-24 19:14:35 -07:00
Matei Zaharia edf86fdb27 Added SizeEstimator class for use by caches 2010-10-24 18:03:49 -07:00
Matei Zaharia a481e23761 Made caching pluggable and added soft reference and weak reference caches. 2010-10-23 17:54:25 -07:00
Matei Zaharia 93a200bc7e Renamed aggregateSplit() to splitRdd(), plus some style fixes 2010-10-23 15:34:03 -07:00
Matei Zaharia 787faf0d0e Fixed a bug with scheduling of tasks that have no locality preferences.
These tasks were being subjected to delay scheduling but then counted as
having been launched on a preferred node. The solution is to have a
separate queue for them and treat them as preferred during scheduling.
2010-10-19 16:07:58 -07:00
Matei Zaharia 0e0ec83570 Undid some changes that Mosharaf inadvertently committed to master. 2010-10-19 13:58:52 -07:00
Mosharaf Chowdhury bf7055decf Merge branch 'master' of git@github.com:mesos/spark
Conflicts:
	src/scala/spark/SparkContext.scala

Using the latest one from Matei.
2010-10-18 11:08:45 -07:00
Matei Zaharia b940164db3 Less hacky way of preventing config files from being overwritten when a template file changes 2010-10-16 22:01:05 -07:00
Matei Zaharia e5fb280ec8 Changed the config files that were included in git to templates which
are used to create an initial copy of each config file if the user does
not have one. This way, users won't accidentally commit their changes to
config files to git.
2010-10-16 21:51:25 -07:00
Matei Zaharia 023ed194b4 Fixed some whitespace 2010-10-16 21:21:16 -07:00
Matei Zaharia 74bbfa91c2 Added support for generic Hadoop InputFormats and refactored textFile to
use this. Closes #12.
2010-10-16 19:03:33 -07:00
Matei Zaharia 03238cb7c1 Renamed HdfsFile to HadoopFile 2010-10-16 17:25:09 -07:00
Matei Zaharia 0e2adecdab Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs 2010-10-16 17:13:52 -07:00
Matei Zaharia 166d9f9125 Removed setSparkHome method on SparkContext in favor of having an
optional constructor parameter, so that the scheduler is guaranteed that
a Spark home has been set when it first builds its executor arg.
2010-10-16 16:19:47 -07:00
Matei Zaharia 1c082ad5fb Added the ability to specify a list of JAR files when creating a
SparkContext and have the master node serve those to workers.
2010-10-16 16:14:13 -07:00
Matei Zaharia c0b856a056 Set absolute path for SPARK_HOME 2010-10-16 12:18:02 -07:00
Matei Zaharia 7da569e8a5 Keep track of tasks in each job so that they can be removed when the job exits 2010-10-16 12:11:19 -07:00
Matei Zaharia bf21bb28f3 Further clarified some code 2010-10-16 11:57:36 -07:00
Matei Zaharia c21f840a80 Fixed some log messages 2010-10-16 10:40:42 -07:00
Matei Zaharia dbdd7682eb Bug fixes and improvements for MesosScheduler and SimpleJob 2010-10-16 10:38:56 -07:00
Matei Zaharia a4953c5051 Moved Spark home detection to SparkContext and added a setSparkHome
method for setting it programmatically.
2010-10-16 10:02:22 -07:00
Matei Zaharia 47b38fd207 Bug fix in passing env vars to executors 2010-10-16 09:21:43 -07:00
Matei Zaharia 6c1dee2e42 Added code so that Spark jobs can be launched from outside the Spark
directory by setting SPARK_HOME and locating the executor relative to
that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed
along to worker nodes.
2010-10-15 19:42:26 -07:00