Mosharaf Chowdhury
fddcdf87c9
Added a small description of how ParallelLFS works.
2010-12-16 11:58:00 -08:00
Mosharaf Chowdhury
77a4017585
Fixed config param naming in ParallelLocalFileShuffle
2010-12-16 11:42:37 -08:00
Mosharaf Chowdhury
c5483e39f9
- ParallelLocalFileShuffle does NOT use HttpPipelining at all.
- Config option related to pipelining has been removed.
- Summary: Basic -> Pipelining / Parallel -> NO pipelining
2010-12-15 22:08:34 -08:00
Mosharaf Chowdhury
56d8a2afa1
- Updated java-opts file of this branch.
- Renamed some ParallelLocalFileShuffle config options for clarity.
2010-12-15 20:56:22 -08:00
Mosharaf Chowdhury
25fb3c4cf6
- Brought back Matei's LocalFileShuffle implementation as BasicLocalFileShuffle
- Renamed parallel-pull version to ParallelLocalFileShuffle
- Note that setting max-concurrent connections to 1 in ParallelLocalFileShuffle should essentially be the same as BasicLocalFileShuffle
2010-12-15 20:33:28 -08:00
Mosharaf Chowdhury
f82cc17bc5
UseHttpPipelining option is brought back in. It works!
2010-12-07 10:07:30 -08:00
Mosharaf Chowdhury
7e2d72c328
Multiple connections created at a time. No upper limit on the server side though.
2010-12-04 18:55:55 -08:00
Mosharaf Chowdhury
540a41163f
UseHttpPipelining is 'true' by default.
2010-12-02 19:56:17 -08:00
Mosharaf Chowdhury
0de859fbe2
Enabling/disabling HTTP pipelining is a config option now. Performance tradeoffs are not obvious yet.
2010-12-02 02:32:44 -08:00
Mosharaf Chowdhury
8494b3a4f9
- Added log messages for benchmarking.
- Added GroupByTest.scala for benchmarking.
2010-11-27 23:51:43 -08:00
Matei Zaharia
f8ea98d989
Remove -unchecked compiler parameter
2010-11-13 18:39:07 -08:00
Matei Zaharia
f8966ffc11
Added a shuffle test with negative hash codes for some keys (this was a bug earlier)
2010-11-12 16:18:45 -08:00
Matei Zaharia
d0a9966555
Unit tests for shuffle operations. Fixes #33.
2010-11-12 16:12:14 -08:00
Matei Zaharia
7b25ab87af
Added options for using an external HTTP server with LocalFileShuffle
2010-11-09 13:46:30 -08:00
Matei Zaharia
504f839c65
Removed unnecessary collectAsMap
2010-11-08 08:49:42 -08:00
Matei Zaharia
9d3f05a990
Made shuffle algorithm pluggable and added LocalFileShuffle.
2010-11-08 00:46:12 -08:00
Matei Zaharia
d9ea6d69a5
Create output files one by one instead of at the same time in the map
phase of DfsShuffle.
2010-11-06 10:53:57 -07:00
Matei Zaharia
16ff4dc0be
Merge branch 'matei-shuffle' of github.com:mesos/spark into matei-shuffle
2010-11-04 14:40:36 -07:00
Matei Zaharia
d984b8ab23
Properly set the number of output splits in DFS shuffle
2010-11-04 14:39:55 -07:00
root
4cc0984b43
Fixed a small bug in DFS shuffle -- the number of reduce tasks was not being set based on numOutputSplits
2010-11-04 21:34:55 +00:00
Matei Zaharia
96f0be935a
Added groupBy function in RDD
2010-11-03 23:58:53 -07:00
Matei Zaharia
72ec298cd4
Added reduceByKey, groupByKey and join operations based on combine, as
well as versions of the shuffle operations that set the number of splits
automatically.
2010-11-03 23:51:11 -07:00
Matei Zaharia
d947cb9778
Fixed a bug with negative hashcodes
2010-11-03 22:52:41 -07:00
Matei Zaharia
44530c310b
Made DFS shuffle's "reduce tasks" fetch inputs in a random order so they
don't all hit the same nodes at the same time.
2010-11-03 22:45:44 -07:00
Matei Zaharia
820dac5afe
Initial work towards a simple HDFS-based shuffle.
2010-11-03 21:27:24 -07:00
Matei Zaharia
648f42933a
Made alltests write test output as XML in build/test_results
2010-11-02 12:53:38 -07:00
Matei Zaharia
6f93baa463
'Running on Mesos' test is now only run when MESOS_HOME is set
2010-11-02 12:51:22 -07:00
Matei Zaharia
dd7c5d8e34
Added initial attempt at a BoundedMemoryCache
2010-10-24 19:14:35 -07:00
Matei Zaharia
edf86fdb27
Added SizeEstimator class for use by caches
2010-10-24 18:03:49 -07:00
Matei Zaharia
a481e23761
Made caching pluggable and added soft reference and weak reference caches.
2010-10-23 17:54:25 -07:00
Matei Zaharia
93a200bc7e
Renamed aggregateSplit() to splitRdd(), plus some style fixes
2010-10-23 15:34:03 -07:00
Matei Zaharia
787faf0d0e
Fixed a bug with scheduling of tasks that have no locality preferences.
These tasks were being subjected to delay scheduling but then counted as
having been launched on a preferred node. The solution is to have a
separate queue for them and treat them as preferred during scheduling.
2010-10-19 16:07:58 -07:00
Matei Zaharia
0e0ec83570
Undid some changes that Mosharaf inadvertently committed to master.
2010-10-19 13:58:52 -07:00
Mosharaf Chowdhury
bf7055decf
Merge branch 'master' of git@github.com:mesos/spark
Conflicts:
src/scala/spark/SparkContext.scala
Using the latest one from Matei.
2010-10-18 11:08:45 -07:00
Matei Zaharia
b940164db3
Less hacky way of preventing config files from being overwritten when a template file changes
2010-10-16 22:01:05 -07:00
Matei Zaharia
e5fb280ec8
Changed the config files that were included in git to templates which
are used to create an initial copy of each config file if the user does
not have one. This way, users won't accidentally commit their changes to
config files to git.
2010-10-16 21:51:25 -07:00
Matei Zaharia
023ed194b4
Fixed some whitespace
2010-10-16 21:21:16 -07:00
Matei Zaharia
74bbfa91c2
Added support for generic Hadoop InputFormats and refactored textFile to
use this. Closes #12.
2010-10-16 19:03:33 -07:00
Matei Zaharia
03238cb7c1
Renamed HdfsFile to HadoopFile
2010-10-16 17:25:09 -07:00
Matei Zaharia
0e2adecdab
Simplified UnionRDD slightly and added a SparkContext.union method for efficiently union-ing a large number of RDDs
2010-10-16 17:13:52 -07:00
Matei Zaharia
166d9f9125
Removed setSparkHome method on SparkContext in favor of having an
optional constructor parameter, so that the scheduler is guaranteed that
a Spark home has been set when it first builds its executor arg.
2010-10-16 16:19:47 -07:00
Matei Zaharia
1c082ad5fb
Added the ability to specify a list of JAR files when creating a
SparkContext and have the master node serve those to workers.
2010-10-16 16:14:13 -07:00
Matei Zaharia
c0b856a056
Set absolute path for SPARK_HOME
2010-10-16 12:18:02 -07:00
Matei Zaharia
7da569e8a5
Keep track of tasks in each job so that they can be removed when the job exits
2010-10-16 12:11:19 -07:00
Matei Zaharia
bf21bb28f3
Further clarified some code
2010-10-16 11:57:36 -07:00
Matei Zaharia
c21f840a80
Fixed some log messages
2010-10-16 10:40:42 -07:00
Matei Zaharia
dbdd7682eb
Bug fixes and improvements for MesosScheduler and SimpleJob
2010-10-16 10:38:56 -07:00
Matei Zaharia
a4953c5051
Moved Spark home detection to SparkContext and added a setSparkHome
method for setting it programmatically.
2010-10-16 10:02:22 -07:00
Matei Zaharia
47b38fd207
Bug fix in passing env vars to executors
2010-10-16 09:21:43 -07:00
Matei Zaharia
6c1dee2e42
Added code so that Spark jobs can be launched from outside the Spark
directory by setting SPARK_HOME and locating the executor relative to
that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed
along to worker nodes.
2010-10-15 19:42:26 -07:00