Matei Zaharia
447debb771
Updated Hadoop to 0.20.2 to include some bug fixes
2011-03-01 10:27:48 -08:00
Matei Zaharia
9e59afd710
More work on new RDD design
2011-02-27 19:15:52 -08:00
Matei Zaharia
f38f86d59e
More stuff
2011-02-27 14:27:12 -08:00
Matei Zaharia
2e6023f2bf
stuff
2011-02-26 23:41:44 -08:00
Matei Zaharia
309367c477
Initial work towards new RDD design
2011-02-26 23:15:33 -08:00
Mosharaf Chowdhury
0416cc22d2
Picking peers weighted by the number of rare blocks they have. A block is rare if there are at most 2 copies in the neighborhood. A better threshold could be used (some function of the neighborhood size).
2011-02-15 16:27:44 -08:00
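The peer-weighting rule in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: the function names, the `peer_blocks` mapping (peer name to the set of block ids it holds), and the fallback to a uniform pick when no peer holds a rare block are all assumptions made for illustration. Only the threshold of 2 comes from the message.

```python
import random
from collections import Counter

# From the commit message: a block is rare if the neighborhood holds at most 2 copies.
RARE_THRESHOLD = 2


def rare_block_counts(peer_blocks, threshold=RARE_THRESHOLD):
    """Map each peer to the number of rare blocks it holds.

    peer_blocks is a hypothetical structure: dict of peer -> set of block ids.
    """
    copies = Counter(b for blocks in peer_blocks.values() for b in blocks)
    rare = {b for b, n in copies.items() if n <= threshold}
    return {peer: len(blocks & rare) for peer, blocks in peer_blocks.items()}


def pick_peer(peer_blocks, rng=random):
    """Pick one peer with probability proportional to its rare-block count."""
    counts = rare_block_counts(peer_blocks)
    peers = list(counts)
    weights = [counts[p] for p in peers]
    if sum(weights) == 0:
        # Assumption: fall back to a uniform pick when nobody holds a rare block.
        return rng.choice(peers)
    return rng.choices(peers, weights=weights, k=1)[0]
```

A peer holding no rare blocks gets weight zero and is never picked while any other peer holds one.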
Mosharaf Chowdhury
cf81da9485
Optimization: Master sends out at least one copy of each block first, regardless of what a client asks for. Once one copy of each block is out, Master responds to requests for specific blocks from individual receivers.
2011-02-14 15:08:33 -08:00
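The two-phase behavior described in this commit message could look roughly like the sketch below. The class name `MasterSender`, the method `block_to_send`, and the choice to hand out first copies in index order are assumptions for illustration, not details taken from the commit.

```python
class MasterSender:
    """Hypothetical master that seeds one copy of every block before honoring requests."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # Blocks with an index below this value have been sent at least once.
        self.next_unsent = 0

    def block_to_send(self, requested):
        """Return the block index to send in response to a request."""
        if self.next_unsent < self.num_blocks:
            # Phase 1: ignore the request and send the next block that has no copy out yet.
            block = self.next_unsent
            self.next_unsent += 1
            return block
        # Phase 2: every block has gone out once, so serve what was asked for.
        return requested
```

With three blocks, the first three requests receive blocks 0, 1, and 2 whatever they asked for; later requests receive the requested block.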
Mosharaf Chowdhury
2b946fb2d1
pickBlockRarestFirst and gossips commented OUT for now.
...
Problem with the rarestFirst implementation is that we are picking peers randomly first and then picking blocks from the random peer using rarestFirst. NOT the right way to do it. It should be the other way around.
Problem with gossip is that peers might end up overwriting newer information with older information. To fix that, we must either add timestamps or compare the bitVectors before overwriting.
2011-02-13 13:53:15 -08:00
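The timestamp option mentioned for the gossip problem can be sketched as follows. This is one possible reading, not the fix that was eventually committed: the function `merge_gossip` and the `(timestamp, blocks)` tuple layout are hypothetical, and the timestamps are assumed to be comparable across peers.

```python
def merge_gossip(known, received):
    """Merge gossiped peer state into local state without losing newer information.

    Both arguments are hypothetical dicts of peer -> (timestamp, frozenset of block ids).
    An entry is overwritten only when the received copy is strictly newer.
    """
    for peer, (timestamp, blocks) in received.items():
        if peer not in known or timestamp > known[peer][0]:
            known[peer] = (timestamp, blocks)
    return known
```

A stale entry for a known peer is dropped, while entries for previously unknown peers are always accepted.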
Mosharaf Chowdhury
ca2895ebb0
Fix in rarestFirst implementation.
...
If there is more than one rarest block, pick randomly among them (was deterministic before)
2011-02-10 20:37:44 -08:00
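A minimal sketch of rarest-first selection with a random tie-break, choosing the block before the peer. The function name, the `needed` set, and the `peer_blocks` mapping are assumptions for illustration and do not come from the repository.

```python
import random
from collections import Counter


def pick_rarest_block(needed, peer_blocks, rng=random):
    """Pick the rarest needed block first, then a peer that holds it.

    needed: set of block ids still missing locally (hypothetical).
    peer_blocks: dict of peer -> set of block ids that peer holds (hypothetical).
    Returns (block, peer), or None if no neighbor holds a needed block.
    """
    copies = Counter(
        b for blocks in peer_blocks.values() for b in blocks if b in needed
    )
    if not copies:
        return None
    fewest = min(copies.values())
    # Several blocks may tie for rarest; choose among them at random.
    rarest = sorted(b for b, n in copies.items() if n == fewest)
    block = rng.choice(rarest)
    holders = sorted(p for p, blocks in peer_blocks.items() if block in blocks)
    return block, rng.choice(holders)
```

Sorting the candidates before the random choice keeps results reproducible when a seeded `random.Random` is passed as `rng`.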
Mosharaf Chowdhury
520bbdc7e3
Peers now gossip about their neighbors when they talk.
2011-02-10 20:15:30 -08:00
Matei Zaharia
dc24aecd8f
Close record readers in HadoopFile after finishing a split
2011-02-10 12:07:48 -08:00
Mosharaf Chowdhury
8034728afc
Merge branch 'master' into mos-bt
2011-02-09 15:30:41 -08:00
Mosharaf Chowdhury
441462bc7f
Fixed some warnings during compilation.
2011-02-09 12:11:43 -08:00
Matei Zaharia
62f1c6f5a8
Remove build.properties from version control
2011-02-09 11:52:56 -08:00
Mosharaf Chowdhury
1a73c0d265
Merged with master. Using sbt.
2011-02-09 10:48:48 -08:00
Mosharaf Chowdhury
495b38658e
Merge branch 'master' into mos-bt
2011-02-09 10:40:23 -08:00
Matei Zaharia
d3df963a13
Brought in some reorganization of build file from Hive branch
2011-02-08 21:27:36 -08:00
Matei Zaharia
e8df4bbd40
Added more SBT stuff to gitignore
2011-02-08 17:06:07 -08:00
Matei Zaharia
26b77aece9
Increased SBT mem to 700 MB so that unit tests run more nicely
2011-02-08 17:03:28 -08:00
Matei Zaharia
99f3f23efa
Changed default shuffle to LocalFileShuffle because it's way faster for small files
2011-02-08 17:03:03 -08:00
Matei Zaharia
f4f7aa2ab2
formatting
2011-02-08 16:39:17 -08:00
Matei Zaharia
ee60aaa0f5
Added a pointer to wiki in readme
2011-02-08 16:38:10 -08:00
Mosharaf Chowdhury
a12c0b6c00
Implemented rarest-first block request policy. Need to test it though.
2011-02-08 14:56:39 -08:00
Mosharaf Chowdhury
dfbc5af6ba
Added the fake shuffle to git.
2011-02-05 12:32:16 -08:00
Mosharaf Chowdhury
9b6d4074f9
Updating hasBlocks variable inside synchronized block
2011-02-05 12:14:41 -08:00
Matei Zaharia
c1c766a93c
Updated readme
2011-02-02 19:21:49 -08:00
Matei Zaharia
50df43bf7b
Added SBT target for building a single JAR with Spark Core and its dependencies
2011-02-02 19:08:14 -08:00
Matei Zaharia
a11fe23017
Moved examples to spark.examples package
2011-02-02 16:30:27 -08:00
Matei Zaharia
82170608b1
Added IntelliJ's build directory to gitignore
2011-02-02 00:30:29 -08:00
Matei Zaharia
ec28b607fd
Merge branch 'master' into sbt
...
Conflicts:
Makefile
core/src/main/java/spark/compress/lzf/LZF.java
core/src/main/java/spark/compress/lzf/LZFInputStream.java
core/src/main/java/spark/compress/lzf/LZFOutputStream.java
core/src/main/native/spark_compress_lzf_LZF.c
run
2011-02-02 00:25:54 -08:00
Matei Zaharia
7f74ee99f6
Added support for IntelliJ IDEA
2011-02-02 00:08:13 -08:00
Matei Zaharia
e5c4cd8a5e
Made examples and core subprojects
2011-02-01 15:11:08 -08:00
Mosharaf Chowdhury
9a6110fd99
Bug fix: a reducer never returns until its shuffleConsumer thread joins.
2011-01-25 20:01:31 -08:00
Mosharaf Chowdhury
b9b461e78f
Updating hasSplits variable inside a synchronized block now. This was causing a concurrency bug. However, this is a hacky solution and should be fixed.
2011-01-22 18:03:38 -08:00
Mosharaf Chowdhury
888a584434
Bug fix in a log statement
2011-01-20 17:01:53 -08:00
Mosharaf Chowdhury
3ab7ddd2e6
Bug fixes + updated limit-connections shuffle strategy to handle endgame situation.
...
After the last commit the bottleneck shifted from "requesting to tracker for mappers" (now done in batches) to "notifying tracker when threads leave" (done individually)
2011-01-14 01:50:21 -08:00
Mosharaf Chowdhury
c71124ed3d
Revamped tracker interface. Now tracker can send back multiple responses at the same time.
...
Turned OFF timers in the reducers due to inconsistent behavior (sometimes they fire, sometimes they don't)
2011-01-14 00:08:58 -08:00
Mosharaf Chowdhury
36b21813fa
Implemented a tracker strategy that allows reducers to create concurrent connections proportional to their time remaining. Which connections to create is random though.
2011-01-13 21:02:53 -08:00
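One way to read "connections proportional to time remaining" is sketched below. The formula, the function name, and every parameter are assumptions; the commit message does not say how the proportion is computed or bounded.

```python
import math


def connections_for(time_remaining, max_time_remaining, max_connections, min_connections=1):
    """Give a reducer a connection count proportional to its estimated time remaining.

    Assumption: the reducer with the longest estimate gets max_connections and the
    others are scaled down linearly, never below min_connections.
    """
    if max_time_remaining <= 0:
        return min_connections
    share = time_remaining / max_time_remaining
    wanted = math.ceil(share * max_connections)
    return max(min_connections, min(max_connections, wanted))
```

Under these assumptions the slowest reducer opens the most connections, and a nearly finished one keeps at least one open.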
Mosharaf Chowdhury
025b5485b7
Changed speed estimation and time remaining variables to Double instead of Int.
2011-01-13 19:41:04 -08:00
Mosharaf Chowdhury
5bf6369220
Added a tracker strategy that selects random mappers for reducers. This can be used to measure tracker overhead.
2011-01-13 19:22:07 -08:00
Mosharaf Chowdhury
fceae31877
Bug fix in selectSuitableSources: was changing the index value before updating a bitset using that same index variable.
2011-01-10 20:58:59 -08:00
Mosharaf Chowdhury
fd3fd37383
In-memory version of tracker+blocked shuffle checked in.
2011-01-10 17:13:52 -08:00
Mosharaf Chowdhury
d7081a927f
Updated conf/java-opts
2011-01-09 17:19:26 -08:00
Mosharaf Chowdhury
de676428ac
Merge branch 'mos-shuffle-tracked' of git@github.com:mesos/spark into mos-shuffle-tracked
2011-01-07 01:44:32 -08:00
Mosharaf Chowdhury
ba74169162
Dummy log line added in SparkContext for #45
2011-01-06 21:21:35 -08:00
Mosharaf Chowdhury
e76b4f7155
Turned OFF some log messages due to #45 .
2011-01-06 17:07:14 -08:00
Mosharaf Chowdhury
f5ba0b49c6
Bug fix + default config param values update.
...
Updated some default config parameters for broadcast parameters.
Fixed a concurrency bug in the variable that keeps track of how many receivers have finished.
2011-01-06 15:44:24 -08:00
Matei Zaharia
15829df799
Updated simple skewed group-by to generate multiple keys with each hashcode so that Mosharaf's current block implementation works
2011-01-05 22:31:50 -05:00
Mosharaf Chowdhury
6908fa6c31
Updated BalanceRemaining strategy with a better mechanism for identifying faster reducers.
2011-01-05 19:19:52 -08:00
Mosharaf Chowdhury
69845d5648
trackerStrategy can now only be accessed by one thread at a time.
...
Previously, synchronized was applied around individual methods that worked on shared variables, so two such methods could be accessed by different threads simultaneously. This was a possible source of concurrency issues.
2011-01-05 12:18:02 -08:00
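The change described in this commit message, letting only one thread at a time touch the strategy object, can be sketched with a single coarse lock around the whole compound operation. The sketch uses Python rather than the project's own language, and the classes `TrackerStrategy` and `Tracker` along with their methods are hypothetical stand-ins, not the project's API.

```python
import threading


class TrackerStrategy:
    """Hypothetical strategy with shared state and no locking of its own."""

    def __init__(self, num_mappers):
        self.load = [0] * num_mappers

    def select_mapper(self):
        # Pick the least-loaded mapper (lowest index wins ties).
        return min(range(len(self.load)), key=self.load.__getitem__)

    def add_reducer(self, mapper):
        self.load[mapper] += 1


class Tracker:
    """Serializes all access to the strategy behind one lock."""

    def __init__(self, strategy):
        self._strategy = strategy
        self._lock = threading.Lock()

    def assign(self):
        # Select and update happen as one step, so no other thread can
        # observe or modify the load between the two calls.
        with self._lock:
            mapper = self._strategy.select_mapper()
            self._strategy.add_reducer(mapper)
            return mapper
```

Holding one lock across both calls trades some parallelism for a simple guarantee that the read-then-update sequence is never interleaved.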