Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
...
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
...
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
Stephen Haberman
13368818af
Merge branch 'master' into driver
...
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/ThreadingTest.scala
core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Imran Rashid
efff7bfb33
add long and float accumulatorparams
2013-01-28 20:23:11 -08:00
Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia
49f6472c0f
Merge pull request #418 from woggling/reregister-deadlock
...
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss
ad4232b4da
Fix deadlock in BlockManager reregistration triggered by failed updates.
2013-01-26 18:30:38 -08:00
Josh Rosen
d49cf0e587
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
...
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Stephen Haberman
ec43a51b38
Merge branch 'master' into localsparkcontext
...
Conflicts:
core/src/test/scala/spark/FileServerSuite.scala
core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Stephen Haberman
230bda2047
Add LocalSparkContext to manage common sc variable.
2013-01-24 11:01:01 -06:00
Matei Zaharia
fe5e4812fc
Merge pull request #409 from rxin/splitpruningrdd
...
Added pruneSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin
eedc542a02
Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin
45cd50d5fe
Updated assert == to ===.
2013-01-23 16:06:58 -08:00
Matei Zaharia
548856a224
Merge remote-tracking branch 'woggling/remove-machines'
...
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin
c24b3819dd
Added an extra assert for split size check.
2013-01-23 15:34:59 -08:00
Reynold Xin
eb222b7206
Added pruneSplits method to RDD.
2013-01-23 15:29:02 -08:00
Charles Reiss
5c7422292e
Remove more dead code from test.
2013-01-23 12:59:51 -08:00
Charles Reiss
88b9d240fd
Remove dead code in test.
2013-01-23 12:40:38 -08:00
Matei Zaharia
1a3aeeca23
Merge pull request #407 from woggling/no-cache-tracker
...
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Matei Zaharia
4147e1d47b
Merge pull request #406 from tdas/master
...
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia
4d77d554e1
Merge pull request #394 from JoshRosen/add_file_fix
...
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Charles Reiss
0b506dd2ec
Add tests of various node failure scenarios.
2013-01-23 01:38:15 -08:00
Tathagata Das
5e11f1e51f
Modified StorageLevel API to ensure zero duplicate objects.
2013-01-22 23:42:53 -08:00
Tathagata Das
bacade6caf
Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.
2013-01-22 22:55:26 -08:00
Josh Rosen
43e9ff9596
Add test for driver hanging on exit (SPARK-530).
2013-01-22 22:47:26 -08:00
Charles Reiss
2849931000
Eliminate CacheTracker.
...
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.
Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Josh Rosen
ef711902c1
Don't download files to master's working directory.
...
This should avoid exceptions caused by existing
files with different contents.
I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman
ffd1623595
Minor cleanup.
2013-01-21 15:55:46 -06:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
...
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00
Patrick Wendell
d5570c7968
Adding checkpointing to Java API
2013-01-17 18:41:58 -08:00
Tathagata Das
f466ee44bc
Merge branch 'master' into streaming
...
Conflicts:
core/src/main/scala/spark/MapOutputTracker.scala
2013-01-16 12:57:11 -08:00
Matei Zaharia
4beb084f64
Merge pull request #374 from woggling/null-mapout
...
Generate FetchFailedException even for cached missing map outputs
2013-01-15 14:22:29 -08:00
Tathagata Das
cd1521cfdb
Merge branch 'master' into streaming
...
Conflicts:
core/src/main/scala/spark/rdd/CoGroupedRDD.scala
core/src/main/scala/spark/rdd/FilteredRDD.scala
docs/_layouts/global.html
docs/index.md
run
2013-01-15 12:08:51 -08:00
Charles Reiss
4078623b9f
Remove broken attempt to test fetching case.
2013-01-15 12:05:54 -08:00
Stephen Haberman
d228bff440
Add a test.
2013-01-15 11:48:50 -06:00
Charles Reiss
b038999797
Fix accidental spark.master.host reuse
2013-01-14 17:04:44 -08:00
Charles Reiss
7ba34bc007
Additional tests for MapOutputTracker.
2013-01-14 15:27:02 -08:00
Matei Zaharia
72408e8dfa
Make filter preserve partitioner info, since it can
2013-01-13 19:34:07 -08:00
Ryan LeCompte
ea20ae6618
add one extra test
2013-01-12 09:18:00 -08:00
Ryan LeCompte
2c77eeebb6
correct test params
2013-01-12 00:13:45 -08:00
Ryan LeCompte
0cfea7a2ec
add unit test
2013-01-11 23:48:07 -08:00
Stephen Haberman
8ac0f35be4
Add JavaRDDLike.keyBy.
2013-01-08 09:57:45 -06:00
Stephen Haberman
4ee6b22775
Merge branch 'master' into tupleBy
...
Conflicts:
core/src/test/scala/spark/RDDSuite.scala
2013-01-08 09:10:10 -06:00
Matei Zaharia
f7cf035b9b
Merge pull request #350 from tdas/streaming
...
Spark Streaming
2013-01-07 17:40:11 -08:00
Shivaram Venkataraman
b1336e2fe4
Update expected size of strings to match our dummy string class
2013-01-07 17:00:32 -08:00
Tathagata Das
4719e6d8fe
Changed locations for unit test logs.
2013-01-07 16:06:07 -08:00
Shivaram Venkataraman
55c66d365f
Use a dummy string class in Size Estimator tests to make it resistant to jdk
versions
2013-01-07 15:58:00 -08:00
Shivaram Venkataraman
77d751731c
Remove unused BoundedMemoryCache file and associated test case.
2013-01-07 15:57:46 -08:00
Matei Zaharia
1941d9602d
Merge branch 'master' of github.com:mesos/spark
2013-01-07 16:50:39 -05:00