Matei Zaharia
340cc54e47
Merge pull request #471 from stephenh/parallelrdd
Move ParallelCollection into spark.rdd package.
2013-02-16 16:39:15 -08:00
Matei Zaharia
3260b6120e
Merge pull request #470 from stephenh/morek
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 16:38:38 -08:00
Stephen Haberman
e7713adb99
Move ParallelCollection into spark.rdd package.
2013-02-16 13:20:48 -06:00
Stephen Haberman
ae2234687d
Make CoGroupedRDDs explicitly have the same key type.
2013-02-16 13:10:31 -06:00
Stephen Haberman
4328873294
Add assertion about dependencies.
2013-02-16 01:16:40 -06:00
Stephen Haberman
c34b8ad2c5
Avoid a shuffle if combineByKey is passed the same partitioner.
2013-02-16 00:54:03 -06:00
Patrick Wendell
f0b68c623c
Initial cut at replacing K, V in Java files
2013-02-11 10:03:37 -08:00
Stephen Haberman
f2bc748013
Add RDD.coalesce.
2013-02-05 21:23:36 -06:00
Matei Zaharia
a4611d66f0
Merge pull request #449 from stephenh/longerdriversuite
Increase DriverSuite timeout.
2013-02-05 17:58:22 -08:00
Stephen Haberman
1ba3393ceb
Increase DriverSuite timeout.
2013-02-05 17:56:50 -06:00
Matei Zaharia
f6ec547ea7
Small fix to test for distinct
2013-02-04 13:14:54 -08:00
Matei Zaharia
aa4ee1e9e5
Fix failing test
2013-02-04 11:06:31 -08:00
Charles Reiss
6107957962
Merge remote-tracking branch 'base/master' into dag-sched-tests
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-02-02 00:33:30 -08:00
Charles Reiss
1fd5ee323d
Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.
2013-02-01 22:33:38 -08:00
Matei Zaharia
ae26911ec0
Add back test for distinct without parens
2013-02-01 21:07:24 -08:00
Matei Zaharia
8b3041c723
Reduced the memory usage of reduce and similar operations
These operations used to wait for all the results to be available in an
array on the driver program before merging them. They now merge values
incrementally as they arrive.
2013-02-01 15:38:42 -08:00
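The change described in this commit can be illustrated with a minimal sketch. This is not Spark's actual code: the function names below are hypothetical, and the sketch assumes the merge operator is associative and commutative, since partial results may arrive in any order.

```python
from functools import reduce


def merge_all_at_once(partials, op):
    """Earlier approach: buffer every partial result, then merge."""
    buffered = list(partials)  # all partial results are held in memory together
    if not buffered:
        raise ValueError("empty collection")
    return reduce(op, buffered)


def merge_incrementally(partials, op):
    """Later approach: fold each partial result into a running value on arrival."""
    have_value = False
    acc = None
    for partial in partials:
        if have_value:
            acc = op(acc, partial)  # only the running value is retained
        else:
            acc, have_value = partial, True
    if not have_value:
        raise ValueError("empty collection")
    return acc
```

Both functions return the same value for the same inputs; the second keeps one running result instead of an array of all of them.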
Charles Reiss
7f51458774
Comment at top of DAGSchedulerSuite
2013-01-30 09:34:53 -08:00
Charles Reiss
9c0bae75ad
Change DAGSchedulerSuite to run DAGScheduler in the same Thread.
2013-01-30 09:22:07 -08:00
Charles Reiss
4bf3d7ea12
Clear spark.master.port to cleanup for other tests
2013-01-29 19:05:58 -08:00
Charles Reiss
9eac7d01f0
Add DAGScheduler tests.
2013-01-29 18:55:43 -08:00
Matei Zaharia
9ae11603b4
Merge pull request #415 from stephenh/driver
Replace old 'master' term with 'driver'.
2013-01-29 10:41:42 -08:00
Matei Zaharia
64ba6a8c2c
Simplify checkpointing code and RDD class a little:
- RDD's getDependencies and getSplits methods are now guaranteed to be
  called only once, so subclasses can safely do computation in there
  without worrying about caching the results.
- The management of a "splits_" variable that is cleared out when we
  checkpoint an RDD is now done in the RDD class.
- A few of the RDD subclasses are simpler.
- CheckpointRDD's compute() method no longer assumes that it is given a
  CheckpointRDDSplit -- it can work just as well on a split from the
  original RDD, because it only looks at its index. This is important
  because things like UnionRDD and ZippedRDD remember the parent's
  splits as part of their own and wouldn't work on checkpointed parents.
- RDD.iterator can now reuse cached data if an RDD is computed before it
  is checkpointed. It seems like it wouldn't do this before (it always
  called iterator() on the CheckpointRDD, which read from HDFS).
2013-01-28 22:30:12 -08:00
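The "called only once" guarantee in the first bullet of this commit can be sketched as follows. This is an illustration only, not Spark's implementation: the class and attribute names are hypothetical, and the sketch covers just the caching in the base class, not checkpointing.

```python
class RDDSketch:
    """Hypothetical base class that calls the subclass hook at most once."""

    def __init__(self):
        self._splits = None
        self._splits_computed = False

    def get_splits(self):
        """Subclass hook; may do real work because the base class caches it."""
        raise NotImplementedError

    @property
    def splits(self):
        if not self._splits_computed:
            self._splits = self.get_splits()
            self._splits_computed = True
        return self._splits


class CountingRDD(RDDSketch):
    """Hypothetical subclass that records how often its hook runs."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def get_splits(self):
        self.calls += 1
        return [0, 1, 2]
```

With this shape a subclass does not need its own cache: reading `splits` repeatedly on a `CountingRDD` leaves `calls` at 1.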
Stephen Haberman
13368818af
Merge branch 'master' into driver
Conflicts:
core/src/main/scala/spark/SparkContext.scala
core/src/main/scala/spark/SparkEnv.scala
core/src/main/scala/spark/deploy/LocalSparkCluster.scala
core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala
core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala
core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala
core/src/main/scala/spark/storage/BlockManagerMaster.scala
core/src/main/scala/spark/storage/ThreadingTest.scala
core/src/test/scala/spark/MapOutputTrackerSuite.scala
2013-01-28 23:30:24 -06:00
Imran Rashid
efff7bfb33
add long and float accumulatorparams
2013-01-28 20:23:11 -08:00
Matei Zaharia
44b4a0f88f
Track workers by executor ID instead of hostname to allow multiple
executors per machine and remove the need for multiple IP addresses in
unit tests.
2013-01-27 19:23:49 -08:00
Matei Zaharia
49f6472c0f
Merge pull request #418 from woggling/reregister-deadlock
Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously
2013-01-26 18:59:02 -08:00
Charles Reiss
ad4232b4da
Fix deadlock in BlockManager reregistration triggered by failed updates.
2013-01-26 18:30:38 -08:00
Josh Rosen
d49cf0e587
Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668).
This workaround is easier than rewriting JavaRDDLike in Java.
2013-01-26 16:13:18 -08:00
Stephen Haberman
7dfb82a992
Replace old 'master' term with 'driver'.
2013-01-25 11:03:00 -06:00
Stephen Haberman
ec43a51b38
Merge branch 'master' into localsparkcontext
Conflicts:
core/src/test/scala/spark/FileServerSuite.scala
core/src/test/scala/spark/RDDSuite.scala
2013-01-24 21:17:30 -06:00
Stephen Haberman
230bda2047
Add LocalSparkContext to manage common sc variable.
2013-01-24 11:01:01 -06:00
Matei Zaharia
fe5e4812fc
Merge pull request #409 from rxin/splitpruningrdd
Added pruneSplits method to RDD.
2013-01-23 22:23:22 -08:00
Reynold Xin
eedc542a02
Removed pruneSplits method in RDD and renamed SplitsPruningRDD to
PartitionPruningRDD.
2013-01-23 22:14:23 -08:00
Reynold Xin
45cd50d5fe
Updated assert == to ===.
2013-01-23 16:06:58 -08:00
Matei Zaharia
548856a224
Merge remote-tracking branch 'woggling/remove-machines'
Conflicts:
core/src/main/scala/spark/scheduler/DAGScheduler.scala
2013-01-23 15:44:17 -08:00
Reynold Xin
c24b3819dd
Added an extra assert for split size check.
2013-01-23 15:34:59 -08:00
Reynold Xin
eb222b7206
Added pruneSplits method to RDD.
2013-01-23 15:29:02 -08:00
Charles Reiss
5c7422292e
Remove more dead code from test.
2013-01-23 12:59:51 -08:00
Charles Reiss
88b9d240fd
Remove dead code in test.
2013-01-23 12:40:38 -08:00
Matei Zaharia
1a3aeeca23
Merge pull request #407 from woggling/no-cache-tracker
Eliminate CacheTracker
2013-01-23 12:28:48 -08:00
Matei Zaharia
4147e1d47b
Merge pull request #406 from tdas/master
Changed StorageLevel and BlockManagerId API to prevent duplication in memory
2013-01-23 12:18:31 -08:00
Matei Zaharia
4d77d554e1
Merge pull request #394 from JoshRosen/add_file_fix
Add SparkFiles.get() API to access files added through addFile().
2013-01-23 12:16:30 -08:00
Charles Reiss
0b506dd2ec
Add tests of various node failure scenarios.
2013-01-23 01:38:15 -08:00
Tathagata Das
5e11f1e51f
Modified StorageLevel API to ensure zero duplicate objects.
2013-01-22 23:42:53 -08:00
Tathagata Das
bacade6caf
Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.
2013-01-22 22:55:26 -08:00
Josh Rosen
43e9ff9596
Add test for driver hanging on exit (SPARK-530).
2013-01-22 22:47:26 -08:00
Charles Reiss
2849931000
Eliminate CacheTracker.
Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster
queries.
Adds CacheManager to locally coordinate computation of cached RDDs.
2013-01-22 22:19:30 -08:00
Josh Rosen
ef711902c1
Don't download files to master's working directory.
This should avoid exceptions caused by existing
files with different contents.
I also removed some unused code.
2013-01-21 17:34:17 -08:00
Stephen Haberman
ffd1623595
Minor cleanup.
2013-01-21 15:55:46 -06:00
Tathagata Das
4f8fe58b25
Merge branch 'mesos-streaming' into streaming
Conflicts:
core/src/main/scala/spark/api/java/JavaRDDLike.scala
core/src/main/scala/spark/api/java/JavaSparkContext.scala
core/src/test/scala/spark/JavaAPISuite.java
2013-01-20 01:13:56 -08:00