ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	98df9d2853	Added removeRdd function in BlockManager.	2013-05-01 20:17:09 -07:00
Reynold Xin	3227ec8edd	Cleaned up Ram's code. Moved SparkContext.remove to RDD.unpersist. Also updated unit tests to make sure they are properly testing for concurrency.	2013-05-01 16:07:44 -07:00
harshars	8481562731	Merged Ram's commit on removing RDDs. Conflicts: core/src/main/scala/spark/SparkContext.scala	2013-05-01 14:42:17 -07:00
Mridul Muralidharan	d960e7e0f8	a) Add support for hyper local scheduling - specific to a host + port - before trying host local scheduling. b) Add some fixes to test code to ensure it passes (and fixes some other issues). c) Fix bug in task scheduling which incorrectly used availableCores instead of all cores on the node.	2013-05-01 20:24:00 +05:30
Prashant Sharma	e8a9d1cdf9	Fixed Warning: expect -> expectResult	2013-05-01 11:35:02 +05:30
Matei Zaharia	f708dda81e	Merge pull request #585 from pwendell/listener-perf [Fix SPARK-742] Task Metrics should not employ per-record timing by default	2013-04-30 07:51:40 -07:00
Patrick Wendell	540be6b154	Modified version of the fix which just removes all per-record tracking.	2013-04-29 11:32:07 -07:00
Patrick Wendell	224fbac061	Spark-742: TaskMetrics should not employ per-record timing. This patch does three things: 1. Makes TimedIterator a trait with two implementations (one a no-op) 2. Makes the default behavior to use the no-op implementation 3. Removes DelegateBlockFetchTracker. This is just cleanup, but it seems like the trait doesn't really reduce complexity in any way. In the future we can add other implementations, e.g. ones which perform sampling.	2013-04-29 11:13:43 -07:00
Prashant Sharma	8f3ac240cb	Fixed Warning: ClassManifest -> ClassTag	2013-04-29 16:39:13 +05:30
Matei Zaharia	0f45347c7b	More unit test fixes	2013-04-28 22:29:27 -07:00
Matei Zaharia	bce4089f22	Fix BlockManagerSuite to deal with clearing spark.hostPort	2013-04-28 22:23:48 -07:00
Matei Zaharia	68c07ea198	Merge pull request #582 from shivaram/master Add zip partitions interface	2013-04-28 20:19:33 -07:00
Shivaram Venkataraman	15acd49f07	Actually rename classes to ZippedPartitions* (the previous commit only renamed the file)	2013-04-28 16:03:22 -07:00
Shivaram Venkataraman	6e84635ab9	Rename classes from MapZipped* to Zipped*	2013-04-28 15:58:40 -07:00
Mridul Muralidharan	afee902443	Attempt to fix streaming test failures after yarn branch merge	2013-04-28 22:26:45 +05:30
Shivaram Venkataraman	0cc6642b7c	Rename to zipPartitions and style changes	2013-04-28 05:11:03 -07:00
Shivaram Venkataraman	c9c4954d99	Add an interface to zip iterators of multiple RDDs The current code supports 2, 3 or 4 arguments but can be extended to more arguments if required.	2013-04-26 16:57:46 -07:00
Prashant Sharma	ad88f083a6	scala 2.10 and master merge	2013-04-24 18:08:26 +05:30
Prashant Sharma	185bb9525a	Manually merged scala-2.10 and master	2013-04-22 14:14:03 +05:30
Andrew xia	e0603d7e8b	refactor the Schedulable interface and add unit test for SchedulingAlgorithm	2013-04-18 13:13:54 +08:00
Mridul Muralidharan	19652a44be	Fix issue with FileSuite failing	2013-04-15 19:16:36 +05:30
Mridul Muralidharan	d90d2af103	Checkpoint commit - compiles and passes a lot of tests - not all though, looking into FileSuite issues	2013-04-15 18:12:11 +05:30
Stephen Haberman	dd854d5b9f	Use Boolean in the Java API, and != for assert.	2013-03-23 11:49:45 -05:00
Stephen Haberman	4ca273edc4	Merge branch 'master' into shufflecoalesce Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-03-23 11:45:45 -05:00
Matei Zaharia	fd53f2fc7b	Merge pull request #510 from markhamstra/WithThing mapWith, flatMapWith and filterWith	2013-03-23 07:13:21 -07:00
Stephen Haberman	1c67c7dfd1	Add a shuffle parameter to coalesce. This is useful for when you want just 1 output file (part-00000) but still want the upstream RDD to be computed in parallel.	2013-03-22 08:54:44 -05:00
Matei Zaharia	35588490cb	Merge pull request #538 from rxin/cogroup Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 19:27:47 -07:00
Reynold Xin	00a11304fd	Added mapSideCombine flag to CoGroupedRDD. Added unit test for CoGroupedRDD.	2013-03-20 13:49:51 +08:00
Mark Hamstra	1fb192ef40	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-16 12:17:13 -07:00
Mark Hamstra	80fc8c82ed	_With[Matei]	2013-03-16 12:16:29 -07:00
Mark Hamstra	38454c4aed	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-16 11:54:44 -07:00
Matei Zaharia	c1e9cdc49f	Merge pull request #525 from stephenh/subtractByKey Add PairRDDFunctions.subtractByKey.	2013-03-16 11:47:45 -07:00
Mark Hamstra	ef75be3bf7	Merge branch 'master' of https://github.com/mesos/spark into foldByKey	2013-03-15 21:41:24 -07:00
Matei Zaharia	cdbfd1e196	Merge pull request #516 from squito/fix_local_metrics Fix local metrics	2013-03-15 15:13:28 -07:00
Mark Hamstra	1a4070477d	whitespace cleanup	2013-03-15 11:28:28 -07:00
Mark Hamstra	16a4ca4537	restrict V type of foldByKey in order to retain ClassManifest; added foldByKey to Java API and test	2013-03-14 13:58:37 -07:00
Stephen Haberman	7d8bb4df3a	Allow subtractByKey's other argument to have a different value type.	2013-03-14 14:44:15 -05:00
Stephen Haberman	4632c45af1	Finished subtractByKeys.	2013-03-14 10:35:34 -05:00
Stephen Haberman	e7f1a69c6b	Add a test for NextIterator.	2013-03-13 10:46:33 -05:00
Mark Hamstra	562893bea3	deleted excess curly braces	2013-03-10 22:43:08 -07:00
Imran Rashid	8a11ac3dc7	increase sleep time	2013-03-10 22:31:44 -07:00
Imran Rashid	9f97f2f9d8	add a small wait to one task to make sure some task runtime really is non-zero	2013-03-10 22:30:18 -07:00
Mark Hamstra	1289e7176b	refactored _With API and added foreachPartition	2013-03-10 22:27:13 -07:00
Mark Hamstra	b57df1f5e3	Merge branch 'master' of https://github.com/mesos/spark into WithThing	2013-03-10 16:56:31 -07:00
Matei Zaharia	2e1bbc4e7e	Merge remote-tracking branch 'woggling/dag-sched-driver-port' Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 16:52:54 -07:00
Matei Zaharia	91a9d093bd	Merge pull request #512 from patelh/fix-kryo-serializer Fix reference bug in Kryo serializer, add test, update version	2013-03-10 15:48:23 -07:00
Matei Zaharia	a59cc6060f	Merge remote-tracking branch 'stephenh/nomocks' Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-03-10 13:39:10 -07:00
Imran Rashid	20f01a0a1b	enable task metrics in local mode, add tests	2013-03-09 21:17:31 -08:00
Charles Reiss	d0216cb38b	Prevent DAGSchedulerSuite from corrupting driver.port. Use the LocalSparkContext abstraction to properly manage clearing spark.driver.port.	2013-03-09 10:49:02 -08:00
Hiral Patel	664e5fd24b	Fix reference bug in Kryo serializer, add test, update version	2013-03-07 22:16:11 -08:00
Mark Hamstra	5ff0810b11	refactor mapWith, flatMapWith and filterWith to each use two parameter lists	2013-03-05 12:25:44 -08:00
Mark Hamstra	d046d8ad32	whitespace formatting	2013-03-05 00:48:13 -08:00
Mark Hamstra	9148b968cf	mapWith, flatMapWith and filterWith	2013-03-04 15:48:47 -08:00
Matei Zaharia	04fb81ffe5	Merge pull request #506 from rxin/spark-706 Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-03-03 17:20:07 -08:00
Imran Rashid	d36abdb053	Merge branch 'master' into stageInfo	2013-03-03 15:20:46 -08:00
Reynold Xin	44134e12bb	Fixed SPARK-706: Failures in block manager put leads to read task hanging.	2013-02-28 15:14:59 -08:00
Stephen Haberman	db957e5bd7	Fix MapOutputTrackerSuite.	2013-02-26 01:38:50 -06:00
Stephen Haberman	a65aa549ff	Override DAGScheduler.runLocally so we can remove the Thread.sleep.	2013-02-25 23:49:32 -06:00
Stephen Haberman	a4adeb255c	Merge branch 'master' into nomocks Conflicts: core/src/test/scala/spark/scheduler/DAGSchedulerSuite.scala	2013-02-25 23:48:52 -06:00
Tathagata Das	c02e064938	Fixed replication bug in BlockManager	2013-02-25 17:27:46 -08:00
Matei Zaharia	d6e6abece3	Merge pull request #459 from stephenh/bettersplits Change defaultPartitioner to use upstream split size.	2013-02-25 09:22:04 -08:00
Stephen Haberman	c44ccf2862	Use default parallelism if it's set.	2013-02-24 23:54:03 -06:00
Stephen Haberman	44032bc476	Merge branch 'master' into bettersplits Conflicts: core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/test/scala/spark/ShuffleSuite.scala	2013-02-24 22:08:14 -06:00
Tathagata Das	dff53d1b94	Merge branch 'mesos-master' into streaming	2013-02-24 12:17:22 -08:00
Stephen Haberman	f442e7d83c	Update for split->partition rename.	2013-02-24 00:27:14 -06:00
Stephen Haberman	cec87a0653	Merge branch 'master' into subtract	2013-02-23 23:27:55 -06:00
Charles Reiss	50cf8c8b79	Add fault tolerance test that uses replicated RDDs.	2013-02-22 16:11:53 -08:00
Imran Rashid	ff127cfcd3	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-21 15:16:21 -08:00
Imran Rashid	69f9a7035f	fully revert change to addOnCompleteCallback -- missed this in `e9f53ec`	2013-02-21 15:07:46 -08:00
Tathagata Das	334ab92441	Fixed bug in CheckpointSuite	2013-02-20 10:26:36 -08:00
Tathagata Das	fb9956256d	Merge branch 'mesos-master' into streaming Conflicts: core/src/main/scala/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/spark/streaming/dstream/ReducedWindowedDStream.scala	2013-02-20 09:01:29 -08:00
Matei Zaharia	06e5e6627f	Renamed "splits" to "partitions"	2013-02-17 22:13:26 -08:00
Matei Zaharia	340cc54e47	Merge pull request #471 from stephenh/parallelrdd Move ParallelCollection into spark.rdd package.	2013-02-16 16:39:15 -08:00
Matei Zaharia	3260b6120e	Merge pull request #470 from stephenh/morek Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 16:38:38 -08:00
Stephen Haberman	924f47dd11	Add RDD.subtract. Instead of reusing the cogroup primitive, this adds a SubtractedRDD that knows it only needs to keep rdd1's values (per split) in memory.	2013-02-16 13:38:42 -06:00
Stephen Haberman	e7713adb99	Move ParallelCollection into spark.rdd package.	2013-02-16 13:20:48 -06:00
Stephen Haberman	ae2234687d	Make CoGroupedRDDs explicitly have the same key type.	2013-02-16 13:10:31 -06:00
Stephen Haberman	4328873294	Add assertion about dependencies.	2013-02-16 01:16:40 -06:00
Stephen Haberman	c34b8ad2c5	Avoid a shuffle if combineByKey is passed the same partitioner.	2013-02-16 00:54:03 -06:00
Stephen Haberman	6a2d957843	Tweak test names.	2013-02-16 00:33:49 -06:00
Imran Rashid	bffee929ab	Merge branch 'master' into stageInfo Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/storage/BlockManager.scala	2013-02-15 10:35:04 -08:00
Patrick Wendell	f0b68c623c	Initial cut at replacing K, V in Java files	2013-02-11 10:03:37 -08:00
Tathagata Das	16baea62bc	Fixed bug in CheckpointRDD to prevent exception when the original RDD had zero splits.	2013-02-10 19:14:49 -08:00
Imran Rashid	b7d9e24394	use TaskMetrics to gather all stats; lots of plumbing to get it all the way back to driver	2013-02-10 14:18:52 -08:00
Stephen Haberman	680f42e6cd	Change defaultPartitioner to use upstream split size. Previously it used the SparkContext.defaultParallelism, which occasionally ended up being a very bad guess. Looking at upstream RDDs seems to make better use of the context. Also sorted the upstream RDDs by partition size first, as if we have a hugely-partitioned RDD and tiny-partitioned RDD, it is unlikely we want the resulting RDD to be tiny-partitioned.	2013-02-10 02:27:03 -06:00
Stephen Haberman	921be76533	Use stubs instead of mocks for DAGSchedulerSuite.	2013-02-09 16:42:18 -06:00
Imran Rashid	04e828f7c1	general fixes to Distribution, plus some tests	2013-02-08 19:07:36 -08:00
Stephen Haberman	f2bc748013	Add RDD.coalesce.	2013-02-05 21:23:36 -06:00
Imran Rashid	379564c7e0	setup plumbing to get task metrics; lots of unfinished parts, but basic flow in place	2013-02-05 18:30:21 -08:00
Matei Zaharia	a4611d66f0	Merge pull request #449 from stephenh/longerdriversuite Increase DriverSuite timeout.	2013-02-05 17:58:22 -08:00
Stephen Haberman	1ba3393ceb	Increase DriverSuite timeout.	2013-02-05 17:56:50 -06:00
Imran Rashid	295b534398	task context keeps a handle on Task -- giant hack, temporary for tracking shuffle times & amount	2013-02-05 10:18:16 -08:00
Matei Zaharia	f6ec547ea7	Small fix to test for distinct	2013-02-04 13:14:54 -08:00
Matei Zaharia	aa4ee1e9e5	Fix failing test	2013-02-04 11:06:31 -08:00
Charles Reiss	6107957962	Merge remote-tracking branch 'base/master' into dag-sched-tests Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-02-02 00:33:30 -08:00
Charles Reiss	1fd5ee323d	Code review changes: add sc.stop; style of multiline comments; parens on procedure calls.	2013-02-01 22:33:38 -08:00
Matei Zaharia	ae26911ec0	Add back test for distinct without parens	2013-02-01 21:07:24 -08:00
Matei Zaharia	8b3041c723	Reduced the memory usage of reduce and similar operations These operations used to wait for all the results to be available in an array on the driver program before merging them. They now merge values incrementally as they arrive.	2013-02-01 15:38:42 -08:00
Charles Reiss	7f51458774	Comment at top of DAGSchedulerSuite	2013-01-30 09:34:53 -08:00
Charles Reiss	9c0bae75ad	Change DAGSchedulerSuite to run DAGScheduler in the same Thread.	2013-01-30 09:22:07 -08:00
Charles Reiss	4bf3d7ea12	Clear spark.master.port to cleanup for other tests	2013-01-29 19:05:58 -08:00
Charles Reiss	9eac7d01f0	Add DAGScheduler tests.	2013-01-29 18:55:43 -08:00
Matei Zaharia	9ae11603b4	Merge pull request #415 from stephenh/driver Replace old 'master' term with 'driver'.	2013-01-29 10:41:42 -08:00
Matei Zaharia	64ba6a8c2c	Simplify checkpointing code and RDD class a little: - RDD's getDependencies and getSplits methods are now guaranteed to be called only once, so subclasses can safely do computation in there without worrying about caching the results. - The management of a "splits_" variable that is cleared out when we checkpoint an RDD is now done in the RDD class. - A few of the RDD subclasses are simpler. - CheckpointRDD's compute() method no longer assumes that it is given a CheckpointRDDSplit -- it can work just as well on a split from the original RDD, because it only looks at its index. This is important because things like UnionRDD and ZippedRDD remember the parent's splits as part of their own and wouldn't work on checkpointed parents. - RDD.iterator can now reuse cached data if an RDD is computed before it is checkpointed. It seems like it wouldn't do this before (it always called iterator() on the CheckpointRDD, which read from HDFS).	2013-01-28 22:30:12 -08:00
Stephen Haberman	13368818af	Merge branch 'master' into driver Conflicts: core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/SparkEnv.scala core/src/main/scala/spark/deploy/LocalSparkCluster.scala core/src/main/scala/spark/executor/StandaloneExecutorBackend.scala core/src/main/scala/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/spark/scheduler/cluster/StandaloneClusterMessage.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/ThreadingTest.scala core/src/test/scala/spark/MapOutputTrackerSuite.scala	2013-01-28 23:30:24 -06:00
Imran Rashid	efff7bfb33	add long and float accumulatorparams	2013-01-28 20:23:11 -08:00
Matei Zaharia	44b4a0f88f	Track workers by executor ID instead of hostname to allow multiple executors per machine and remove the need for multiple IP addresses in unit tests.	2013-01-27 19:23:49 -08:00
Matei Zaharia	49f6472c0f	Merge pull request #418 from woggling/reregister-deadlock Fix BlockManager reregistration deadlock; do BlockManager reregistration more asynchronously	2013-01-26 18:59:02 -08:00
Charles Reiss	ad4232b4da	Fix deadlock in BlockManager reregistration triggered by failed updates.	2013-01-26 18:30:38 -08:00
Josh Rosen	d49cf0e587	Fix JavaRDDLike.flatMap(PairFlatMapFunction) (SPARK-668). This workaround is easier than rewriting JavaRDDLike in Java.	2013-01-26 16:13:18 -08:00
Stephen Haberman	7dfb82a992	Replace old 'master' term with 'driver'.	2013-01-25 11:03:00 -06:00
Stephen Haberman	ec43a51b38	Merge branch 'master' into localsparkcontext Conflicts: core/src/test/scala/spark/FileServerSuite.scala core/src/test/scala/spark/RDDSuite.scala	2013-01-24 21:17:30 -06:00
Stephen Haberman	230bda2047	Add LocalSparkContext to manage common sc variable.	2013-01-24 11:01:01 -06:00
Matei Zaharia	fe5e4812fc	Merge pull request #409 from rxin/splitpruningrdd Added pruneSplits method to RDD.	2013-01-23 22:23:22 -08:00
Reynold Xin	eedc542a02	Removed pruneSplits method in RDD and renamed SplitsPruningRDD to PartitionPruningRDD.	2013-01-23 22:14:23 -08:00
Reynold Xin	45cd50d5fe	Updated assert == to ===.	2013-01-23 16:06:58 -08:00
Matei Zaharia	548856a224	Merge remote-tracking branch 'woggling/remove-machines' Conflicts: core/src/main/scala/spark/scheduler/DAGScheduler.scala	2013-01-23 15:44:17 -08:00
Reynold Xin	c24b3819dd	Added an extra assert for split size check.	2013-01-23 15:34:59 -08:00
Reynold Xin	eb222b7206	Added pruneSplits method to RDD.	2013-01-23 15:29:02 -08:00
Charles Reiss	5c7422292e	Remove more dead code from test.	2013-01-23 12:59:51 -08:00
Charles Reiss	88b9d240fd	Remove dead code in test.	2013-01-23 12:40:38 -08:00
Matei Zaharia	1a3aeeca23	Merge pull request #407 from woggling/no-cache-tracker Eliminate CacheTracker	2013-01-23 12:28:48 -08:00
Matei Zaharia	4147e1d47b	Merge pull request #406 from tdas/master Changed StorageLevel and BlockManagerId API to prevent duplication in memory	2013-01-23 12:18:31 -08:00
Matei Zaharia	4d77d554e1	Merge pull request #394 from JoshRosen/add_file_fix Add SparkFiles.get() API to access files added through addFile().	2013-01-23 12:16:30 -08:00
Charles Reiss	0b506dd2ec	Add tests of various node failure scenarios.	2013-01-23 01:38:15 -08:00
Tathagata Das	5e11f1e51f	Modified StorageLevel API to ensure zero duplicate objects.	2013-01-22 23:42:53 -08:00
Tathagata Das	bacade6caf	Modified BlockManagerId API to ensure zero duplicate objects. Fixed BlockManagerId testcase in BlockManagerTestSuite.	2013-01-22 22:55:26 -08:00
Josh Rosen	43e9ff9596	Add test for driver hanging on exit (SPARK-530).	2013-01-22 22:47:26 -08:00
Charles Reiss	2849931000	Eliminate CacheTracker. Replaces DAGScheduler's queries of CacheTracker with BlockManagerMaster queries. Adds CacheManager to locally coordinate computation of cached RDDs.	2013-01-22 22:19:30 -08:00
Josh Rosen	ef711902c1	Don't download files to master's working directory. This should avoid exceptions caused by existing files with different contents. I also removed some unused code.	2013-01-21 17:34:17 -08:00
Stephen Haberman	ffd1623595	Minor cleanup.	2013-01-21 15:55:46 -06:00
folone	fd6e51deec	Fixed the failing test.	2013-01-20 17:02:58 +01:00
folone	ad8aff6ca4	Merge remote-tracking branch 'upstream/master'	2013-01-20 14:43:20 +01:00
Tathagata Das	4f8fe58b25	Merge branch 'mesos-streaming' into streaming Conflicts: core/src/main/scala/spark/api/java/JavaRDDLike.scala core/src/main/scala/spark/api/java/JavaSparkContext.scala core/src/test/scala/spark/JavaAPISuite.java	2013-01-20 01:13:56 -08:00
Patrick Wendell	d5570c7968	Adding checkpointing to Java API	2013-01-17 18:41:58 -08:00
Tathagata Das	f466ee44bc	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/MapOutputTracker.scala	2013-01-16 12:57:11 -08:00
Matei Zaharia	4beb084f64	Merge pull request #374 from woggling/null-mapout Generate FetchFailedException even for cached missing map outputs	2013-01-15 14:22:29 -08:00
Tathagata Das	cd1521cfdb	Merge branch 'master' into streaming Conflicts: core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala docs/_layouts/global.html docs/index.md run	2013-01-15 12:08:51 -08:00
Charles Reiss	4078623b9f	Remove broken attempt to test fetching case.	2013-01-15 12:05:54 -08:00
Stephen Haberman	d228bff440	Add a test.	2013-01-15 11:48:50 -06:00
Charles Reiss	b038999797	Fix accidental spark.master.host reuse	2013-01-14 17:04:44 -08:00
Charles Reiss	7ba34bc007	Additional tests for MapOutputTracker.	2013-01-14 15:27:02 -08:00
folone	25c0739bad	Moved to scala 2.10.0. Notable changes are: - akka 2.0.3 → 2.1.0 - spray 1.0-M1 → 1.1-M7 For now the repl subproject is commented out, as scala reflection api changed very much since the introduction of macros.	2013-01-14 09:52:11 +01:00
Matei Zaharia	72408e8dfa	Make filter preserve partitioner info, since it can	2013-01-13 19:34:07 -08:00
Ryan LeCompte	ea20ae6618	add one extra test	2013-01-12 09:18:00 -08:00
Ryan LeCompte	2c77eeebb6	correct test params	2013-01-12 00:13:45 -08:00
Ryan LeCompte	0cfea7a2ec	add unit test	2013-01-11 23:48:07 -08:00
Stephen Haberman	8ac0f35be4	Add JavaRDDLike.keyBy.	2013-01-08 09:57:45 -06:00
Stephen Haberman	4ee6b22775	Merge branch 'master' into tupleBy Conflicts: core/src/test/scala/spark/RDDSuite.scala	2013-01-08 09:10:10 -06:00
Matei Zaharia	f7cf035b9b	Merge pull request #350 from tdas/streaming Spark Streaming	2013-01-07 17:40:11 -08:00
Shivaram Venkataraman	b1336e2fe4	Update expected size of strings to match our dummy string class	2013-01-07 17:00:32 -08:00
Tathagata Das	4719e6d8fe	Changed locations for unit test logs.	2013-01-07 16:06:07 -08:00
Shivaram Venkataraman	55c66d365f	Use a dummy string class in Size Estimator tests to make it resistant to jdk versions	2013-01-07 15:58:00 -08:00
Shivaram Venkataraman	77d751731c	Remove unused BoundedMemoryCache file and associated test case.	2013-01-07 15:57:46 -08:00
Matei Zaharia	1941d9602d	Merge branch 'master' of github.com:mesos/spark	2013-01-07 16:50:39 -05:00
Matei Zaharia	9c32f300fb	Add Accumulable.setValue for easier use in Java	2013-01-07 16:50:23 -05:00
Stephen Haberman	8dc06069fe	Rename RDD.tupleBy to keyBy.	2013-01-06 15:21:45 -06:00
Matei Zaharia	b1663752c6	Merge pull request #351 from stephenh/values Add PairRDDFunctions.keys and values.	2013-01-05 19:15:54 -08:00
Matei Zaharia	0982572519	Add methods called just 'accumulator' for int/double in Java API	2013-01-05 22:11:28 -05:00
Matei Zaharia	86af64b0a6	Fix Accumulators in Java, and add a test for them	2013-01-05 20:55:17 -05:00
Matei Zaharia	ecf9c08901	Fix Accumulators in Java, and add a test for them	2013-01-05 20:54:08 -05:00
Stephen Haberman	1fdb6946b5	Add RDD.tupleBy.	2013-01-05 13:07:59 -06:00
Stephen Haberman	6a0db3b449	Fix typo.	2013-01-05 12:56:17 -06:00
Stephen Haberman	f4e6b9361f	Add RDD.collect(PartialFunction).	2013-01-05 12:14:08 -06:00
Stephen Haberman	8d57c78c83	Add PairRDDFunctions.keys and values.	2013-01-05 12:04:01 -06:00
Tathagata Das	3dc87dd923	Fixed compilation bug in RDDSuite created during merge for mesos/master.	2013-01-01 16:38:04 -08:00
Tathagata Das	d34dba25c2	Merge branch 'mesos' into dev-merge	2013-01-01 15:48:39 -08:00
Matei Zaharia	55809fbc6d	Merge pull request #349 from woggling/cache-finally Avoid stalls when computation of cached RDD throws exception	2013-01-01 08:21:33 -08:00
Charles Reiss	21636ee4fa	Test with exception while computing cached RDD.	2013-01-01 08:07:40 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Tathagata Das	0bc0a60d30	Modifications to make sure LocalScheduler terminate cleanly without errors when SparkContext is shutdown, to minimize spurious exception during master failure tests.	2012-12-27 15:37:33 -08:00
Tathagata Das	7c33f76291	Merge branch 'mesos' into dev-merge	2012-12-26 19:19:07 -08:00
Tathagata Das	836042bb9f	Merge branch 'dev-checkpoint' of github.com:radlab/spark into dev-merge Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/scheduler/ResultTask.scala core/src/test/scala/spark/CheckpointSuite.scala	2012-12-26 19:09:01 -08:00
Mark Hamstra	903f3518df	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	2012-12-24 13:18:45 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Reynold Xin	eac566a7f4	Merge branch 'master' of github.com:mesos/spark into dev Conflicts: core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/PairRDDFunctions.scala core/src/main/scala/spark/ParallelCollection.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/CartesianRDD.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/CoalescedRDD.scala core/src/main/scala/spark/rdd/FilteredRDD.scala core/src/main/scala/spark/rdd/FlatMappedRDD.scala core/src/main/scala/spark/rdd/GlommedRDD.scala core/src/main/scala/spark/rdd/HadoopRDD.scala core/src/main/scala/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/spark/rdd/MapPartitionsWithSplitRDD.scala core/src/main/scala/spark/rdd/MappedRDD.scala core/src/main/scala/spark/rdd/PipedRDD.scala core/src/main/scala/spark/rdd/SampledRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala core/src/main/scala/spark/rdd/UnionRDD.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockManagerId.scala core/src/main/scala/spark/storage/BlockManagerMaster.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/MetadataCleaner.scala core/src/main/scala/spark/util/TimeStampedHashMap.scala core/src/test/scala/spark/storage/BlockManagerSuite.scala run	2012-12-20 14:53:40 -08:00
Tathagata Das	8512dd3225	Merge branch 'dev' of github.com:radlab/spark into dev-checkpoint Conflicts: core/src/main/scala/spark/ParallelCollection.scala core/src/test/scala/spark/CheckpointSuite.scala streaming/src/main/scala/spark/streaming/DStream.scala	2012-12-20 14:24:19 -08:00
Tathagata Das	fe777eb77d	Fixed bugs in CheckpointRDD and spark.CheckpointSuite.	2012-12-20 13:39:27 -08:00
Reynold Xin	9397c5014e	Let the slave notify the master block removal.	2012-12-20 01:37:09 -08:00
Tathagata Das	5184141936	Introduced getSplits, getDependencies, and getPreferredLocations in RDD and RDDCheckpointData.	2012-12-18 13:30:53 -08:00
Reynold Xin	8c01295b85	Fixed conflicts from merging Charles' and TD's block manager changes.	2012-12-14 00:26:36 -08:00
Reynold Xin	0235667f73	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 22:33:41 -08:00
Reynold Xin	97434f49b8	Merged TD's block manager refactoring.	2012-12-13 22:32:19 -08:00
Reynold Xin	f4a9e1b9be	Fixed the broken Java unit test from SPARK-635.	2012-12-13 22:22:12 -08:00
Reynold Xin	1b7a0451ed	Added the ability in block manager to remove blocks.	2012-12-13 00:04:42 -08:00
Tathagata Das	8e74fac215	Made checkpoint data in RDDs optional to further reduce serialized size.	2012-12-11 15:36:12 -08:00
Tathagata Das	fa28f25619	Fixed bug in UnionRDD and CoGroupedRDD	2012-12-11 13:59:43 -08:00
Tathagata Das	2a87d816a2	Added clear property to JavaAPISuite to remove port binding errors.	2012-12-11 01:44:43 -08:00
Tathagata Das	746afc2e65	Bunch of bug fixes related to checkpointing in RDDs. RDDCheckpointData object is used to lock all serialization and dependency changes for checkpointing. ResultTask converted to Externalizable and serialized RDD is cached like ShuffleMapTask.	2012-12-10 23:36:37 -08:00
Charles Reiss	5d3e917d09	Use Akka scheduler for BlockManager heart beats. Adds required ActorSystem argument to BlockManager constructors.	2012-12-10 00:31:50 -08:00
Tathagata Das	e427216018	Removed unnecessary testcases.	2012-12-08 12:46:59 -08:00
Tathagata Das	1f3a75ae9e	Modified checkpoint testsuite to more comprehensively test checkpointing of various RDDs. Fixed checkpoint bug (splits referring to parent RDDs or parent splits) in UnionRDD and CoalescedRDD. Fixed bug in testing ShuffledRDD. Removed unnecessary and useless map-side combining step for narrow dependencies in CoGroupedRDD. Removed unnecessary WeakReference stuff from many other RDDs.	2012-12-07 13:45:52 -08:00
Charles Reiss	a2a94fdbc7	Tests for block manager heartbeats.	2012-12-05 23:36:05 -08:00
Tathagata Das	21a0852976	Refactored RDD checkpointing to minimize extra fields in RDD class.	2012-12-04 22:10:25 -08:00
Tathagata Das	e463ae4920	Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.	2012-11-28 14:05:01 -08:00
Matei Zaharia	3ebd8e1885	Added zip to Java API	2012-11-27 22:38:09 -08:00
Matei Zaharia	27e43abd19	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	2012-11-27 22:27:47 -08:00
Matei Zaharia	935c468b71	Merge pull request #311 from woggling/map-output-npe Fix NullPointerException when map output unregistered from MapOutputTracker twice	2012-11-27 20:50:48 -08:00
Reynold Xin	f24bfd2dd1	For size compression, compress non zero values into non zero values.	2012-11-27 19:20:45 -08:00
Charles Reiss	5fa868b98b	Tests for MapOutputTracker.	2012-11-27 16:05:36 -08:00
Tathagata Das	10c1abcb6a	Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.	2012-11-17 17:27:00 -08:00
Tathagata Das	04e9e9d93c	Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations.	2012-11-11 08:54:21 -08:00
Tathagata Das	34e569f40e	Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does not hurt the result of the job.	2012-10-31 00:56:40 -07:00
Tathagata Das	0dcd770fdc	Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.	2012-10-30 16:09:37 -07:00
Matei Zaharia	0bd20c63e2	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev Conflicts: core/src/main/scala/spark/Dependency.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala	2012-10-23 22:01:45 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an argument to SparkContext	2012-10-13 15:31:11 -07:00
Josh Rosen	33cd3a0c12	Remove map-side combining from ShuffleMapTask. This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	10bcd217d2	Remove mapSideCombine field from Aggregator. Instead, the presence or absence of a ShuffleDependency's aggregator will control whether map-side combining is performed.	2012-10-13 14:59:20 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Matei Zaharia	682b2d9329	Added a test for when an RDD only partially fits in memory	2012-10-12 14:58:26 -07:00
Shivaram Venkataraman	8577523f37	Add test to verify if RDD is computed even if block manager has insufficient memory	2012-10-12 14:14:57 -07:00
Shivaram Venkataraman	2cf40c5fd5	Change block manager to accept an ArrayBuffer instead of an iterator to ensure that the computation can proceed even if we run out of memory to cache the block. Update CacheTracker to use this new interface	2012-10-11 00:42:46 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Reynold Xin	80f59e17e2	Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created wrong for local mode.	2012-10-07 00:54:38 -07:00
Matei Zaharia	eca570f66a	Removed the need to sleep in tests due to waiting for Akka to shut down	2012-10-07 00:17:59 -07:00
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	9a3b3f32a3	Pass sizes of map outputs back to MapOutputTracker	2012-10-06 18:46:04 -07:00
Matei Zaharia	716e10ca32	Minor formatting fixes	2012-10-05 22:03:06 -07:00
Andy Konwinski	a242cdd0a6	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	2012-10-05 19:53:54 -07:00
Andy Konwinski	e0067da082	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	2012-10-05 19:23:45 -07:00
Shivaram Venkataraman	b6e4f46a96	Fix SizeEstimator tests to work with String classes in JDK 6 and 7 Conflicts: core/src/test/scala/spark/BoundedMemoryCacheSuite.scala	2012-10-05 16:58:57 -07:00
Imran Rashid	e0698f8f26	change tests to show utility of localValue	2012-10-04 23:05:42 -07:00
Imran Rashid	82a3327862	make accumulator.localValue public, add tests Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala	2012-10-04 23:05:01 -07:00
Matei Zaharia	97cbd699d7	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-02 17:31:01 -07:00
Matei Zaharia	5fda59ab99	Added a test for overly large blocks in memory store	2012-10-02 17:30:40 -07:00
Matei Zaharia	6098f7e87a	Fixed cache replacement behavior of BlockManager: - Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again - Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself) - Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk - MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start) - MemoryStore does not accept blocks larger than its limit	2012-10-02 17:25:38 -07:00
Reynold Xin	0898a21b95	Merge branch 'dev' of https://github.com/mesos/spark into dev	2012-10-02 13:08:01 -07:00
Matei Zaharia	22684653a5	Revert "Place Spray repo ahead of Cloudera in Maven search path" This reverts commit `42e0a68082`.	2012-10-02 12:01:32 -07:00
Reynold Xin	b8cd681169	Allow whitespaces in cluster URL configuration for local cluster.	2012-10-02 11:52:12 -07:00
Matei Zaharia	42e0a68082	Place Spray repo ahead of Cloudera in Maven search path	2012-10-02 11:37:19 -07:00
Matei Zaharia	74a9244255	Write all unit test output to a file	2012-10-01 15:07:42 -07:00
Matei Zaharia	0b84871dbc	Remove some printlns in tests	2012-10-01 10:57:26 -07:00
Matei Zaharia	2314132d57	Added a (failing) test for LRU with MEMORY_AND_DISK.	2012-09-30 22:52:16 -07:00
Matei Zaharia	83143f9a5f	Fixed several bugs that caused weird behavior with files in spark-shell: - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.	2012-09-30 21:19:39 -07:00
Matei Zaharia	fd0374b9de	Comment	2012-09-29 21:43:06 -07:00
Matei Zaharia	143ef4f90d	Added a CoalescedRDD class for reducing the number of partitions in an RDD.	2012-09-29 21:30:52 -07:00
Matei Zaharia	c45758ddde	Comment	2012-09-29 20:27:54 -07:00
Matei Zaharia	9b326d01e9	Made BlockManager unmap memory-mapped files when necessary to reduce the number of open files. Also optimized sending of disk-based blocks.	2012-09-29 20:21:54 -07:00
Matei Zaharia	009b0e37e7	Added an option to compress blocks in the block store	2012-09-27 18:45:44 -07:00
Matei Zaharia	7bcb08cef5	Renamed storage levels to something cleaner; fixes #223 .	2012-09-27 17:50:59 -07:00
Matei Zaharia	920fab23c3	Merge pull request #222 from rxin/dev Added MapPartitionsWithSplitRDD.	2012-09-26 23:16:45 -07:00
Matei Zaharia	1ef4f0fbd2	Allow controlling number of splits in sortByKey.	2012-09-26 19:18:47 -07:00
Reynold Xin	1ad1331a34	Added MapPartitionsWithSplitRDD.	2012-09-26 17:11:28 -07:00
Matei Zaharia	d71a358c46	Fixed a test that was getting extremely lucky before, and increased the number of samples used for sorting	2012-09-26 00:25:34 -07:00
Matei Zaharia	6eeb379cf8	Fix some test issues	2012-09-24 15:39:58 -07:00
Reynold Xin	397d3816e1	Separated ShuffledRDD into multiple classes: RepartitionShuffledRDD, ShuffledSortedRDD, and ShuffledAggregatedRDD.	2012-09-19 12:31:45 -07:00
Denny	5e4076e3f2	Merge branch 'dev' into feature/fileserver Conflicts: core/src/main/scala/spark/SparkContext.scala	2012-09-11 16:57:17 -07:00
Matei Zaharia	6d7f907e73	Manually merge pull request #175 by Imran Rashid	2012-09-11 16:00:06 -07:00
Denny	4d3471dd07	Fix serialization bugs and added local cluster tests	2012-09-10 15:39:58 -07:00
Denny	b864c36a30	Dynamically adding jar files and caching fileSets.	2012-09-10 12:49:09 -07:00
Denny	f275fb07da	General FileServer A general fileserver for both JARs and regular files.	2012-09-10 12:48:59 -07:00
Matei Zaharia	a13780670d	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-10 12:48:58 -07:00
Matei Zaharia	995982b3c9	Added a unit test for local-cluster mode and simplified some of the code involved in that	2012-09-07 17:08:36 -07:00
Reynold Xin	c308fbcb79	Removed cache add/remove log messages from CacheTracker. Added log messages on BlockManagerMaster to reflect block add/remove. Also did some minor cleanup of storage package code.	2012-09-05 15:59:48 -07:00
Reynold Xin	a8a2a08a1a	Added a test for testing map-side combine on/off switch.	2012-08-30 12:34:28 -07:00
Matei Zaharia	2c16ae36d7	Set log level in tests to WARN	2012-08-23 20:38:14 -07:00
Matei Zaharia	deedb9e7b7	Fix further issues with tests and broadcast. The broadcast fix is to store values as MEMORY_ONLY_DESER instead of MEMORY_ONLY, which will save substantial time on serialization.	2012-08-23 20:31:49 -07:00
Shivaram Venkataraman	0f4fbb057b	Change BlockManagerSuite test cases to use a deterministic size estimator and update the results to match the new estimates	2012-08-13 13:32:23 -07:00
Shivaram Venkataraman	22ba3a3f77	Add test-cases for 32-bit and no-compressed oops scenarios.	2012-08-13 13:32:10 -07:00
Shivaram Venkataraman	1f68c4b03b	Update test cases to match the new size estimates. Uses 64-bit and compressed oops setting to get deterministic results	2012-08-13 13:31:54 -07:00
Matei Zaharia	6ae3c375a9	Renamed apply() to call() in Java API and allowed it to throw Exceptions	2012-08-12 23:10:19 +02:00
Matei Zaharia	e463e7a333	Merge pull request #167 from JoshRosen/piped-rdd-fixes Detect non-zero exit status from PipedRDD process	2012-08-10 00:56:42 -07:00
Shivaram Venkataraman	ce3444d2cb	Fix testcheckpoint to reuse spark context defined in the class	2012-08-03 18:52:26 -07:00
Matei Zaharia	62898b631f	Made range partition balance tests more aggressive. This is because we pull out such a large sample (10x the number of partitions) that we should expect pretty good balance. The tests are also deterministic so there's no worry about them failing irreproducibly.	2012-08-03 16:46:48 -04:00
Matei Zaharia	6601a6212b	Added a unit test for cross-partition balancing in sort, and changes to RangePartitioner to make it pass. It turns out that the first partition was always kind of small due to how we picked partition boundaries.	2012-08-03 16:40:45 -04:00
Matei Zaharia	3ee2530c0c	Merge branch 'block-manager-fix' into dev	2012-07-30 13:58:46 -07:00
Matei Zaharia	400221f851	Merge branch 'dev' of git://github.com/tdas/spark into dev	2012-07-30 13:54:57 -07:00
Matei Zaharia	ed1b0f8388	Made BlockManagerMaster no longer be a singleton. Also cleaned up a few formatting things throughout block manager code.	2012-07-30 13:53:47 -07:00
Matei Zaharia	d7f089323a	Fixed AccumulatorSuite to clean up SparkContext with BeforeAndAfter	2012-07-28 20:25:42 -07:00
Imran Rashid	f7149c5e46	tasks cannot access value of accumulator	2012-07-28 20:16:17 -07:00
Imran Rashid	f1face1ea9	rename addToAccum to addAccumulator	2012-07-28 20:16:01 -07:00
Imran Rashid	2d666b9d76	add some functionality to Vector, delete copy in AccumulatorSuite	2012-07-28 20:15:51 -07:00
Imran Rashid	83659af11c	Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=) Conflicts: core/src/main/scala/spark/Accumulators.scala	2012-07-28 20:13:51 -07:00
Imran Rashid	ae07f3864c	add Accumulatable, add corresponding docs & tests for accumulators	2012-07-28 20:12:41 -07:00
Matei Zaharia	f6f917bd00	Add a sleep to prevent a failing test. The BlockManager's put seems to be slightly asynchronous, which can cause it to fail this test by not removing stuff from the cache before we put the next value. We should probably change the semantics of put() in this case but it's hard right now. It will also be hard for asynchronously replicated puts.	2012-07-27 16:59:36 -07:00
Matei Zaharia	c0c78d2119	Renamed test more descriptively	2012-07-27 16:28:18 -07:00
Matei Zaharia	dee8ff1b9d	Added a second version of union() without varargs.	2012-07-27 16:27:52 -07:00
Matei Zaharia	b51d733a57	Fixed Java union methods having same erasure. Changed union() methods on lists to take a separate "first element" argument in order to differentiate them to the compiler, because Java 7 considered it an error to have them all take Lists parameterized with different types.	2012-07-27 12:23:27 -07:00
Tathagata Das	024905f682	Added BlockRDD and a first-cut version of checkpoint() to RDD class.	2012-07-27 12:00:49 -07:00
Tathagata Das	0426769f89	Modified the block dropping code for better performance.	2012-07-26 20:53:45 -07:00
Matei Zaharia	5c5aa2ff81	Merge pull request #153 from JoshRosen/new-java-api Java API	2012-07-26 17:20:52 -07:00
Josh Rosen	c5e2810dc7	Add persist(), splits(), glom(), and mapPartitions() to Java API.	2012-07-26 12:46:47 -07:00
Josh Rosen	bf61c10072	Detect non-zero exit status from PipedRDD process.	2012-07-26 11:32:59 -07:00
Denny	4f4a34c025	Stylistic changes Conflicts: core/src/test/scala/spark/MesosSchedulerSuite.scala	2012-07-23 16:32:20 -07:00
Denny	866e6949df	Always destroy SparkContext in after block for the unit tests. Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2012-07-23 16:29:17 -07:00
Josh Rosen	042dcbde33	Add type annotations to Java API methods. Add missing Scala Map to java.util.Map conversions.	2012-07-22 17:35:29 -07:00
Josh Rosen	01dce3f569	Add Java API Add distinct() method to RDD. Fix bug in DoubleRDDFunctions.	2012-07-18 17:34:29 -07:00
Matei Zaharia	408b5a1332	More work on deploy code (adding Worker class)	2012-06-30 16:45:57 -07:00
Matei Zaharia	2fb6e7d71e	Initial framework to get a master and web UI up.	2012-06-30 14:45:55 -07:00
Matei Zaharia	c53670b9bf	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
Matei Zaharia	3920189932	Upgraded to Akka 2 and fixed test execution (which was still parallel across projects).	2012-06-28 23:51:28 -07:00
Tathagata Das	e896a505e2	Added testcase for ByteBufferInputStream bugs.	2012-06-17 16:11:12 -07:00
Matei Zaharia	f58da6164e	Merge branch 'master' into dev	2012-06-15 23:47:11 -07:00
Tathagata Das	c6156da9e2	Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.	2012-06-13 16:26:49 -04:00
Matei Zaharia	e75b1b5cb4	Change the default broadcast implementation to a simple HTTP-based broadcast. Fixes #139.	2012-06-09 15:58:07 -07:00
Matei Zaharia	a96558caa3	Performance improvements to shuffle operations: in particular, preserve RDD partitioning in more cases where it's possible, and use iterators instead of materializing collections when doing joins.	2012-06-09 14:44:18 -07:00
Matei Zaharia	c2c7299d7a	Added BlockManagerSuite, which I'd forgotten to merge.	2012-06-07 13:47:10 -07:00
Matei Zaharia	63051dd2bc	Merge in engine improvements from the Spark Streaming project, developed jointly with Tathagata Das and Haoyuan Li. This commit imports the changes and ports them to Mesos 0.9, but does not yet pass unit tests due to various classes not supporting a graceful stop() yet.	2012-06-07 12:45:38 -07:00
Matei Zaharia	6ae2746d1e	Handle arrays that contain the same element many times better in SizeEstimator. Also added a test for SizeEstimator. Fixes #136.	2012-06-06 16:13:02 -07:00
Matei Zaharia	0a617958d1	Some refactoring to make BoundedMemoryCache test similar to others	2012-06-06 16:12:08 -07:00
Matei Zaharia	e141f644ca	Merge pull request #132 from Benky/rb-first-iteration Little refactoring and unit tests for CacheTrackerActor	2012-05-26 13:15:06 -07:00
Richard Benkovsky	ae64920337	MesosScheduler refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	3a1bcd4028	Added tests for CacheTrackerActor	2012-05-22 11:04:54 +02:00
Richard Benkovsky	518506a7c5	Added tests for Utils.copyStream	2012-05-22 11:04:51 +02:00
Richard Benkovsky	565245871f	BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity	2012-05-20 22:13:35 +02:00
Reynold Xin	16461e2eda	Updated Cache's put method to use a case class for response. Previously it was pretty ugly that put() should return -1 for failures.	2012-05-15 00:31:52 -07:00
Reynold Xin	019e48833f	Added the capacity to report cache usage status back to the cache tracker. This is essential for building a dashboard to see the status of caches on all slaves.	2012-05-14 18:39:04 -07:00
Reynold Xin	761ea65a98	Added a test for the previous commit (failing to serialize task results would throw an exception for local tasks).	2012-04-24 15:14:35 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	c7af538ac1	Some fixes to sorting for when the RDD has fewer elements than the number of partitions we ask to partition it into. Also, removed a test that was taking way too long to run.	2012-03-17 13:08:36 -07:00
Matei Zaharia	1e10df0a46	Merge pull request #111 from alupher/master Adding sorting to RDDs	2012-02-24 15:50:14 -08:00
Antonio	0d93d95bcf	Removed unnecessary import	2012-02-21 19:57:12 -08:00
Antonio	2990298f71	Added sorting testing suite	2012-02-21 19:54:21 -08:00
Matei Zaharia	aa04f87cd2	Added support for parallel execution of jobs in DAGScheduler.	2012-02-19 22:50:23 -08:00
Matei Zaharia	a766780f4c	Added some tests for multithreaded access to Spark.	2012-02-09 22:27:53 -08:00
Matei Zaharia	43a3335090	Simplifying test	2012-02-05 22:46:51 -08:00
Matei Zaharia	eb05154b7a	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:08:25 -08:00
Matei Zaharia	e269f6f7ea	Register RDDs with the MapOutputTracker even if they have no partitions. Fixes #105.	2012-01-05 15:59:20 -05:00
Matei Zaharia	735843a049	Merge remote-tracking branch 'origin/charles-newhadoop'	2011-12-02 21:59:30 -08:00
Charles Reiss	66f05f383e	Add new Hadoop API reading support.	2011-12-01 14:02:10 -08:00
Charles Reiss	02d43e6986	Add new Hadoop API writing support.	2011-12-01 14:01:28 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Matei Zaharia	9e4c79a4d3	Closure cleaner unit test	2011-11-08 00:40:15 -08:00
Matei Zaharia	c2b7fd6899	Make parallelize() work efficiently for ranges of Long, Double, etc (splitting them into sub-ranges). Fixes #87.	2011-11-02 15:16:02 -07:00
Matei Zaharia	d12122502b	Various improvements to Kryo serializer: - Replaced modified Kryo version with the standard one augmented with the kryo-serializers package, which includes support for classes with no-arg constructors (that was why we had a modified Kryo before) - The kryo-serializers version also fixes issue #72. - Added a bunch of tests. - Serialize maps and a few other common types properly by default.	2011-07-21 22:09:33 -07:00
Matei Zaharia	e4c3402d2d	Renamed ParallelArray to ParallelCollection	2011-07-14 14:47:01 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Olivier Grisel	005d1605a4	add missing test for RDD.groupWith	2011-06-23 02:10:52 +02:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00
Matei Zaharia	4db50e26c7	Fixed unit tests by making them clean up the SparkContext after use and thus clean up the various singletons (RDDCache, MapOutputTracker, etc). This isn't perfect yet (ideally we shouldn't use singleton objects at all) but we can fix that later.	2011-05-13 12:03:58 -07:00
Matei Zaharia	e5c4cd8a5e	Made examples and core subprojects	2011-02-01 15:11:08 -08:00

784 commits