ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	a8a2a08a1a	Added a test for testing map-side combine on/off switch.	2012-08-30 12:34:28 -07:00
Matei Zaharia	2c16ae36d7	Set log level in tests to WARN	2012-08-23 20:38:14 -07:00
Matei Zaharia	deedb9e7b7	Fix further issues with tests and broadcast. The broadcast fix is to store values as MEMORY_ONLY_DESER instead of MEMORY_ONLY, which will save substantial time on serialization.	2012-08-23 20:31:49 -07:00
Shivaram Venkataraman	0f4fbb057b	Change BlockManagerSuite test cases to use a deterministic size estimator and update the results to match the new estimates	2012-08-13 13:32:23 -07:00
Shivaram Venkataraman	22ba3a3f77	Add test-cases for 32-bit and no-compressed oops scenarios.	2012-08-13 13:32:10 -07:00
Shivaram Venkataraman	1f68c4b03b	Update test cases to match the new size estimates. Uses 64-bit and compressed oops setting to get deterministic results	2012-08-13 13:31:54 -07:00
Matei Zaharia	6ae3c375a9	Renamed apply() to call() in Java API and allowed it to throw Exceptions	2012-08-12 23:10:19 +02:00
Matei Zaharia	e463e7a333	Merge pull request #167 from JoshRosen/piped-rdd-fixes Detect non-zero exit status from PipedRDD process	2012-08-10 00:56:42 -07:00
Shivaram Venkataraman	ce3444d2cb	Fix testcheckpoint to reuse spark context defined in the class	2012-08-03 18:52:26 -07:00
Matei Zaharia	62898b631f	Made range partition balance tests more aggressive. This is because we pull out such a large sample (10x the number of partitions) that we should expect pretty good balance. The tests are also deterministic so there's no worry about them failing irreproducibly.	2012-08-03 16:46:48 -04:00
Matei Zaharia	6601a6212b	Added a unit test for cross-partition balancing in sort, and changes to RangePartitioner to make it pass. It turns out that the first partition was always kind of small due to how we picked partition boundaries.	2012-08-03 16:40:45 -04:00
Matei Zaharia	3ee2530c0c	Merge branch 'block-manager-fix' into dev	2012-07-30 13:58:46 -07:00
Matei Zaharia	400221f851	Merge branch 'dev' of git://github.com/tdas/spark into dev	2012-07-30 13:54:57 -07:00
Matei Zaharia	ed1b0f8388	Made BlockManagerMaster no longer be a singleton. Also cleaned up a few formatting things throughout block manager code.	2012-07-30 13:53:47 -07:00
Matei Zaharia	d7f089323a	Fixed AccumulatorSuite to clean up SparkContext with BeforeAndAfter	2012-07-28 20:25:42 -07:00
Imran Rashid	f7149c5e46	tasks cannot access value of accumulator	2012-07-28 20:16:17 -07:00
Imran Rashid	f1face1ea9	rename addToAccum to addAccumulator	2012-07-28 20:16:01 -07:00
Imran Rashid	2d666b9d76	add some functionality to Vector, delete copy in AccumulatorSuite	2012-07-28 20:15:51 -07:00
Imran Rashid	83659af11c	Accumulator now inherits from Accumulable, which simplifies a bunch of other things (e.g., no +:=) Conflicts: core/src/main/scala/spark/Accumulators.scala	2012-07-28 20:13:51 -07:00
Imran Rashid	ae07f3864c	add Accumulatable, add corresponding docs & tests for accumulators	2012-07-28 20:12:41 -07:00
Matei Zaharia	f6f917bd00	Add a sleep to prevent a failing test. The BlockManager's put seems to be slightly asynchronous, which can cause it to fail this test by not removing stuff from the cache before we put the next value. We should probably change the semantics of put() in this case but it's hard right now. It will also be hard for asynchronously replicated puts.	2012-07-27 16:59:36 -07:00
Matei Zaharia	c0c78d2119	Renamed test more descriptively	2012-07-27 16:28:18 -07:00
Matei Zaharia	dee8ff1b9d	Added a second version of union() without varargs.	2012-07-27 16:27:52 -07:00
Matei Zaharia	b51d733a57	Fixed Java union methods having same erasure. Changed union() methods on lists to take a separate "first element" argument in order to differentiate them to the compiler, because Java 7 considered it an error to have them all take Lists parameterized with different types.	2012-07-27 12:23:27 -07:00
Tathagata Das	024905f682	Added BlockRDD and a first-cut version of checkpoint() to RDD class.	2012-07-27 12:00:49 -07:00
Tathagata Das	0426769f89	Modified the block dropping code for better performance.	2012-07-26 20:53:45 -07:00
Matei Zaharia	5c5aa2ff81	Merge pull request #153 from JoshRosen/new-java-api Java API	2012-07-26 17:20:52 -07:00
Josh Rosen	c5e2810dc7	Add persist(), splits(), glom(), and mapPartitions() to Java API.	2012-07-26 12:46:47 -07:00
Josh Rosen	bf61c10072	Detect non-zero exit status from PipedRDD process.	2012-07-26 11:32:59 -07:00
Denny	4f4a34c025	Stylistic changes Conflicts: core/src/test/scala/spark/MesosSchedulerSuite.scala	2012-07-23 16:32:20 -07:00
Denny	866e6949df	Always destroy SparkContext in after block for the unit tests. Conflicts: core/src/test/scala/spark/ShuffleSuite.scala	2012-07-23 16:29:17 -07:00
Josh Rosen	042dcbde33	Add type annotations to Java API methods. Add missing Scala Map to java.util.Map conversions.	2012-07-22 17:35:29 -07:00
Josh Rosen	01dce3f569	Add Java API Add distinct() method to RDD. Fix bug in DoubleRDDFunctions.	2012-07-18 17:34:29 -07:00
Matei Zaharia	408b5a1332	More work on deploy code (adding Worker class)	2012-06-30 16:45:57 -07:00
Matei Zaharia	2fb6e7d71e	Initial framework to get a master and web UI up.	2012-06-30 14:45:55 -07:00
Matei Zaharia	c53670b9bf	Various code style fixes, mostly from IntelliJ IDEA	2012-06-29 18:47:12 -07:00
Matei Zaharia	3920189932	Upgraded to Akka 2 and fixed test execution (which was still parallel across projects).	2012-06-28 23:51:28 -07:00
Tathagata Das	e896a505e2	Added testcase for ByteBufferInputStream bugs.	2012-06-17 16:11:12 -07:00
Matei Zaharia	f58da6164e	Merge branch 'master' into dev	2012-06-15 23:47:11 -07:00
Tathagata Das	c6156da9e2	Multiple bug fixes to pass the testsuites ShuffleSuite and BlockManagerSuite.	2012-06-13 16:26:49 -04:00
Matei Zaharia	e75b1b5cb4	Change the default broadcast implementation to a simple HTTP-based broadcast. Fixes #139.	2012-06-09 15:58:07 -07:00
Matei Zaharia	a96558caa3	Performance improvements to shuffle operations: in particular, preserve RDD partitioning in more cases where it's possible, and use iterators instead of materializing collections when doing joins.	2012-06-09 14:44:18 -07:00
Matei Zaharia	c2c7299d7a	Added BlockManagerSuite, which I'd forgotten to merge.	2012-06-07 13:47:10 -07:00
Matei Zaharia	63051dd2bc	Merge in engine improvements from the Spark Streaming project, developed jointly with Tathagata Das and Haoyuan Li. This commit imports the changes and ports them to Mesos 0.9, but does not yet pass unit tests due to various classes not supporting a graceful stop() yet.	2012-06-07 12:45:38 -07:00
Matei Zaharia	6ae2746d1e	Handle arrays that contain the same element many times better in SizeEstimator. Also added a test for SizeEstimator. Fixes #136.	2012-06-06 16:13:02 -07:00
Matei Zaharia	0a617958d1	Some refactoring to make BoundedMemoryCache test similar to others	2012-06-06 16:12:08 -07:00
Matei Zaharia	e141f644ca	Merge pull request #132 from Benky/rb-first-iteration Little refactoring and unit tests for CacheTrackerActor	2012-05-26 13:15:06 -07:00
Richard Benkovsky	ae64920337	MesosScheduler refactoring	2012-05-22 11:04:54 +02:00
Richard Benkovsky	3a1bcd4028	Added tests for CacheTrackerActor	2012-05-22 11:04:54 +02:00
Richard Benkovsky	518506a7c5	Added tests for Utils.copyStream	2012-05-22 11:04:51 +02:00
Richard Benkovsky	565245871f	BoundedMemoryCache.put fails when estimated size of 'value' is larger than cache capacity	2012-05-20 22:13:35 +02:00
Reynold Xin	16461e2eda	Updated Cache's put method to use a case class for response. Previously it was pretty ugly that put() should return -1 for failures.	2012-05-15 00:31:52 -07:00
Reynold Xin	019e48833f	Added the capacity to report cache usage status back to the cache tracker. This is essential for building a dashboard to see the status of caches on all slaves.	2012-05-14 18:39:04 -07:00
Reynold Xin	761ea65a98	Added a test for the previous commit (failing to serialize task results would throw an exception for local tasks).	2012-04-24 15:14:35 -07:00
Reynold Xin	e601b3b9e5	Added the ability to set environmental variables in piped rdd.	2012-04-17 16:40:56 -07:00
Matei Zaharia	c7af538ac1	Some fixes to sorting for when the RDD has fewer elements than the number of partitions we ask to partition it into. Also, removed a test that was taking way too long to run.	2012-03-17 13:08:36 -07:00
Matei Zaharia	1e10df0a46	Merge pull request #111 from alupher/master Adding sorting to RDDs	2012-02-24 15:50:14 -08:00
Antonio	0d93d95bcf	Removed unnecessary import	2012-02-21 19:57:12 -08:00
Antonio	2990298f71	Added sorting testing suite	2012-02-21 19:54:21 -08:00
Matei Zaharia	aa04f87cd2	Added support for parallel execution of jobs in DAGScheduler.	2012-02-19 22:50:23 -08:00
Matei Zaharia	a766780f4c	Added some tests for multithreaded access to Spark.	2012-02-09 22:27:53 -08:00
Matei Zaharia	43a3335090	Simplifying test	2012-02-05 22:46:51 -08:00
Matei Zaharia	eb05154b7a	Fixed a failure recovery bug and added some tests for fault recovery.	2012-01-13 19:08:25 -08:00
Matei Zaharia	e269f6f7ea	Register RDDs with the MapOutputTracker even if they have no partitions. Fixes #105.	2012-01-05 15:59:20 -05:00
Matei Zaharia	735843a049	Merge remote-tracking branch 'origin/charles-newhadoop'	2011-12-02 21:59:30 -08:00
Charles Reiss	66f05f383e	Add new Hadoop API reading support.	2011-12-01 14:02:10 -08:00
Charles Reiss	02d43e6986	Add new Hadoop API writing support.	2011-12-01 14:01:28 -08:00
Matei Zaharia	22b8fcf632	Added fold() and aggregate() operations that reuse an object to merge results into rather than requiring a new object allocation for each element merged. Fixes #95.	2011-11-30 11:37:47 -08:00
Matei Zaharia	9e4c79a4d3	Closure cleaner unit test	2011-11-08 00:40:15 -08:00
Matei Zaharia	c2b7fd6899	Make parallelize() work efficiently for ranges of Long, Double, etc (splitting them into sub-ranges). Fixes #87.	2011-11-02 15:16:02 -07:00
Matei Zaharia	d12122502b	Various improvements to Kryo serializer: - Replaced modified Kryo version with the standard one augmented with the kryo-serializers package, which includes support for classes with no-arg constructors (that was why we had a modified Kryo before) - The kryo-serializers version also fixes issue #72. - Added a bunch of tests. - Serialize maps and a few other common types properly by default.	2011-07-21 22:09:33 -07:00
Matei Zaharia	e4c3402d2d	Renamed ParallelArray to ParallelCollection	2011-07-14 14:47:01 -04:00
Matei Zaharia	2604939f64	Simplified and documented code a little and added test	2011-07-14 00:19:00 -04:00
Matei Zaharia	9c0069188b	Updated save code to allow non-file-based OutputFormats and added a test for file-related stuff	2011-07-13 23:04:06 -04:00
Matei Zaharia	842e14d567	Added mapPartitions operation and a bunch of tests for RDD ops	2011-07-13 00:19:52 -04:00
Olivier Grisel	2e3531d8bf	Implemented RDD.leftOuterJoin and RDD.rightOuterJoin	2011-06-24 11:00:51 +02:00
Olivier Grisel	005d1605a4	add missing test for RDD.groupWith	2011-06-23 02:10:52 +02:00
Ismael Juma	1396678baa	Move REPL classes to separate module.	2011-05-27 11:22:50 +01:00
Matei Zaharia	4db50e26c7	Fixed unit tests by making them clean up the SparkContext after use and thus clean up the various singletons (RDDCache, MapOutputTracker, etc). This isn't perfect yet (ideally we shouldn't use singleton objects at all) but we can fix that later.	2011-05-13 12:03:58 -07:00
Matei Zaharia	e5c4cd8a5e	Made examples and core subprojects	2011-02-01 15:11:08 -08:00

1930 commits