ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Stephen Haberman	8dc06069fe	Rename RDD.tupleBy to keyBy.	2013-01-06 15:21:45 -06:00
Stephen Haberman	1fdb6946b5	Add RDD.tupleBy.	2013-01-05 13:07:59 -06:00
Matei Zaharia	55809fbc6d	Merge pull request #349 from woggling/cache-finally Avoid stalls when computation of cached RDD throws exception	2013-01-01 08:21:33 -08:00
Charles Reiss	21636ee4fa	Test with exception while computing cached RDD.	2013-01-01 08:07:40 -08:00
Josh Rosen	f803953998	Raise exception when hashing Java arrays (SPARK-597)	2012-12-31 20:20:11 -08:00
Mark Hamstra	903f3518df	fall back to filter-map-collect when calling lookup() on an RDD without a partitioner	2012-12-24 13:18:45 -08:00
Mark Hamstra	61be8566e2	Allow distinct() to be called without parentheses when using the default number of splits.	2012-12-24 02:36:47 -08:00
Reynold Xin	9397c5014e	Let the slave notify the master block removal.	2012-12-20 01:37:09 -08:00
Reynold Xin	8c01295b85	Fixed conflicts from merging Charles' and TD's block manager changes.	2012-12-14 00:26:36 -08:00
Reynold Xin	0235667f73	Merge branch 'master' of github.com:mesos/spark into spark-633	2012-12-13 22:33:41 -08:00
Reynold Xin	97434f49b8	Merged TD's block manager refactoring.	2012-12-13 22:32:19 -08:00
Reynold Xin	f4a9e1b9be	Fixed the broken Java unit test from SPARK-635.	2012-12-13 22:22:12 -08:00
Reynold Xin	1b7a0451ed	Added the ability in block manager to remove blocks.	2012-12-13 00:04:42 -08:00
Charles Reiss	5d3e917d09	Use Akka scheduler for BlockManager heart beats. Adds required ActorSystem argument to BlockManager constructors.	2012-12-10 00:31:50 -08:00
Charles Reiss	a2a94fdbc7	Tests for block manager heartbeats.	2012-12-05 23:36:05 -08:00
Matei Zaharia	3ebd8e1885	Added zip to Java API	2012-11-27 22:38:09 -08:00
Matei Zaharia	27e43abd19	Added a zip() operation for RDDs with the same shape (number of partitions and number of elements in each partition)	2012-11-27 22:27:47 -08:00
Matei Zaharia	935c468b71	Merge pull request #311 from woggling/map-output-npe Fix NullPointerException when map output unregistered from MapOutputTracker twice	2012-11-27 20:50:48 -08:00
Reynold Xin	f24bfd2dd1	For size compression, compress non zero values into non zero values.	2012-11-27 19:20:45 -08:00
Charles Reiss	5fa868b98b	Tests for MapOutputTracker.	2012-11-27 16:05:36 -08:00
Matei Zaharia	0bd20c63e2	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev Conflicts: core/src/main/scala/spark/Dependency.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala	2012-10-23 22:01:45 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an argument to SparkContext	2012-10-13 15:31:11 -07:00
Josh Rosen	33cd3a0c12	Remove map-side combining from ShuffleMapTask. This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	10bcd217d2	Remove mapSideCombine field from Aggregator. Instead, the presence or absence of a ShuffleDependency's aggregator will control whether map-side combining is performed.	2012-10-13 14:59:20 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00
Matei Zaharia	682b2d9329	Added a test for when an RDD only partially fits in memory	2012-10-12 14:58:26 -07:00
Shivaram Venkataraman	8577523f37	Add test to verify if RDD is computed even if block manager has insufficient memory	2012-10-12 14:14:57 -07:00
Shivaram Venkataraman	2cf40c5fd5	Change block manager to accept an ArrayBuffer instead of an iterator to ensure that the computation can proceed even if we run out of memory to cache the block. Update CacheTracker to use this new interface	2012-10-11 00:42:46 -07:00
Matei Zaharia	efc5423210	Made compression configurable separately for shuffle, broadcast and RDDs	2012-10-07 11:30:53 -07:00
Reynold Xin	80f59e17e2	Fixed a bug in addFile that if the file is specified as "file:///", the symlink is created wrong for local mode.	2012-10-07 00:54:38 -07:00
Matei Zaharia	eca570f66a	Removed the need to sleep in tests due to waiting for Akka to shut down	2012-10-07 00:17:59 -07:00
Matei Zaharia	dc28a3ac0a	Modified shuffle to limit the maximum outstanding data size in bytes, instead of the maximum number of outstanding fetches. This should make it faster when there are many small map output files, as well as more robust to overallocating memory on large map outputs.	2012-10-06 20:07:10 -07:00
Matei Zaharia	9a3b3f32a3	Pass sizes of map outputs back to MapOutputTracker	2012-10-06 18:46:04 -07:00
Matei Zaharia	716e10ca32	Minor formatting fixes	2012-10-05 22:03:06 -07:00
Andy Konwinski	a242cdd0a6	Factor subclasses of RDD out of RDD.scala into their own classes in the rdd package.	2012-10-05 19:53:54 -07:00
Andy Konwinski	e0067da082	Moves all files in core/src/main/scala/ that have RDD in them from package spark to package spark.rdd and updates all references to them.	2012-10-05 19:23:45 -07:00
Shivaram Venkataraman	b6e4f46a96	Fix SizeEstimator tests to work with String classes in JDK 6 and 7 Conflicts: core/src/test/scala/spark/BoundedMemoryCacheSuite.scala	2012-10-05 16:58:57 -07:00
Imran Rashid	e0698f8f26	change tests to show utility of localValue	2012-10-04 23:05:42 -07:00
Imran Rashid	82a3327862	make accumulator.localValue public, add tests Conflicts: core/src/test/scala/spark/AccumulatorSuite.scala	2012-10-04 23:05:01 -07:00
Matei Zaharia	97cbd699d7	Merge branch 'dev' of github.com:mesos/spark into dev	2012-10-02 17:31:01 -07:00
Matei Zaharia	5fda59ab99	Added a test for overly large blocks in memory store	2012-10-02 17:30:40 -07:00
Matei Zaharia	6098f7e87a	Fixed cache replacement behavior of BlockManager: - Partitions that get dropped to disk will now be loaded back into RAM after they're accessed again - Same-RDD rule for cache replacement is now implemented (don't drop partitions from an RDD to make room for other partitions from itself) - Items stored as MEMORY_AND_DISK go into memory only first, instead of being eagerly written out to disk - MemoryStore.ensureFreeSpace is called within a lock on the writer thread to prevent race conditions (this can still be optimized to allow multiple concurrent calls to it but it's a start) - MemoryStore does not accept blocks larger than its limit	2012-10-02 17:25:38 -07:00
Reynold Xin	0898a21b95	Merge branch 'dev' of https://github.com/mesos/spark into dev	2012-10-02 13:08:01 -07:00
Matei Zaharia	22684653a5	Revert "Place Spray repo ahead of Cloudera in Maven search path" This reverts commit `42e0a68082`.	2012-10-02 12:01:32 -07:00
Reynold Xin	b8cd681169	Allow whitespaces in cluster URL configuration for local cluster.	2012-10-02 11:52:12 -07:00
Matei Zaharia	42e0a68082	Place Spray repo ahead of Cloudera in Maven search path	2012-10-02 11:37:19 -07:00
Matei Zaharia	74a9244255	Write all unit test output to a file	2012-10-01 15:07:42 -07:00
Matei Zaharia	0b84871dbc	Remove some printlns in tests	2012-10-01 10:57:26 -07:00
Matei Zaharia	2314132d57	Added a (failing) test for LRU with MEMORY_AND_DISK.	2012-09-30 22:52:16 -07:00
Matei Zaharia	83143f9a5f	Fixed several bugs that caused weird behavior with files in spark-shell: - SizeEstimator was following through a ClassLoader field of Hadoop JobConfs, which referenced the whole interpreter, Scala compiler, etc. Chaos ensued, giving an estimated size in the tens of gigabytes. - Broadcast variables in local mode were only stored as MEMORY_ONLY and never made accessible over a server, so they fell out of the cache when they were deemed too large and couldn't be reloaded.	2012-09-30 21:19:39 -07:00

150 commits