ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Tathagata Das	e427216018	Removed unnecessary testcases.	2012-12-08 12:46:59 -08:00
Tathagata Das	a69a82be26	Added metadata cleaner to HttpBroadcast to clean up old broacast files.	2012-12-03 22:37:31 -08:00
Tathagata Das	b4dba55f78	Made RDD checkpoint not create a new thread. Fixed bug in detecting when spark.cleaner.delay is insufficient.	2012-12-02 02:03:05 +00:00
Tathagata Das	477de94894	Minor modifications.	2012-12-01 13:15:06 -08:00
Tathagata Das	6fcd09f499	Added TimeStampedHashSet and used that to cleanup the list of registered RDD IDs in CacheTracker.	2012-11-29 02:06:33 -08:00
Tathagata Das	c9789751bf	Added metadata cleaner to BlockManager to remove old blocks completely.	2012-11-28 23:18:24 -08:00
Tathagata Das	9e9e9e1d89	Renamed CleanupTask to MetadataCleaner.	2012-11-28 18:48:14 -08:00
Tathagata Das	e463ae4920	Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.	2012-11-28 14:05:01 -08:00
Tathagata Das	d5e7aad039	Bug fixes	2012-11-28 08:36:55 +00:00
Tathagata Das	b18d70870a	Modified bunch HashMaps in Spark to use TimeStampedHashMap and made various modules use CleanupTask to periodically clean up metadata.	2012-11-27 15:08:49 -08:00
Tathagata Das	0fe2fc4d5e	Merged branch mesos/master to branch dev.	2012-11-26 13:16:59 -08:00
Tathagata Das	c97ebf6437	Fixed bug in the number of splits in RDD after checkpointing. Modified reduceByKeyAndWindow (naive) computation from window+reduceByKey to reduceByKey+window+reduceByKey.	2012-11-19 23:22:07 +00:00
Matei Zaharia	3ff6f4bdee	Merge pull request #304 from mbautin/configurable_local_ip SPARK-624: make the default local IP customizable	2012-11-19 13:23:39 -08:00
mbautin	00f4e3ff9c	Addressing Matei's comment: SPARK_LOCAL_IP environment variable	2012-11-19 11:52:10 -08:00
Tathagata Das	10c1abcb6a	Fixed checkpointing bug in CoGroupedRDD. CoGroupSplits kept around the RDD splits of its parent RDDs, thus checkpointing its parents did not release the references to the parent splits.	2012-11-17 17:27:00 -08:00
Charles Reiss	12c24e786c	Set default uncaught exception handler to exit. Among other things, should prevent OutOfMemoryErrors in some daemon threads (such as the network manager) from causing a spark executor to enter a state where it cannot make progress but does not report an error.	2012-11-16 20:12:31 -08:00
mbautin	1f5a7e0e64	SPARK-624: make the default local IP customizable	2012-11-15 13:57:47 -08:00
Matei Zaharia	c23a74df0a	Use DNS names instead of IP addresses in standalone mode, to allow matching with data locality hints from storage systems.	2012-11-15 00:10:52 -08:00
Tathagata Das	8a25d530ed	Optimized checkpoint writing by reusing FileSystem object. Fixed bug in updating of checkpoint data in DStream where the checkpointed RDDs, upon recovery, were not recognized as checkpointed RDDs and therefore deleted from HDFS. Made InputStreamsSuite more robust to timing delays.	2012-11-13 02:16:28 -08:00
Matei Zaharia	173e0354c0	Detect correctly when one has disconnected from a standalone cluster. SPARK-617 #resolve	2012-11-11 21:06:57 -08:00
Tathagata Das	04e9e9d93c	Refactored BlockManagerMaster (not BlockManagerMasterActor) to simplify the code and fix live lock problem in unlimited attempts to contact the master. Also added testcases in the BlockManagerSuite to test BlockManagerMaster methods getPeers and getLocations.	2012-11-11 08:54:21 -08:00
root	acf8272324	Fix K-means example a little	2012-11-10 23:07:21 -08:00
Tathagata Das	355c8e4b17	Fixed deadlock in BlockManager.	2012-11-09 16:28:45 -08:00
Tathagata Das	9915989bfa	Incorporated Matei's suggestions. Tested with 5 producer(consumer) threads each doing 50k puts (gets), took 15 minutes to run, no errors or deadlocks.	2012-11-09 15:46:15 -08:00
Tathagata Das	de00bc63db	Fixed deadlock in BlockManager. 1. Changed the lock structure of BlockManager by replacing the 337 coarse-grained locks to use BlockInfo objects as per-block fine-grained locks. 2. Changed the MemoryStore lock structure by making the block putting threads lock on a different object (not the memory store) thus making sure putting threads minimally blocks to the getting treads. 3. Added spark.storage.ThreadingTest to stress test the BlockManager using 5 block producer and 5 block consumer threads.	2012-11-09 14:09:37 -08:00
Matei Zaharia	6607f546cc	Added an option to spread out jobs in the standalone mode.	2012-11-08 23:13:12 -08:00
Matei Zaharia	66cbdee941	Fix for connections not being reused (from Josh Rosen)	2012-11-08 09:53:40 -08:00
Imran Rashid	809b2bb1fe	fix bug in getting slave id out of mesos	2012-11-08 00:34:28 -08:00
Matei Zaharia	bb1bce7924	Various fixes to standalone mode and web UI: - Don't report a job as finishing multiple times - Don't show state of workers as LOADING when they're running - Show start and finish times in web UI - Sort web UI tables by ID and time by default	2012-11-07 16:49:53 -08:00
Matei Zaharia	e2b8477487	Made Akka timeout and message frame size configurable, and upped the defaults	2012-11-06 15:58:05 -08:00
Tathagata Das	72b2303f99	Fixed major bugs in checkpointing.	2012-11-05 11:41:36 -08:00
Tathagata Das	d154238789	Made checkpointing of dstream graph to work with checkpointing of RDDs. For streams requiring checkpointing of its RDD, the default checkpoint interval is set to 10 seconds.	2012-11-04 12:12:06 -08:00
Shivaram Venkataraman	a7d967a1ca	Remove unnecessary hash-map put in MemoryStore	2012-11-01 10:46:38 -07:00
Tathagata Das	34e569f40e	Added 'synchronized' to RDD serialization to ensure checkpoint-related changes are reflected atomically in the task closure. Added to tests to ensure that jobs running on an RDD on which checkpointing is in progress does hurt the result of the job.	2012-10-31 00:56:40 -07:00
Tathagata Das	0dcd770fdc	Added checkpointing support to all RDDs, along with CheckpointSuite to test checkpointing in them.	2012-10-30 16:09:37 -07:00
Tathagata Das	ac12abc17f	Modified RDD API to make dependencies a var (therefore can be changed to checkpointed hadoop rdd) and othere references to parent RDDs either through dependencies or through a weak reference (to allow finalizing when dependencies do not refer to it any more).	2012-10-29 11:55:27 -07:00
root	e782187b4a	Don't throw an error in the block manager when a block is cached on the master due to a locally computed operation Conflicts: core/src/main/scala/spark/storage/BlockManagerMaster.scala	2012-10-26 00:33:45 -07:00
Matei Zaharia	863a55ae42	Merge remote-tracking branch 'public/master' into dev Conflicts: core/src/main/scala/spark/BlockStoreShuffleFetcher.scala core/src/main/scala/spark/KryoSerializer.scala core/src/main/scala/spark/MapOutputTracker.scala core/src/main/scala/spark/RDD.scala core/src/main/scala/spark/SparkContext.scala core/src/main/scala/spark/executor/Executor.scala core/src/main/scala/spark/network/Connection.scala core/src/main/scala/spark/network/ConnectionManagerTest.scala core/src/main/scala/spark/rdd/BlockRDD.scala core/src/main/scala/spark/rdd/NewHadoopRDD.scala core/src/main/scala/spark/scheduler/ShuffleMapTask.scala core/src/main/scala/spark/scheduler/cluster/StandaloneSchedulerBackend.scala core/src/main/scala/spark/storage/BlockManager.scala core/src/main/scala/spark/storage/BlockMessage.scala core/src/main/scala/spark/storage/BlockStore.scala core/src/main/scala/spark/storage/StorageLevel.scala core/src/main/scala/spark/util/AkkaUtils.scala project/SparkBuild.scala run	2012-10-24 23:21:00 -07:00
Matei Zaharia	f63a40fd99	Strip leading mesos:// in URLs passed to Mesos	2012-10-24 21:52:13 -07:00
Matei Zaharia	d290e964ea	Merge pull request #281 from rxin/memreport Added a method to report slave memory status; force serialize accumulator update in local mode.	2012-10-23 22:04:35 -07:00
Matei Zaharia	0bd20c63e2	Merge remote-tracking branch 'JoshRosen/shuffle_refactoring' into dev Conflicts: core/src/main/scala/spark/Dependency.scala core/src/main/scala/spark/rdd/CoGroupedRDD.scala core/src/main/scala/spark/rdd/ShuffledRDD.scala	2012-10-23 22:01:45 -07:00
Thomas Dudziak	d9c2a89c57	Support for Hadoop 2 distributions such as cdh4	2012-10-18 16:08:54 -07:00
Reynold Xin	63fae9bc23	Serialize accumulator updates in TaskResult for local mode.	2012-10-15 21:38:28 -07:00
Reynold Xin	42d20fa8da	Added a method to report slave memory status.	2012-10-14 22:30:53 -07:00
Matei Zaharia	64dbf8d372	Made ShuffleDependency automatically find a shuffle ID for itself	2012-10-14 10:00:22 -07:00
Tathagata Das	e95ff45b53	Implemented checkpointing of StreamingContext and DStream graph.	2012-10-13 20:10:49 -07:00
Matei Zaharia	8815aeba0c	Take executor environment vars as an arguemnt to SparkContext	2012-10-13 15:31:11 -07:00
Josh Rosen	33cd3a0c12	Remove map-side combining from ShuffleMapTask. This separation of concerns simplifies the ShuffleDependency and ShuffledRDD interfaces. Map-side combining can be performed in a mapPartitions() call prior to shuffling the RDD. I don't anticipate this having much of a performance impact: in both approaches, each tuple is hashed twice: once in the bucket partitioning and once in the combiner's hashtable. The same steps are being performed, but in a different order and through one extra Iterator.	2012-10-13 14:59:20 -07:00
Josh Rosen	10bcd217d2	Remove mapSideCombine field from Aggregator. Instead, the presence or absense of a ShuffleDependency's aggregator will control whether map-side combining is performed.	2012-10-13 14:59:20 -07:00
Josh Rosen	4775c55641	Change ShuffleFetcher to return an Iterator.	2012-10-13 14:59:20 -07:00

1 2 3 4 5 ...

672 commits