ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	37397b73ba	Added comprehensive tests for job cancellation in a variety of environments (local vs cluster, fifo vs fair).	2013-10-10 22:57:43 -07:00
Reynold Xin	80cdbf4f49	Switched to use daemon thread in executor and fixed a bug in job cancellation for fair scheduler.	2013-10-10 22:40:48 -07:00
Matei Zaharia	8f11c36fe1	Merge remote-tracking branch 'tgravescs/sparkYarnDistCache' Closes #11 Conflicts: docs/running-on-yarn.md yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala	2013-10-10 19:34:33 -07:00
Reynold Xin	058508b625	Changed the name of the local cluster executor from local to localhost.	2013-10-10 19:24:00 -07:00
Reynold Xin	ec2e2ed1e1	Use the same Executor in LocalScheduler as in ClusterScheduler.	2013-10-10 18:55:25 -07:00
Matei Zaharia	c71499b779	Merge pull request #19 from aarondav/master-zk Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch `d5a96fe`), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from `d5a96fe`. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.	2013-10-10 17:16:42 -07:00
Harvey Feng	5a99e67894	Add an optional closure parameter to HadoopRDD instantiation to be used when creating any local JobConfs.	2013-10-10 16:35:52 -07:00
Reynold Xin	357733d292	Rename kill -> cancel in user facing API / documentation.	2013-10-10 13:27:38 -07:00
Matei Zaharia	001d13f7b9	Merge branch 'master' into fast-map Conflicts: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala	2013-10-10 13:26:43 -07:00
Reynold Xin	ddf64f019f	Support job cancellation in multi-pool scheduler.	2013-10-10 13:20:27 -07:00
Reynold Xin	3bd2890d2b	Fixed the deadlock situation in multi-job actions and added more unit tests.	2013-10-10 12:07:09 -07:00
Aaron Davidson	42d8b8efe6	Address Matei's comments on documentation Updates to the documentation and changing some logError()s to logWarning()s.	2013-10-10 00:33:47 -07:00
Reynold Xin	0353f74a9a	Put the job cancellation handling into the dagscheduler's main event loop.	2013-10-10 00:28:00 -07:00
Reynold Xin	dbae7795ba	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/main/scala/org/apache/spark/CacheManager.scala core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerSource.scala	2013-10-09 22:57:35 -07:00
Reynold Xin	53895f9cde	Implemented FutureAction, FutureJob, CancellablePromise. Implemented more unit tests for async actions.	2013-10-09 22:43:06 -07:00
Reynold Xin	320418f7c8	Merge pull request #49 from mateiz/kryo-fix-2 Fix Chill serialization of Range objects It used to write out each element one by one, creating very large objects.	2013-10-09 16:55:30 -07:00
Reynold Xin	215238cb39	Merge pull request #50 from kayousterhout/SPARK-908 Fix race condition in SparkListenerSuite (fixes SPARK-908).	2013-10-09 16:49:44 -07:00
Matei Zaharia	c84c205289	Fix Chill serialization of Range objects, which used to write out each element, and register user and Spark classes before Chill's serializers to let them override Chill's behavior in general.	2013-10-09 16:23:40 -07:00
Kay Ousterhout	36966f65df	Style fixes	2013-10-09 15:36:34 -07:00
Kay Ousterhout	3f7e9b265c	Fixed comment to use javadoc style	2013-10-09 15:23:04 -07:00
Kay Ousterhout	a34a4e8174	Fix race condition in SparkListenerSuite (fixes SPARK-908).	2013-10-09 15:07:53 -07:00
Patrick Wendell	bd3bcc5f8e	Use standard abbreviations in metrics labels	2013-10-09 11:16:24 -07:00
Patrick Wendell	19d445d37c	Merge pull request #22 from GraceH/metrics-naming SPARK-900 Use coarser grained naming for metrics see SPARK-900 Use coarser grained naming for metrics. Now the new metric name is formatted as {XXX.YYY.ZZZ.COUNTER_UNIT}, XXX.YYY.ZZZ represents the group name, which can group several metrics under the same Ganglia view.	2013-10-09 11:08:34 -07:00
Matei Zaharia	12d593129d	Create fewer function objects in uses of AppendOnlyMap.changeValue	2013-10-08 23:16:51 -07:00
Matei Zaharia	0b35051f19	Address some comments on code clarity	2013-10-08 23:16:17 -07:00
Matei Zaharia	4acbc5afdd	Moved files that were in the wrong directory after package rename	2013-10-08 23:16:17 -07:00
Matei Zaharia	0e40cfabf8	Fix some review comments	2013-10-08 23:16:16 -07:00
Matei Zaharia	b535db7d89	Added a fast and low-memory append-only map implementation for cogroup and parallel reduce operations	2013-10-08 23:14:38 -07:00
Reynold Xin	e67d5b962a	Merge pull request #43 from mateiz/kryo-fix Don't allocate Kryo buffers unless needed I noticed that the Kryo serializer could be slower than the Java one by 2-3x on small shuffles because it spends a lot of time initializing Kryo Input and Output objects. This is because our default buffer size for them is very large. Since the serializer is often used on streams, I made the initialization lazy for that, and used a smaller buffer (auto-managed by Kryo) for input.	2013-10-08 22:57:38 -07:00
Grace Huang	f7628e4033	remove those futile suffixes like number/count	2013-10-09 08:36:41 +08:00
Aaron Davidson	749233b869	Revert change to spark-class Also adds comment about how to configure for FaultToleranceTest.	2013-10-08 11:41:52 -07:00
Grace Huang	22bed59d2d	create metrics name manually.	2013-10-08 18:01:11 +08:00
Grace Huang	188abbf8f1	Revert "SPARK-900 Use coarser grained naming for metrics" This reverts commit `4b68be5f3c`.	2013-10-08 17:45:14 +08:00
Grace Huang	a2af6b543a	Revert "remedy the line-wrap while exceeding 100 chars" This reverts commit `892fb8ffa8`.	2013-10-08 17:44:56 +08:00
Patrick Wendell	8b377718b8	Responses to review	2013-10-07 20:03:35 -07:00
Matei Zaharia	a8725bf8f8	Don't allocate Kryo buffers unless needed	2013-10-07 19:16:35 -07:00
Patrick Wendell	391133f66a	Fix inconsistent and incorrect log messages in shuffle read path	2013-10-07 17:24:18 -07:00
Reynold Xin	213b70a2db	Merge pull request #31 from sundeepn/branch-0.8 Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies. (cherry picked from commit `023e3fdf00`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-10-07 10:54:22 -07:00
Kay Ousterhout	fdc52b2f8b	Added back fully qualified class name	2013-10-06 18:45:43 -07:00
Aaron Davidson	718e8c2052	Change url format to spark://host1:port1,host2:port2 This replaces the format of spark://host1:port1,spark://host2:port2 and is more consistent with ZooKeeper's zk:// urls.	2013-10-06 00:02:08 -07:00
Aaron Davidson	e1190229e1	Add end-to-end test for standalone scheduler fault tolerance Docker files drawn mostly from Matt Masse. Some updates from Andre Schumacher.	2013-10-05 23:20:31 -07:00
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Matei Zaharia	4a25b116d4	Merge pull request #20 from harveyfeng/hadoop-config-cache Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads. Note: originally from https://github.com/mesos/spark/pull/942 Currently motivated by Shark queries on Hive-partitioned tables, where there's a JobConf broadcast for every Hive-partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path - the Hadoop Configuration that the JobConfs are constructed from remain the same. This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little bit for high-frequency FileInputDStreams (which only uses the new API right now). As a small bonus, added InputFormats caching, to avoid reflection calls for every RDD#compute(). Few other notes: Added a general soft-reference hashmap in SparkHadoopUtil because I wanted to avoid adding another class to SparkEnv. SparkContext default hadoopConfiguration isn't cached. There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.	2013-10-05 19:28:55 -07:00
Harvey Feng	6a2bbec5e3	Some comments regarding JobConf and InputFormat caching for HadoopRDDs.	2013-10-05 17:53:58 -07:00
Harvey Feng	96929f28bb	Make HadoopRDD object Spark private.	2013-10-05 17:14:19 -07:00
Harvey Feng	b5e93c1227	Fix API changes; lines > 100 chars.	2013-10-05 16:57:08 -07:00
Aaron Davidson	0f070279e7	Address Matei's comments	2013-10-05 15:15:29 -07:00
Matei Zaharia	100222b048	Merge pull request #27 from davidmccauley/master SPARK-920/921 - JSON endpoint updates 920 - Removal of duplicate scheme part of Spark URI, it was appearing as spark://spark//host:port in the JSON field. JSON now delivered as: url:spark://127.0.0.1:7077 921 - Adding the URL of the Main Application UI will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.	2013-10-05 13:38:59 -07:00
Mridul Muralidharan	b5025d90bb	- Allow for finer control of cleaner - Address review comments, move to incubator spark - Also includes a change to speculation - including preventing exceptions in rare cases.	2013-10-06 00:35:51 +05:30
Aaron Davidson	db6f154940	Fix race conditions during recovery One major change was the use of messages instead of raw functions as the parameter of Akka scheduled timers. Since messages are serialized, unlike raw functions, the behavior is easier to think about and doesn't cause race conditions when exceptions are thrown. Another change is to avoid using global pointers that might change without a lock.	2013-10-04 19:54:33 -07:00
Kay Ousterhout	7b5ae23a37	Renamed StandaloneX to CoarseGrainedX. The previous names were confusing because the components weren't just used in Standalone mode -- in fact, the scheduler used for Standalone mode is called SparkDeploySchedulerBackend. So, the previous names were misleading.	2013-10-04 13:56:43 -07:00
Andre Schumacher	c84946fe21	Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.	2013-10-04 11:56:47 -07:00
Reynold Xin	d29e8035a0	Added countAsync and various unit tests for async actions.	2013-10-03 15:13:44 -07:00
tgravescs	0fff4ee852	Adding in the --addJars option to make SparkContext.addJar work on yarn and cleanup the classpaths	2013-10-03 11:52:16 -05:00
Reynold Xin	802bfb870d	- Created AsyncRDDActions. - Make FutureJob a Scala Future instead of Java Future.	2013-10-03 01:22:28 -07:00
Reynold Xin	e8e917f209	Merge branch 'master' into kill Conflicts: core/src/main/scala/org/apache/spark/TaskEndReason.scala core/src/main/scala/org/apache/spark/executor/Executor.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 23:01:34 -07:00
Reynold Xin	1c48ba0d9f	Merge remote-tracking branch 'origin' into kill Conflicts: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 16:40:44 -07:00
David McCauley	1577b373a9	SPARK-921 - Add Application UI URL to ApplicationInfo Json output	2013-10-02 15:03:41 +01:00
David McCauley	351da54676	SPARK-920 - JSON endpoint URI scheme part (spark://) duplicated	2013-10-02 13:23:38 +01:00
Kay Ousterhout	0dcad2edcb	Added additional unit test for repeated task failures	2013-09-30 23:26:15 -07:00
Kay Ousterhout	dea4677c88	Fixed compilation errors and broken test.	2013-09-30 22:07:01 -07:00
Kay Ousterhout	8deda427bc	Merge remote-tracking branch 'upstream/master' into results_through-bm Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala	2013-09-30 10:16:58 -07:00
Kay Ousterhout	58b764b7c6	Addressed Matei's code review comments	2013-09-30 10:11:59 -07:00
Grace Huang	892fb8ffa8	remedy the line-wrap while exceeding 100 chars	2013-09-30 20:12:55 +08:00
Harvey Feng	7d06bdde1d	Merge HadoopDatasetRDD into HadoopRDD.	2013-09-29 20:08:03 -07:00
Grace Huang	4b68be5f3c	SPARK-900 Use coarser grained naming for metrics	2013-09-27 14:47:38 +08:00
Harvey Feng	417085716a	Merge remote-tracking branch 'oldsparkme/hadoopRDD-broadcast-change' into hadoop-config-cache	2013-09-26 15:49:42 -07:00
Aaron Davidson	42d72308fb	Add license notices	2013-09-26 15:45:20 -07:00
Aaron Davidson	f549ea33d3	Standalone Scheduler fault tolerance using ZooKeeper This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again. Forthcoming: Documentation, tests (! - only ad hoc testing has been performed so far) I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.	2013-09-26 15:04:23 -07:00
Aaron Davidson	d5a96feccb	Standalone Scheduler fault recovery Implements a basic form of Standalone Scheduler fault recovery. In particular, this allows faults to be manually recovered from by means of restarting the Master process on the same machine. This is the majority of the code necessary for general fault tolerance, which will first elect a leader and then recover the Master state. In order to enable fault recovery, the Master will persist a small amount of state related to the registration of Workers and Applications to disk. If the Master is started and sees that this state is still around, it will enter Recovery mode, during which time it will not schedule any new Executors on Workers (but it does accept the registration of new Clients and Workers). At this point, the Master attempts to reconnect to all Workers and Client applications that were registered at the time of failure. After confirming either the existence or nonexistence of all such nodes (within a certain timeout), the Master will exit Recovery mode and resume normal scheduling.	2013-09-26 14:59:35 -07:00
Reynold Xin	70a0b993d4	Merge pull request #14 from kayousterhout/untangle_scheduler Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand! There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package. The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse grained mode and Yarn, and the backend that is just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.	2013-09-26 14:11:54 -07:00
Reynold Xin	c514cd1587	Merge pull request #930 from holdenk/master Add mapPartitionsWithIndex	2013-09-26 13:48:20 -07:00
Reynold Xin	560ee5c9bb	Merge pull request #7 from wannabeast/memorystore-fixes some minor fixes to MemoryStore This is a repeat of #5, moved to its own branch in my repo. This makes all updates to on ; it skips on synchronizing the reads where it can get away with it.	2013-09-26 11:27:34 -07:00
Patrick Wendell	6566a19b38	Merge pull request #9 from rxin/limit Smarter take/limit implementation.	2013-09-26 08:01:04 -07:00
Kay Ousterhout	d85fe41b2b	Improved organization of scheduling packages. This commit does not change any code -- only file organization. There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it is within the cluster package.	2013-09-25 12:45:46 -07:00
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Holden Karau	0cef683553	Fix formatting :)	2013-09-23 19:39:42 -07:00
Reynold Xin	ff540a015b	Merge branch 'master' of github.com:markhamstra/incubator-spark	2013-09-23 11:55:02 -07:00
Kay Ousterhout	c75eb14fe5	Send Task results through the block manager when larger than Akka frame size. This change requires adding an extra failure mode: tasks can complete successfully, but the result gets lost or flushed from the block manager before it's been fetched.	2013-09-22 21:20:48 -07:00
Holden Karau	7fe0b0ff56	Switch indent from 2 to 4 spaces	2013-09-22 19:44:51 -07:00
Harvey	ef34cfb26c	Move Configuration broadcasts to SparkContext.	2013-09-22 14:43:58 -07:00
Harvey	a6eeb5ffd5	Add a cache for HadoopRDD metadata needed during computation. Currently, the cache is in SparkHadoopUtils, since it's conveniently a member of the SparkEnv.	2013-09-22 03:09:17 -07:00
jerryshao	77e9da1f34	Change Exception to NoSuchElementException and minor style fix	2013-09-22 16:50:08 +08:00
jerryshao	85024acd2e	Remove infix style and others	2013-09-22 14:20:55 +08:00
jerryshao	5850f599dd	Refactor FairSchedulableBuilder: 1. Configuration can be read from classpath if not set explicitly. 2. Add missing close handler.	2013-09-22 14:20:55 +08:00
Reynold Xin	a2ea069a5f	Merge pull request #937 from jerryshao/localProperties-fix Fix PR926 local properties issues in Spark Streaming like scenarios	2013-09-21 23:04:42 -07:00
Harvey	be0fc7246f	Split HadoopRDD into one for general Hadoop datasets and one tailored to Hadoop files, which is a common case. This is the first step to avoiding unnecessary Configuration broadcasts per HadoopRDD instantiation.	2013-09-21 21:14:14 -07:00
jerryshao	aa0c29f747	Add barrier for local properties unit test and fix some styles	2013-09-22 09:53:11 +08:00
Reynold Xin	42571d30d0	Smarter take/limit implementation.	2013-09-20 17:09:53 -07:00
Reynold Xin	1d87616b61	Made output of CoGroup and aggregations interruptible.	2013-09-19 23:31:36 -07:00
Mike	9524b943a4	Synchronize on "entries" the remaining update to "currentMemory". Make "currentMemory" @volatile, so that its reads in ensureFreeSpace() are atomic and up-to-date--i.e., currentMemory can't increase while putLock is held (though it could decrease, which would only help ensureFreeSpace()).	2013-09-19 23:31:35 -07:00
Ankur Dave	3ebbcaf21c	After unit tests, clear port properties unconditionally	2013-09-19 22:14:38 -07:00
Ankur Dave	026dba6aba	After unit tests, clear port properties unconditionally In MapOutputTrackerSuite, the "remote fetch" test sets spark.driver.port and spark.hostPort, assuming that they will be cleared by LocalSparkContext. However, the test never sets sc, so it remains null, causing LocalSparkContext to skip clearing these properties. Subsequent tests therefore fail with java.net.BindException: "Address already in use". This commit makes LocalSparkContext clear the properties even if sc is null.	2013-09-19 22:05:23 -07:00
Ankur Dave	d3cbde0085	Import appropriate Spark core classes	2013-09-19 19:29:58 -07:00
Ankur Dave	9632ad3b21	Move IndexedRDDSuite to org.apache.spark	2013-09-19 19:25:52 -07:00
Ankur Dave	4c694bd705	Move IndexedRDD and GraphSuite to org.apache.spark	2013-09-19 19:13:07 -07:00
Reynold Xin	c5e40954eb	Wrap around cached data to InterruptibleIterator.	2013-09-19 18:44:38 -07:00
Reynold Xin	c68e72be59	Added comment to InterruptibleIterator.	2013-09-19 18:40:06 -07:00
Reynold Xin	70953810b4	Added task killing iterator to RDDs that take inputs.	2013-09-19 18:33:16 -07:00
Reynold Xin	f19984dafe	More logging changes (task killing for local cluster doesn't work yet).	2013-09-19 18:14:51 -07:00

2267 commits
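
Note on commit 718e8c2052: it replaces the spark://host1:port1,spark://host2:port2 form with spark://host1:port1,host2:port2 for listing backup masters. The following is a minimal sketch of how a client might split such a URL, written in Python purely for illustration. The helper name parse_master_url is hypothetical and not part of Spark, and Spark's own parsing may well differ.

```python
def parse_master_url(url):
    """Split spark://host1:port1,host2:port2 into a list of (host, port).

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    prefix = "spark://"
    if not url.startswith(prefix):
        raise ValueError("expected a spark:// URL, got %r" % url)

    masters = []
    for entry in url[len(prefix):].split(","):
        # Split on the last colon so the port is always the final field.
        host, sep, port = entry.rpartition(":")
        # Reject entries that repeat the scheme (the older format) or
        # that lack a host or a numeric port.
        if not sep or not host or "/" in host or not port.isdigit():
            raise ValueError("expected host:port, got %r" % entry)
        masters.append((host, int(port)))
    return masters
```

Under these assumptions, a single-master URL yields a one-element list, and the older format with a repeated spark:// scheme is rejected rather than silently accepted.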