ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Matei Zaharia	dfd40e9f6f	Merge pull request #175 from kayousterhout/no_retry_not_serializable Don't retry tasks when they fail due to a NotSerializableException As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.	2013-11-14 19:44:50 -08:00
Matei Zaharia	ed25105fd9	Merge pull request #174 from ahirreddy/master Write Spark UI url to driver file on HDFS This makes the SIMR code path simpler	2013-11-14 19:43:55 -08:00
Kay Ousterhout	29c88e408e	Don't retry tasks when they fail due to a NotSerializableException	2013-11-14 15:15:19 -08:00
Kay Ousterhout	b4546ba9e6	Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.	2013-11-14 13:55:03 -08:00
Reynold Xin	1a4cfbea33	Merge pull request #169 from kayousterhout/mesos_fix Don't ignore spark.cores.max when using Mesos Coarse mode totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse grained Mesos mode.	2013-11-14 10:32:11 -08:00
Lian, Cheng	cc8995c8f4	Fixed a scaladoc typo in HadoopRDD.scala	2013-11-14 18:17:05 +08:00
Kay Ousterhout	5125cd3466	Don't ignore spark.cores.max when using Mesos Coarse mode	2013-11-13 23:06:17 -08:00
Matei Zaharia	2054c61a18	Merge pull request #159 from liancheng/dagscheduler-actor-refine Migrate the daemon thread started by DAGScheduler to Akka actor `DAGScheduler` adopts an event queue and a daemon thread polling it to process events sent to a `DAGScheduler`. This is a classical actor use case. By migrating this thread to Akka actor, we may benefit from both cleaner code and better performance (context switching cost of Akka actor is much less than that of a native thread). But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code change, I chose to let the actor delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to make it apparently correct. Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896). At last, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method, and start the `eventProcessActor` within the constructor.	2013-11-13 16:49:55 -08:00
Ahir Reddy	0ea1f8b225	Write Spark UI url to driver file on HDFS	2013-11-13 15:23:36 -08:00
Matei Zaharia	39af914b27	Merge pull request #166 from ahirreddy/simr-spark-ui SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients	2013-11-13 08:39:05 -08:00
Matei Zaharia	b8bf04a085	Merge pull request #160 from xiajunluan/JIRA-923 Fix bug JIRA-923 Fix column sort issue in UI for JIRA-923. https://spark-project.atlassian.net/browse/SPARK-923 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala	2013-11-12 16:19:50 -08:00
Ahir Reddy	ccb099e804	SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients	2013-11-12 15:58:41 -08:00
Andrew xia	e13da05424	fix format error	2013-11-11 19:15:45 +08:00
Andrew xia	37d2f3749e	cut lines to less than 100	2013-11-11 15:49:32 +08:00
Andrew xia	b3208063af	Fix bug JIRA-923	2013-11-11 15:39:10 +08:00
Lian, Cheng	e2a43b3dcc	Made some changes according to suggestions from @aarondav	2013-11-11 12:21:54 +08:00
Josh Rosen	ffa5bedf46	Send PySpark commands as bytes instead of strings.	2013-11-10 16:46:00 -08:00
Josh Rosen	cbb7f04aef	Add custom serializer support to PySpark. For now, this only adds MarshalSerializer, but it lays the groundwork for other supporting custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().	2013-11-10 16:45:38 -08:00
Lian, Cheng	ba55285177	Put the periodical resubmitFailedStages() call into a scheduled task	2013-11-11 01:25:35 +08:00
Reynold Xin	c845611fc3	Moved the Spark internal class registration for Kryo into an object, and added more classes (e.g. MapStatus, BlockManagerId) to the registration.	2013-11-09 23:00:08 -08:00
Reynold Xin	7c5f70d873	Call Kryo setReferences before calling user specified Kryo registrator.	2013-11-09 22:43:36 -08:00
Matei Zaharia	87954d4c85	Merge pull request #154 from soulmachine/ClusterScheduler Replace the thread inside ClusterScheduler.start() with an Akka scheduler Threads are precious resources, so we shouldn't abuse them	2013-11-09 17:53:25 -08:00
Reynold Xin	83bf1920c8	Merge pull request #155 from rxin/jobgroup Don't reset job group when a new job description is set.	2013-11-09 15:40:29 -08:00
Reynold Xin	28f27097cf	Don't reset job group when a new job description is set.	2013-11-09 13:59:31 -08:00
Matei Zaharia	8af99f2356	Merge pull request #149 from tgravescs/fixSecureHdfsAccess Fix secure hdfs access for spark on yarn https://github.com/apache/incubator-spark/pull/23 broke secure hdfs access. Not sure if it works with secure hdfs on standalone. Fixing it at least for spark on yarn. The broadcasting of jobconf change also broke secure hdfs access as it didn't take into account things calling the getPartitions before sparkContext is initialized. The DAGScheduler does this as it tries to getShuffleMapStage.	2013-11-09 13:48:00 -08:00
Matei Zaharia	72a601ec31	Merge pull request #152 from rxin/repl Propagate SparkContext local properties from spark-repl caller thread to the repl execution thread.	2013-11-09 11:55:16 -08:00
soulmachine	28115fa8cb	replace the thread with an Akka scheduler	2013-11-09 22:38:27 +08:00
Lian, Cheng	765ebca04f	Remove unnecessary null checking	2013-11-09 21:13:03 +08:00
Lian, Cheng	2539c06745	Replaced the daemon thread started by DAGScheduler with an actor	2013-11-09 19:05:18 +08:00
Reynold Xin	319299941d	Propagate the SparkContext local property from the thread that calls the spark-repl to the actual execution thread.	2013-11-09 00:32:14 -08:00
Russell Cardullo	ef85a51f85	Add graphite sink for metrics This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.	2013-11-08 16:36:03 -08:00
Aaron Davidson	dd63c548c2	Use SPARK_HOME instead of user.dir in ExecutorRunnerTest	2013-11-08 12:51:05 -08:00
tgravescs	13a19505e4	Don't call the doAs if user is unknown or the same user that is already running	2013-11-08 12:04:09 -06:00
tgravescs	f95cb04e40	Remove the runAsUser as it breaks secure hdfs access	2013-11-08 10:07:15 -06:00
tgravescs	5f9ed51719	Fix access to Secure HDFS	2013-11-08 08:41:57 -06:00
Reynold Xin	3d4ad84b63	Merge pull request #148 from squito/include_appId Include appId in executor cmd line args add the appId back into the executor cmd line args. I also made a pretty lame regression test, just to make sure it doesn't get dropped in the future. not sure it will run on the build server, though, b/c `ExecutorRunner.buildCommandSeq()` expects to be able to run the scripts in `bin`.	2013-11-07 11:08:27 -08:00
Imran Rashid	ca66f5d5a2	fix formatting	2013-11-07 07:23:59 -06:00
Imran Rashid	8d3cdda9a2	very basic regression test to make sure appId doesn't get dropped in future	2013-11-07 01:35:48 -06:00
Imran Rashid	36e832bff0	include the appid in the cmd line arguments to Executors	2013-11-07 01:11:49 -06:00
jerryshao	12dc385a49	Add Spark multi-user support for standalone mode and Mesos	2013-11-07 11:18:09 +08:00
Reynold Xin	aadeda5e76	Merge pull request #144 from liancheng/runjob-clean Removed unused return value in SparkContext.runJob Return type of this `runJob` version is `Unit`: def runJob[T, U: ClassManifest]( rdd: RDD[T], func: (TaskContext, Iterator[T]) => U, partitions: Seq[Int], allowLocal: Boolean, resultHandler: (Int, U) => Unit) { ... } It's obviously unnecessary to "return" `result`.	2013-11-06 13:27:47 -08:00
Aaron Davidson	80e98d2bd7	Attempt to fix SparkListenerSuite breakage Could not reproduce locally, but this test could've been flaky if the build machine was too fast.	2013-11-06 08:03:35 -08:00
Lian, Cheng	a0c4565183	Removed unused return value in SparkContext.runJob	2013-11-06 23:18:59 +08:00
Reynold Xin	a02eed6811	Ignore a task update status if the executor doesn't exist anymore.	2013-11-05 18:46:38 -08:00
Lian, Cheng	8b4c994e8c	Using compact case class pattern matching syntax to simplify code in DAGScheduler.processEvent	2013-11-05 17:18:42 +08:00
Reynold Xin	81065321c0	Merge pull request #139 from aarondav/shuffle-next Never store shuffle blocks in BlockManager After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use within BlockManager (they had a no-arg constructor!). This patch completely eliminates them, saving us around 100-150 bytes per shuffle block. The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding state saved by the MapOutputTracker. Note: This should not be merged directly into 0.8.0 -- see #138	2013-11-04 20:47:14 -08:00
Aaron Davidson	93c90844cb	Never store shuffle blocks in BlockManager After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use within BlockManager (they had a no-arg constructor!). This patch completely eliminates them, saving us around 100-150 bytes per shuffle block. The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding state saved by the MapOutputTracker.	2013-11-04 18:43:42 -08:00
Reynold Xin	0b26a392df	Merge pull request #128 from shimingfei/joblogger-doc add javadoc to JobLogger, and some small fix against Spark-941 add javadoc to JobLogger, output more info for RDD, modify recordStageDepGraph to avoid output duplicate stage dependency information (cherry picked from commit `518cf22eb2`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-11-04 18:22:06 -08:00
Aaron Davidson	1ba11b1c6a	Minor cleanup in ShuffleBlockManager	2013-11-04 17:16:41 -08:00
Aaron Davidson	6201e5e249	Refactor ShuffleBlockManager to reduce public interface - ShuffleBlocks has been removed and replaced by ShuffleWriterGroup. - ShuffleWriterGroup no longer contains a reference to a ShuffleFileGroup. - ShuffleFile has been removed and its contents are now within ShuffleFileGroup. - ShuffleBlockManager.forShuffle has been replaced by a more stateful forMapTask.	2013-11-04 09:41:04 -08:00
Aaron Davidson	b0cf19fe3c	Add javadoc and remove unused code	2013-11-03 22:16:58 -08:00
Aaron Davidson	39d93ed4b9	Clean up test files properly For some reason, even calling java.nio.Files.createTempDirectory().getFile.deleteOnExit() does not delete the directory on exit. Guava's analogous function seems to work, however.	2013-11-03 21:52:59 -08:00
Aaron Davidson	a0bb569a81	use OpenHashMap, remove monotonicity requirement, fix failure bug	2013-11-03 21:34:56 -08:00
Aaron Davidson	8703898d3f	Address Reynold's comments	2013-11-03 21:34:44 -08:00
Aaron Davidson	3ca52309f2	Fix test breakage	2013-11-03 21:34:44 -08:00
Aaron Davidson	1592adfa25	Add documentation and address other comments	2013-11-03 21:34:44 -08:00
Aaron Davidson	7d44dec9bd	Fix weird bug with specialized PrimitiveVector	2013-11-03 21:34:43 -08:00
Aaron Davidson	7453f31181	Address minor comments	2013-11-03 21:34:43 -08:00
Aaron Davidson	84991a1b91	Memory-optimized shuffle file consolidation Overhead of each shuffle block for consolidation has been reduced from >300 bytes to 8 bytes (1 primitive Long). Verified via profiler testing with 1 mil shuffle blocks, net overhead was ~8,400,000 bytes. Despite the memory-optimized implementation incurring extra CPU overhead, the runtime of the shuffle phase in this test was only around 2% slower, while the reduce phase was 40% faster, when compared to not using any shuffle file consolidation.	2013-11-03 21:34:13 -08:00
Reynold Xin	eb5f8a3f97	Code review feedback.	2013-11-03 18:11:44 -08:00
Josh Rosen	7d68a81a8e	Remove Pickle-wrapping of Java objects in PySpark. If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.	2013-11-03 11:03:02 -08:00
Josh Rosen	a48d88d206	Replace magic lengths with constants in PySpark. Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.	2013-11-03 10:54:24 -08:00
Reynold Xin	1e9543b567	Fixed a bug that uses twice the amount of memory for the primitive arrays due to a scala compiler bug. Also addressed Matei's code review comment.	2013-11-02 23:19:01 -07:00
Reynold Xin	da6bb0aedd	Merge branch 'master' into hash1	2013-11-02 22:45:15 -07:00
Evan Chan	f3679fd494	Add local: URI support to addFile as well	2013-11-01 11:08:03 -07:00
Matei Zaharia	8f1098a3f0	Merge pull request #117 from stephenh/avoid_concurrent_modification_exception Handle ConcurrentModificationExceptions in SparkContext init. System.getProperties.toMap will fail-fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during its initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.	2013-10-30 20:11:48 -07:00
Matei Zaharia	dc9ce16f6b	Merge pull request #126 from kayousterhout/local_fix Fixed incorrect log message in local scheduler This change is especially relevant at the moment, because some users are seeing this failure, and the log message is misleading/incorrect (because for the tests, the max failures is set to 0, not 4)	2013-10-30 17:01:56 -07:00
Matei Zaharia	33de11c51d	Merge pull request #124 from tgravescs/sparkHadoopUtilFix Pull SparkHadoopUtil out of SparkEnv (jira SPARK-886) Having the logic to initialize the correct SparkHadoopUtil in SparkEnv prevents it from being used until after the SparkContext is initialized. This causes issues like https://spark-project.atlassian.net/browse/SPARK-886. It also makes it hard to use in singleton objects. For instance I want to use it in the security code.	2013-10-30 16:58:27 -07:00
Kay Ousterhout	ff038eb4e0	Fixed incorrect log message in local scheduler	2013-10-30 15:27:23 -07:00
Matei Zaharia	618c1f6cf3	Merge pull request #125 from velvia/2013-10/local-jar-uri Add support for local:// URI scheme for addJars() This PR adds support for a new URI scheme for SparkContext.addJars(): `local://file/path`. The local scheme indicates that the `/file/path` exists on every worker node. The reason for its existence is for big library JARs, which would be really expensive to serve using the standard HTTP fileserver distribution method, especially for big clusters. Today the only inexpensive method (assuming such a file is on every host, via say NFS, rsync, etc.) of doing this is to add the JAR to the SPARK_CLASSPATH, but we want a method where the user does not need to modify the Spark configuration. I would add something to the docs, but it's not obvious where to add it. Oh, and it would be great if this could be merged in time for 0.8.1.	2013-10-30 12:03:44 -07:00
Stephen Haberman	09f3b677cb	Avoid match errors when filtering for spark.hadoop settings.	2013-10-30 12:29:39 -05:00
tgravescs	f231aaa24c	move the hadoopJobMetadata back into SparkEnv	2013-10-30 11:46:12 -05:00
Evan Chan	de0285556a	Add support for local:// URI scheme for addJars() This indicates that a jar is available locally on each worker node.	2013-10-30 09:41:35 -07:00
tgravescs	54d9c6f253	Merge remote-tracking branch 'upstream/master' into sparkHadoopUtilFix	2013-10-30 10:41:21 -05:00
Josh Rosen	cb9c8a922f	Extract BlockInfo classes from BlockManager. This saves space, since the inner classes needed to keep a reference to the enclosing BlockManager.	2013-10-29 18:06:51 -07:00
Stephen Haberman	3a388c320c	Use Properties.clone() instead.	2013-10-29 19:20:40 -05:00
Josh Rosen	846b1cf5ab	Store fewer BlockInfo fields for shuffle blocks.	2013-10-29 15:14:29 -07:00
tgravescs	eeb5f64c67	Remove SparkHadoopUtil stuff from SparkEnv	2013-10-29 17:12:16 -05:00
Josh Rosen	2d7cf6a271	Restructure BlockInfo fields to reduce memory use.	2013-10-27 23:01:03 -07:00
Matei Zaharia	aec9bf9060	Merge pull request #112 from kayousterhout/ui_task_attempt_id Display both task ID and task attempt ID in UI, and rename taskId to taskAttemptId Previously only the task attempt ID was shown in the UI; this was confusing because the job can be shown as complete while there are tasks still running. Showing the task ID in addition to the attempt ID makes it clear which tasks are redundant. This commit also renames taskId to taskAttemptId in TaskInfo and in the local/cluster schedulers. This identifier was used to uniquely identify attempts, not tasks, so the current naming was confusing. The new naming is also more consistent with map reduce.	2013-10-27 19:32:00 -07:00
Stephen Haberman	a6ae2b4832	Handle ConcurrentModificationExceptions in SparkContext init. System.getProperties.toMap will fail-fast when concurrently modified, and it seems like some other thread started by SparkContext does a System.setProperty during its initialization. Handle this by just looping on ConcurrentModificationException, which seems the safest, since the non-fail-fast methods (Hashtable.entrySet) have undefined behavior under concurrent modification.	2013-10-27 14:08:32 -05:00
Aaron Davidson	4261e834cb	Use flag instead of name check.	2013-10-26 23:53:38 -07:00
Aaron Davidson	596f18479e	Eliminate extra memory usage when shuffle file consolidation is disabled Otherwise, we see SPARK-946 even when shuffle file consolidation is disabled. Fixing SPARK-946 is still forthcoming.	2013-10-26 22:35:01 -07:00
Kay Ousterhout	ae22b4dd99	Display both task ID and task index in UI	2013-10-26 22:18:39 -07:00
Matei Zaharia	bab496c120	Merge pull request #108 from alig/master Changes to enable executing by using HDFS as a synchronization point between driver and executors, as well as ensuring executors exit properly.	2013-10-25 18:28:43 -07:00
Matei Zaharia	d307db6e55	Merge pull request #102 from tdas/transform Added new Spark Streaming operations New operations - transformWith which allows arbitrary 2-to-1 DStream transform, added to Scala and Java API - StreamingContext.transform to allow arbitrary n-to-1 DStream - leftOuterJoin and rightOuterJoin between 2 DStreams, added to Scala and Java API - missing variations of join and cogroup added to Scala and Java API - missing JavaStreamingContext.union Updated a number of Java and Scala API docs	2013-10-25 17:26:06 -07:00
Ali Ghodsi	eef261c892	fixing comments on PR	2013-10-25 16:48:33 -07:00
Matei Zaharia	85e2cab6f6	Merge pull request #111 from kayousterhout/ui_name Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name. @pwendell -- let me know if this change was intentional	2013-10-25 14:46:06 -07:00
Tathagata Das	dc9570782a	Merge branch 'apache-master' into transform	2013-10-25 14:22:23 -07:00
Kay Ousterhout	a9c8d83aaf	Properly display the name of a stage in the UI. This fixes a bug introduced by the fix for SPARK-940, which changed the UI to display the RDD name rather than the stage name. As a result, no name for the stage was shown when using the Spark shell, which meant that there was no way to click on the stage to see more details (e.g., the running tasks). This commit changes the UI back to using the stage name.	2013-10-25 12:00:09 -07:00
Patrick Wendell	e5f6d5697b	Spacing fix	2013-10-24 22:08:06 -07:00
Patrick Wendell	31e92b72e3	Adding Java versions and associated tests	2013-10-24 21:14:56 -07:00
Patrick Wendell	05ac9940ee	Adding tests	2013-10-24 14:31:34 -07:00
Patrick Wendell	2fda84fe3f	Always use a shuffle	2013-10-24 14:31:34 -07:00
Patrick Wendell	08c1a42d7d	Add a `repartition` operator. This patch adds an operator called repartition with more straightforward semantics than the current `coalesce` operator. There are a few use cases where this operator is useful: 1. If a user wants to increase the number of partitions in the RDD. This is more common now with streaming. E.g. a user is ingesting data on one node but they want to add more partitions to ensure parallelism of subsequent operations across threads or the cluster. Right now they have to call rdd.coalesce(numSplits, shuffle=true) - that's super confusing. 2. If a user has input data where the number of partitions is not known. E.g. > sc.textFile("some file").coalesce(50).... This is both vague semantically (am I growing or shrinking this RDD) but also, may not work correctly if the base RDD has fewer than 50 partitions. The new operator forces shuffles every time, so it will always produce exactly the number of new partitions. It also throws an exception rather than silently not-working if a bad input is passed. I am currently adding streaming tests (requires refactoring some of the test suite to allow testing at partition granularity), so this is not ready for merge yet. But feedback is welcome.	2013-10-24 14:31:33 -07:00
Ali Ghodsi	05a0df2b9e	Makes Spark SIMR ready.	2013-10-24 11:59:51 -07:00
Tathagata Das	0400aba1c0	Merge branch 'apache-master' into transform	2013-10-24 11:05:00 -07:00
Tathagata Das	bacfe5ebca	Added JavaStreamingContext.transform	2013-10-24 10:56:24 -07:00
Matei Zaharia	1dc776b863	Merge pull request #93 from kayousterhout/ui_new_state Show "GETTING_RESULTS" state in UI. This commit adds a set of calls using the SparkListener interface that indicate when a task is remotely fetching results, so that we can display this (potentially time-consuming) phase of execution to users through the UI.	2013-10-23 22:05:52 -07:00
Kay Ousterhout	b45352e373	Clear akka frame size property in tests	2013-10-23 18:23:28 -07:00

2453 commits