ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Patrick Wendell	b1c6fa1584	Document missing configs and set shuffle consolidation to false.	2013-12-04 18:39:34 -08:00
Patrick Wendell	182f9baeed	Merge pull request #227 from pwendell/master Fix small bug in web UI and minor clean-up. There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.	2013-12-04 15:52:07 -08:00
Patrick Wendell	380b90b9b3	Fix small bug in web UI and minor clean-up. There was a bug where sorting order didn't work correctly for write time metrics. I also cleaned up some earlier code that fixed the same issue for read and write bytes.	2013-12-04 14:41:48 -08:00
Andrew Ash	217611680d	Add missing space after "Serialized" in StorageLevel Current code creates outputs like: scala> res0.getStorageLevel.description res2: String = Serialized1x Replicated	2013-12-04 11:29:20 -08:00
Matei Zaharia	d6e5473872	Merge pull request #223 from rxin/transient Mark partitioner, name, and generator field in RDD as @transient. As part of the effort to reduce serialized task size.	2013-12-04 10:28:50 -08:00
Reynold Xin	974a69d79c	Marked doCheckpointCalled as transient.	2013-12-03 11:34:38 -08:00
Mark Hamstra	403234dd0d	SparkListenerJobStart posted from local jobs	2013-12-03 09:57:32 -08:00
Mark Hamstra	f55d0b935d	Synchronous, inline cleanup after runLocally	2013-12-03 09:57:32 -08:00
Mark Hamstra	c9fcd909d0	Local jobs post SparkListenerJobEnd, and DAGScheduler data structure cleanup always occurs before any posting of SparkListenerJobEnd.	2013-12-03 09:57:32 -08:00
Mark Hamstra	9ae2d094a9	Tightly couple stageIdToJobIds and jobIdToStageIds	2013-12-03 09:57:32 -08:00
Mark Hamstra	27c45e5236	Cleaned up job cancellation handling	2013-12-03 09:57:32 -08:00
Mark Hamstra	686a420ddc	Refactoring to make job removal, stage removal, task cancellation clearer	2013-12-03 09:57:32 -08:00
Mark Hamstra	205566e56e	Improved comment	2013-12-03 09:57:32 -08:00
Mark Hamstra	94087c463b	Removed redundant residual re: reverted refactoring.	2013-12-03 09:57:31 -08:00
Mark Hamstra	982797dcba	Fixed intended side-effects	2013-12-03 09:57:31 -08:00
Mark Hamstra	6f8359b5ad	Actor instead of eventQueue for LocalJobCompleted	2013-12-03 09:57:31 -08:00
Mark Hamstra	51458ab4a1	Added stageId <--> jobId mapping in DAGScheduler ...and make sure that DAGScheduler data structures are cleaned up on job completion. Initial effort and discussion at https://github.com/mesos/spark/pull/842	2013-12-03 09:57:31 -08:00
Raymond Liu	4738818dd6	Fix pom.xml for maven build	2013-12-03 16:36:05 +08:00
Reynold Xin	58d9bbcfec	Merge pull request #217 from aarondav/mesos-urls Re-enable zk:// urls for Mesos SparkContexts This was broken in PR #71 when we explicitly disallow anything that didn't fit a mesos:// url. Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility. Additionally added a unit test for the creation of all types of TaskSchedulers. Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.	2013-12-02 21:58:53 -08:00
Prashant Sharma	09e8be9a62	Made running SparkActorSystem specific to executors only.	2013-12-03 11:27:45 +05:30
Aaron Davidson	0f24576c08	Cleanup and documentation of SparkActorSystem	2013-12-03 11:05:12 +05:30
Reynold Xin	e34b4693d3	Mark partitioner, name, and generator field in RDD as @transient.	2013-12-02 21:24:44 -08:00
Kay Ousterhout	58b3aff9a8	Fixed problem with scheduler delay	2013-12-02 20:30:03 -08:00
Aaron Davidson	f6c8c1c7b6	Cleanup and documentation of SparkActorSystem	2013-12-02 11:42:53 -08:00
Prashant Sharma	5b11028a04	Made akka capable of tolerating fatal exceptions and moving on.	2013-12-02 10:47:39 +05:30
Reynold Xin	740922f25d	Merge pull request #219 from sundeepn/schedulerexception Scheduler quits when newStage fails The current scheduler thread does not handle exceptions from newStage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no scheduler.	2013-12-01 12:46:58 -08:00
Sundeep Narravula	be3ea2394f	Log exception in scheduler in addition to passing it to the caller. Code Styling changes.	2013-12-01 00:50:34 -08:00
Reynold Xin	9cf7f31e4d	Memoize preferred locations in ZippedPartitionsBaseRDD so preferred location computation doesn't lead to exponential explosion. (cherry picked from commit `e36fe55a03`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-11-30 18:10:52 -08:00
Sundeep Narravula	4d53830eb7	Scheduler quits when createStage fails. The current scheduler thread does not handle exceptions from createStage while launching new jobs. The thread fails on any exception that gets triggered at that level, leaving the cluster hanging with no scheduler.	2013-11-30 16:18:12 -08:00
Aaron Davidson	96df26be47	Add spaces between tests	2013-11-29 13:20:43 -08:00
Prashant Sharma	5618af6803	Merge branch 'master' into wip-scala-2.10	2013-11-29 13:41:21 +05:30
Prashant Sharma	1bc83ca791	Changed defaults for akka to almost disable failure detector.	2013-11-29 13:41:05 +05:30
Lian, Cheng	4a1d966e26	More comments	2013-11-29 16:02:58 +08:00
Lian, Cheng	1e25086009	Updated some inline comments in DAGScheduler	2013-11-29 15:56:47 +08:00
Aaron Davidson	081a0b6861	Add unit test for SparkContext scheduler creation Since YARN and Mesos are not necessarily available in the system, they are allowed to pass as long as the YARN/Mesos code paths are exercised.	2013-11-28 20:40:57 -08:00
Aaron Davidson	37f161cf6b	Re-enable zk:// urls for Mesos SparkContexts This was broken in PR #71 when we explicitly disallow anything that didn't fit a mesos:// url. Although it is not really clear that a zk:// url should match Mesos, it is what the docs say and it is necessary for backwards compatibility.	2013-11-28 20:37:56 -08:00
Lian, Cheng	18def5d6f2	Bugfix: SPARK-965 & SPARK-966 SPARK-965: https://spark-project.atlassian.net/browse/SPARK-965 SPARK-966: https://spark-project.atlassian.net/browse/SPARK-966 * Add back DAGScheduler.start(), eventProcessActor is created and started here. Notice that function is only called by SparkContext. * Cancel the scheduled stage resubmission task when stopping eventProcessActor * Add a new DAGSchedulerEvent ResubmitFailedStages This event message is sent by the scheduled stage resubmission task to eventProcessActor. In this way, DAGScheduler.resubmitFailedStages is guaranteed to be executed from the same thread that runs DAGScheduler.processEvent. Please refer to discussion in SPARK-966 for details.	2013-11-28 17:46:06 +08:00
Prashant Sharma	3ec5d74766	Fixed the broken build.	2013-11-28 13:02:28 +05:30
Matei Zaharia	743a31a7ca	Merge pull request #210 from haitaoyao/http-timeout add http timeout for httpbroadcast While pulling task bytecode from the HttpBroadcast server, there's no timeout value set. This may cause the Spark executor code to hang and other tasks in the same executor process to wait for the lock. I have encountered the issue in my cluster. Here's the stacktrace I captured: https://gist.github.com/haitaoyao/7655830 So add a timeout value to ensure the task fails fast.	2013-11-27 18:24:39 -08:00
Prashant Sharma	17987778da	Merge branch 'master' into wip-scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala core/src/main/scala/org/apache/spark/rdd/RDD.scala python/pyspark/rdd.py	2013-11-27 14:44:12 +05:30
Prashant Sharma	54862af5ee	Improvements from the review comments and followed Boy Scout Rule.	2013-11-27 14:26:28 +05:30
Matei Zaharia	fb6875dd5c	Merge pull request #146 from JoshRosen/pyspark-custom-serializers Custom Serializers for PySpark This pull request adds support for custom serializers to PySpark. For now, all Python-transformed (or parallelize()d RDDs) are serialized with the same serializer that's specified when creating SparkContext. For now, PySpark includes `PickleSerDe` and `MarshalSerDe` classes for using Python's `pickle` and `marshal` serializers. It's pretty easy to add support for other serializers, although I still need to add instructions on this. A few notable changes: - The Scala `PythonRDD` class no longer manipulates Pickled objects; data from `textFile` is written to Python as MUTF-8 strings. The Python code performs the appropriate bookkeeping to track which deserializer should be used when reading an underlying JavaRDD. This mechanism could also be used to support other data exchange formats, such as MsgPack. - Several magic numbers were refactored into constants. - Batching is implemented by wrapping / decorating an unbatched SerDe.	2013-11-26 20:55:40 -08:00
Matei Zaharia	330ada1766	Merge pull request #207 from henrydavidge/master Log a warning if a task's serialized size is very big As per Reynold's instructions, we now create a warning level log entry if a task's serialized size is too big. "Too big" is currently defined as 100kb. This warning message is generated at most once for each stage.	2013-11-26 19:08:33 -08:00
Harvey Feng	afe4fe7f5e	Merge remote-tracking branch 'origin/master' into yarn-2.2 Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala	2013-11-26 15:03:03 -08:00
hhd	57579934f0	Emit warning when task size > 100KB	2013-11-26 16:58:39 -05:00
Mark Hamstra	ed7ecb93ce	[SPARK-963] Wait for SparkListenerBus eventQueue to be empty before checking jobLogger state	2013-11-26 13:30:17 -08:00
Reynold Xin	cb976dfb50	Merge pull request #209 from pwendell/better-docs Improve docs for shuffle instrumentation	2013-11-26 10:23:19 -08:00
Prashant Sharma	560e44a8e1	Restored master address for client.	2013-11-26 18:18:05 +05:30
haitao.yao	db998a6e14	add http timeout for httpbroadcast	2013-11-26 18:23:48 +08:00
Prashant Sharma	d092a8cc6a	Fixed compile time warnings and formatting post merge.	2013-11-26 15:21:50 +05:30
Matei Zaharia	18d6df0e17	Merge pull request #86 from holdenk/master Add histogram functionality to DoubleRDDFunctions This pull request adds histogram functionality to the DoubleRDDFunctions.	2013-11-26 00:00:07 -08:00
Patrick Wendell	297c09d4bb	Improve docs for shuffle instrumentation	2013-11-25 22:53:28 -08:00
Holden Karau	7222ee2977	Fix the test	2013-11-25 21:06:42 -08:00
Matei Zaharia	0e2109ddb2	Merge pull request #204 from rxin/hash OpenHashSet fixes Incorporated ideas from pull request #200. - Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil. - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique. - Remember the grow threshold instead of recomputing it on each insert Also added unit tests for size estimation for specialized hash sets and maps.	2013-11-25 20:48:37 -08:00
Matei Zaharia	14bb465bb3	Merge pull request #201 from rxin/mappartitions Use the proper partition index in mapPartitionsWithIndex mapPartitionsWithIndex uses TaskContext.partitionId as the partition index. TaskContext.partitionId used to be identical to the partition index in a RDD. However, pull request #186 introduced a scenario (with partition pruning) in which the two can be different. This pull request uses the right partition index in all mapPartitionsWithIndex related calls. Also removed the extra MapPartitionsWithContextRDD and put all the mapPartitions related functionality in MapPartitionsRDD.	2013-11-25 18:50:18 -08:00
Matei Zaharia	eb4296c8f7	Merge pull request #101 from colorant/yarn-client-scheduler For SPARK-527, Support spark-shell when running on YARN sync to trunk and resubmit here In the current YARN mode, the application runs in the Application Master as a user program, so the whole Spark context is remote. This approach does not support applications that involve local interaction and need to run where they are launched. So in this pull request I have added a YarnClientClusterScheduler and backend. With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes with a thin AM that only launches the executors and monitors the Driver Actor status, so that when the client app is done, it can finish the YARN application as well. This enables spark-shell to run on YARN. It also enables other Spark applications to run the Spark context locally with the master URL "yarn-client". Thus, e.g., SparkPi could print its result locally on the console instead of in the log of the remote machine where the AM is running. Docs are also updated to show how to use this yarn-client mode.	2013-11-25 15:25:29 -08:00
Prashant Sharma	44fd30d3fb	Merge branch 'master' into scala-2.10-wip Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala project/SparkBuild.scala	2013-11-25 18:10:54 +05:30
Prashant Sharma	489862a657	Remote death watch has a funny bug. https://gist.github.com/ScrapCodes/4805fd84906e40b7b03d	2013-11-25 18:00:02 +05:30
Reynold Xin	466fd06475	Incorporated ideas from pull request #200 . - Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil. - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert	2013-11-25 18:27:26 +08:00
Reynold Xin	95c55df1c2	Added unit tests for size estimation for specialized hash sets and maps.	2013-11-25 18:27:06 +08:00
Prashant Sharma	77929cfeed	Fine-tuned defaults for akka and restored tracking of disassociated events, since they are delivered when a remote TCP socket is closed. Also gave transport failure heartbeats a larger interval, since they are mostly not needed now that we are using remote death watch instead.	2013-11-25 14:13:21 +05:30
Matei Zaharia	859d62dc2a	Merge pull request #151 from russellcardullo/add-graphite-sink Add graphite sink for metrics This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.	2013-11-24 16:19:51 -08:00
Matei Zaharia	65de73c7f8	Merge pull request #185 from mkolod/random-number-generator XORShift RNG with unit tests and benchmark This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950 To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000) XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.	2013-11-24 15:52:33 -08:00
Reynold Xin	972171b9d9	Merge pull request #197 from aarondav/patrick-fix Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.	2013-11-25 07:50:46 +08:00
Reynold Xin	e9ff13ec72	Consolidated both mapPartitions related RDDs into a single MapPartitionsRDD. Also changed the semantics of the index parameter in mapPartitionsWithIndex from the partition index of the output partition to the partition index in the current RDD.	2013-11-24 17:56:43 +08:00
Matei Zaharia	9837a60234	Some other optimizations to AppendOnlyMap: - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert	2013-11-23 17:38:29 -08:00
Matei Zaharia	7535d7fbcb	Fixes to AppendOnlyMap: - Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil. - Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)	2013-11-23 17:21:37 -08:00
Harvey Feng	4f1c3fa5d7	Hadoop 2.2 YARN API migration for `SPARK_HOME/new-yarn`	2013-11-23 17:08:30 -08:00
Ankur Dave	c1507afc6c	Support preservesPartitioning in RDD.zipPartitions	2013-11-23 03:03:31 -08:00
Aaron Davidson	ccea38b759	Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.	2013-11-21 21:36:08 -08:00
Reynold Xin	f20093c3af	Merge pull request #196 from pwendell/master TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.	2013-11-22 10:12:13 +08:00
Raymond Liu	ab3cefde53	Add YarnClientClusterScheduler and Backend. With this scheduler, the user application is launched locally, while the executors are launched by YARN on remote nodes. This enables spark-shell to run on YARN.	2013-11-22 09:23:27 +08:00
Patrick Wendell	53b94ef2f5	TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.	2013-11-21 17:20:15 -08:00
Kay Ousterhout	fc78f67da2	Added logging of scheduler delays to UI	2013-11-21 16:54:23 -08:00
Prashant Sharma	95d8dbce91	Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp Conflicts: core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala	2013-11-21 12:34:46 +05:30
Prashant Sharma	199e9cf02d	Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/deploy/client/Client.scala core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala	2013-11-21 11:55:48 +05:30
Reynold Xin	2fead510f7	Merge branch 'master' of github.com:tbfenet/incubator-spark PartitionPruningRDD is using index from parent I was getting an ArrayIndexOutOfBoundsException after doing a union on a pruned RDD. The index it was using on the partition was the index in the original RDD, not the new pruned RDD.	2013-11-21 07:15:55 +08:00
Marek Kolodziej	22724659db	Make XORShiftRandom explicit in KMeans and roll it back for RDD	2013-11-20 07:03:36 -05:00
Marek Kolodziej	bcc6ed30bf	Formatting and scoping (private[spark]) updates	2013-11-19 20:50:38 -05:00
Henry Saputra	43dfac5132	Merge branch 'master' into removesemicolonscala	2013-11-19 16:57:57 -08:00
Henry Saputra	10be58f251	Another set of changes to remove unnecessary semicolon (;) from Scala code. Passed the sbt/sbt compile and test	2013-11-19 16:56:23 -08:00
Matei Zaharia	f568912f85	Merge pull request #181 from BlackNiuza/fix_tasks_number correct number of tasks in ExecutorsUI Index `a` is not `execId` here	2013-11-19 16:11:31 -08:00
tgravescs	4093e9393a	Improve Spark on Yarn Error handling	2013-11-19 12:44:00 -06:00
Henry Saputra	9c934b640f	Remove the semicolons at the end of Scala code to make it more pure Scala code. Also remove unused imports as I found them along the way. Remove return statements when returning value in the Scala code. Passing compile and tests.	2013-11-19 10:19:03 -08:00
Matthew Taylor	f639b65eab	PartitionPruningRDD is using index from parent(review changes)	2013-11-19 10:48:48 +00:00
Matthew Taylor	13b9bf494b	PartitionPruningRDD is using index from parent	2013-11-19 06:27:33 +00:00
Holden Karau	e163e31c20	Add spaces	2013-11-18 20:13:25 -08:00
Holden Karau	7de180fd13	Remove explicit boxing	2013-11-18 20:05:05 -08:00
Marek Kolodziej	99cfe89c68	Updates to reflect pull request code review	2013-11-18 22:00:36 -05:00
Marek Kolodziej	09bdfe3b16	XORShift RNG with unit tests and benchmark To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000)	2013-11-18 15:21:43 -05:00
Russell Cardullo	1360f62d15	Cleanup GraphiteSink.scala based on feedback * Reorder imports according to the style guide * Consistently use propertyToOption in all places	2013-11-18 08:53:39 -08:00
shiyun.wxm	eda05fa439	use HashSet.empty[Long] instead of Seq[Long]	2013-11-18 13:31:14 +08:00
Aaron Davidson	85763f4942	Add PrimitiveVectorSuite and fix bug in resize()	2013-11-17 18:16:51 -08:00
Reynold Xin	16a2286d6d	Return the vector itself for trim and resize method in PrimitiveVector.	2013-11-17 17:52:02 -08:00
BlackNiuza	ecfbaf2442	rename "a" to "statusId"	2013-11-18 09:51:40 +08:00
Reynold Xin	c30979c7d6	Slightly enhanced PrimitiveVector: 1. Added trim() method 2. Added size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.	2013-11-17 17:09:40 -08:00
BlackNiuza	b60839e56a	correct number of tasks in ExecutorsUI	2013-11-17 21:38:57 +08:00
Matei Zaharia	1b5b358309	Merge pull request #178 from hsaputra/simplecleanupcode Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some unused imports as I found them -) Remove ";" in the import statements -) Remove () at the end of method calls like size that do not have side effects.	2013-11-16 11:44:10 -08:00
Henry Saputra	c33f802044	Simple cleanup on Spark's Scala code while testing core and yarn modules: -) Remove some unused imports as I found them -) Remove ";" in the import statements -) Remove () at the end of method calls like size that do not have side effects.	2013-11-15 10:32:20 -08:00
Matei Zaharia	96e0fb4630	Merge pull request #173 from kayousterhout/scheduler_hang Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched. This isn't currently unit tested but will be once my pull request for merging the cluster and local schedulers goes in -- at which point many more of the unit tests will exercise the code paths through the cluster scheduler (currently the failure test suite uses the local scheduler, which is why we didn't see this bug before).	2013-11-14 22:29:28 -08:00

2600 commits