ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	ddf64f019f	Support job cancellation in multi-pool scheduler.	2013-10-10 13:20:27 -07:00
Reynold Xin	3bd2890d2b	Fixed the deadlock situation in multi-job actions and added more unit tests.	2013-10-10 12:07:09 -07:00
Reynold Xin	0353f74a9a	Put the job cancellation handling into the dagscheduler's main event loop.	2013-10-10 00:28:00 -07:00
Reynold Xin	dbae7795ba	Merge branch 'master' of github.com:apache/incubator-spark into kill Conflicts: core/src/main/scala/org/apache/spark/CacheManager.scala core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerSource.scala	2013-10-09 22:57:35 -07:00
Reynold Xin	53895f9cde	Implemented FutureAction, FutureJob, CancellablePromise. Implemented more unit tests for async actions.	2013-10-09 22:43:06 -07:00
Reynold Xin	320418f7c8	Merge pull request #49 from mateiz/kryo-fix-2 Fix Chill serialization of Range objects It used to write out each element one by one, creating very large objects.	2013-10-09 16:55:30 -07:00
Reynold Xin	215238cb39	Merge pull request #50 from kayousterhout/SPARK-908 Fix race condition in SparkListenerSuite (fixes SPARK-908).	2013-10-09 16:49:44 -07:00
Matei Zaharia	c84c205289	Fix Chill serialization of Range objects, which used to write out each element, and register user and Spark classes before Chill's serializers to let them override Chill's behavior in general.	2013-10-09 16:23:40 -07:00
Kay Ousterhout	36966f65df	Style fixes	2013-10-09 15:36:34 -07:00
Kay Ousterhout	3f7e9b265c	Fixed comment to use javadoc style	2013-10-09 15:23:04 -07:00
Kay Ousterhout	a34a4e8174	Fix race condition in SparkListenerSuite (fixes SPARK-908).	2013-10-09 15:07:53 -07:00
Patrick Wendell	bd3bcc5f8e	Use standard abbreviations in metrics labels	2013-10-09 11:16:24 -07:00
Patrick Wendell	19d445d37c	Merge pull request #22 from GraceH/metrics-naming SPARK-900 Use coarser grained naming for metrics see SPARK-900 Use coarser grained naming for metrics. Now the new metric name is formatted as {XXX.YYY.ZZZ.COUNTER_UNIT}, XXX.YYY.ZZZ represents the group name, which can group several metrics under the same Ganglia view.	2013-10-09 11:08:34 -07:00
Reynold Xin	e67d5b962a	Merge pull request #43 from mateiz/kryo-fix Don't allocate Kryo buffers unless needed I noticed that the Kryo serializer could be slower than the Java one by 2-3x on small shuffles because it spend a lot of time initializing Kryo Input and Output objects. This is because our default buffer size for them is very large. Since the serializer is often used on streams, I made the initialization lazy for that, and used a smaller buffer (auto-managed by Kryo) for input.	2013-10-08 22:57:38 -07:00
Grace Huang	f7628e4033	remove those futile suffixes like number/count	2013-10-09 08:36:41 +08:00
Grace Huang	22bed59d2d	create metrics name manually.	2013-10-08 18:01:11 +08:00
Grace Huang	188abbf8f1	Revert "SPARK-900 Use coarser grained naming for metrics" This reverts commit `4b68be5f3c`.	2013-10-08 17:45:14 +08:00
Grace Huang	a2af6b543a	Revert "remedy the line-wrap while exceeding 100 chars" This reverts commit `892fb8ffa8`.	2013-10-08 17:44:56 +08:00
Patrick Wendell	8b377718b8	Responses to review	2013-10-07 20:03:35 -07:00
Matei Zaharia	a8725bf8f8	Don't allocate Kryo buffers unless needed	2013-10-07 19:16:35 -07:00
Patrick Wendell	391133f66a	Fix inconsistent and incorrect log messages in shuffle read path	2013-10-07 17:24:18 -07:00
Reynold Xin	213b70a2db	Merge pull request #31 from sundeepn/branch-0.8 Resolving package conflicts with hadoop 0.23.9 Hadoop 0.23.9 is having a package conflict with easymock's dependencies. (cherry picked from commit `023e3fdf00`) Signed-off-by: Reynold Xin <rxin@apache.org>	2013-10-07 10:54:22 -07:00
Patrick Wendell	aa9fb84994	Merging build changes in from 0.8	2013-10-05 22:07:00 -07:00
Matei Zaharia	4a25b116d4	Merge pull request #20 from harveyfeng/hadoop-config-cache Allow users to pass broadcasted Configurations and cache InputFormats across Hadoop file reads. Note: originally from https://github.com/mesos/spark/pull/942 Currently motivated by Shark queries on Hive-partitioned tables, where there's a JobConf broadcast for every Hive-partition (i.e., every subdirectory read). The only thing different about those JobConfs is the input path - the Hadoop Configuration that the JobConfs are constructed from remain the same. This PR only modifies the old Hadoop API RDDs, but similar additions to the new API might reduce computation latencies a little bit for high-frequency FileInputDStreams (which only uses the new API right now). As a small bonus, added InputFormats caching, to avoid reflection calls for every RDD#compute(). Few other notes: Added a general soft-reference hashmap in SparkHadoopUtil because I wanted to avoid adding another class to SparkEnv. SparkContext default hadoopConfiguration isn't cached. There's no equals() method for Configuration, so there isn't a good way to determine when configuration properties have changed.	2013-10-05 19:28:55 -07:00
Harvey Feng	6a2bbec5e3	Some comments regarding JobConf and InputFormat caching for HadoopRDDs.	2013-10-05 17:53:58 -07:00
Harvey Feng	96929f28bb	Make HadoopRDD object Spark private.	2013-10-05 17:14:19 -07:00
Harvey Feng	b5e93c1227	Fix API changes; lines > 100 chars.	2013-10-05 16:57:08 -07:00
Matei Zaharia	100222b048	Merge pull request #27 from davidmccauley/master SPARK-920/921 - JSON endpoint updates 920 - Removal of duplicate scheme part of Spark URI, it was appearing as spark://spark//host:port in the JSON field. JSON now delivered as: url:spark://127.0.0.1:7077 921 - Adding the URL of the Main Application UI will allow custom interfaces (that use the JSON output) to redirect from the standalone UI.	2013-10-05 13:38:59 -07:00
Andre Schumacher	c84946fe21	Fixing SPARK-602: PythonPartitioner Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.	2013-10-04 11:56:47 -07:00
Reynold Xin	d29e8035a0	Added countAsync and various unit tests for async actions.	2013-10-03 15:13:44 -07:00
Reynold Xin	802bfb870d	- Created AsyncRDDActions. - Make FutureJob a Scala Future instead of Java Future.	2013-10-03 01:22:28 -07:00
Reynold Xin	e8e917f209	Merge branch 'master' into kill Conflicts: core/src/main/scala/org/apache/spark/TaskEndReason.scala core/src/main/scala/org/apache/spark/executor/Executor.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 23:01:34 -07:00
Reynold Xin	1c48ba0d9f	Merge remote-tracking branch 'origin' into kill Conflicts: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala	2013-10-02 16:40:44 -07:00
David McCauley	1577b373a9	SPARK-921 - Add Application UI URL to ApplicationInfo Json output	2013-10-02 15:03:41 +01:00
David McCauley	351da54676	SPARK-920 - JSON endpoint URI scheme part (spark://) duplicated	2013-10-02 13:23:38 +01:00
Kay Ousterhout	0dcad2edcb	Added additional unit test for repeated task failures	2013-09-30 23:26:15 -07:00
Kay Ousterhout	dea4677c88	Fixed compilation errors and broken test.	2013-09-30 22:07:01 -07:00
Kay Ousterhout	8deda427bc	Merge remote-tracking branch 'upstream/master' into results_through-bm Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterScheduler.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala	2013-09-30 10:16:58 -07:00
Kay Ousterhout	58b764b7c6	Addressed Matei's code review comments	2013-09-30 10:11:59 -07:00
Grace Huang	892fb8ffa8	remedy the line-wrap while exceeding 100 chars	2013-09-30 20:12:55 +08:00
Harvey Feng	7d06bdde1d	Merge HadoopDatasetRDD into HadoopRDD.	2013-09-29 20:08:03 -07:00
Grace Huang	4b68be5f3c	SPARK-900 Use coarser grained naming for metrics	2013-09-27 14:47:38 +08:00
Harvey Feng	417085716a	Merge remote-tracking branch 'oldsparkme/hadoopRDD-broadcast-change' into hadoop-config-cache	2013-09-26 15:49:42 -07:00
Reynold Xin	70a0b993d4	Merge pull request #14 from kayousterhout/untangle_scheduler Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand! There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package. The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse grained mode and Yarn, and the backend that is just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.	2013-09-26 14:11:54 -07:00
Reynold Xin	c514cd1587	Merge pull request #930 from holdenk/master Add mapPartitionsWithIndex	2013-09-26 13:48:20 -07:00
Reynold Xin	560ee5c9bb	Merge pull request #7 from wannabeast/memorystore-fixes some minor fixes to MemoryStore This is a repeat of #5, moved to its own branch in my repo. This makes all updates to on ; it skips on synchronizing the reads where it can get away with it.	2013-09-26 11:27:34 -07:00
Patrick Wendell	6566a19b38	Merge pull request #9 from rxin/limit Smarter take/limit implementation.	2013-09-26 08:01:04 -07:00
Kay Ousterhout	d85fe41b2b	Improved organization of scheduling packages. This commit does not change any code -- only file organization. There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it is within the cluster package.	2013-09-25 12:45:46 -07:00
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Holden Karau	0cef683553	Fix formatting :)	2013-09-23 19:39:42 -07:00

1 2 3 4 5 ...

2169 commits