ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	70a0b993d4	Merge pull request #14 from kayousterhout/untangle_scheduler Improved organization of scheduling packages. This commit does not change any code -- only file organization. Please let me know if there was some masterminded strategy behind the existing organization that I failed to understand! There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it seems to be within the cluster package. The one thing about the scheduling code that seems a little funny to me is the naming of the SchedulerBackends. The StandaloneSchedulerBackend is not just for Standalone mode, but instead is used by Mesos coarse grained mode and Yarn, and the backend that is just for Standalone mode is instead called SparkDeploySchedulerBackend. I didn't change this because I wasn't sure if there was a reason for this naming that I'm just not aware of.	2013-09-26 14:11:54 -07:00
Reynold Xin	c514cd1587	Merge pull request #930 from holdenk/master Add mapPartitionsWithIndex	2013-09-26 13:48:20 -07:00
Reynold Xin	560ee5c9bb	Merge pull request #7 from wannabeast/memorystore-fixes some minor fixes to MemoryStore This is a repeat of #5, moved to its own branch in my repo. This makes all updates to `currentMemory` synchronized on `entries`; it skips synchronizing the reads where it can get away with it.	2013-09-26 11:27:34 -07:00
Patrick Wendell	6566a19b38	Merge pull request #9 from rxin/limit Smarter take/limit implementation.	2013-09-26 08:01:04 -07:00
Kay Ousterhout	d85fe41b2b	Improved organization of scheduling packages. This commit does not change any code -- only file organization. There are two components of this change: (1) Moving files out of the cluster package, and down a level to the scheduling package. These files are all used by the local scheduler in addition to the cluster scheduler(s), so should not be in the cluster package. As a result of this change, none of the files in the local package reference files in the cluster package. (2) Moving the mesos package to within the cluster package. The mesos scheduling code is for a cluster, and represents a specific case of cluster scheduling (the Mesos-related classes often subclass cluster scheduling classes). Thus, the most logical place for it is within the cluster package.	2013-09-25 12:45:46 -07:00
Patrick Wendell	6079721fa1	Update build version in master	2013-09-24 11:41:51 -07:00
Holden Karau	0cef683553	Fix formatting :)	2013-09-23 19:39:42 -07:00
Reynold Xin	ff540a015b	Merge branch 'master' of github.com:markhamstra/incubator-spark	2013-09-23 11:55:02 -07:00
Holden Karau	7fe0b0ff56	Switch indent from 2 to 4 spaces	2013-09-22 19:44:51 -07:00
jerryshao	77e9da1f34	Change Exception to NoSuchElementException and minor style fix	2013-09-22 16:50:08 +08:00
jerryshao	85024acd2e	Remove infix style and others	2013-09-22 14:20:55 +08:00
jerryshao	5850f599dd	Refactor FairSchedulableBuilder: 1. Configuration can be read from classpath if not set explicitly. 2. Add missing close handler.	2013-09-22 14:20:55 +08:00
Reynold Xin	a2ea069a5f	Merge pull request #937 from jerryshao/localProperties-fix Fix PR926 local properties issues in Spark Streaming like scenarios	2013-09-21 23:04:42 -07:00
jerryshao	aa0c29f747	Add barrier for local properties unit test and fix some styles	2013-09-22 09:53:11 +08:00
Reynold Xin	42571d30d0	Smarter take/limit implementation.	2013-09-20 17:09:53 -07:00
Mike	9524b943a4	Synchronize on "entries" the remaining update to "currentMemory". Make "currentMemory" @volatile, so that its reads in ensureFreeSpace() are atomic and up-to-date--i.e., currentMemory can't increase while putLock is held (though it could decrease, which would only help ensureFreeSpace()).	2013-09-19 23:31:35 -07:00
Ankur Dave	026dba6aba	After unit tests, clear port properties unconditionally In MapOutputTrackerSuite, the "remote fetch" test sets spark.driver.port and spark.hostPort, assuming that they will be cleared by LocalSparkContext. However, the test never sets sc, so it remains null, causing LocalSparkContext to skip clearing these properties. Subsequent tests therefore fail with java.net.BindException: "Address already in use". This commit makes LocalSparkContext clear the properties even if sc is null.	2013-09-19 22:05:23 -07:00
jerryshao	ffa5f8e11d	Fix issue when local properties pass from parent to child thread	2013-09-18 17:33:24 +08:00
Holden Karau	bfcddf4700	Make mapPartitionsWithIndex work with JavaRDDs	2013-09-14 15:53:42 -07:00
Holden Karau	74f710f6cd	Start of working on SPARK-615	2013-09-11 22:35:58 -07:00
Mike	d34672f668	Set currentMemory to 0 in clear(). Remove unnecessary entries.get() call.	2013-09-11 18:01:19 -07:00
Kay Ousterhout	93c4253275	Changed localProperties to use ThreadLocal (not DynamicVariable). The fact that DynamicVariable uses an InheritableThreadLocal can cause problems where the properties end up being shared across threads in certain circumstances.	2013-09-11 13:01:39 -07:00
Patrick Wendell	91a59e6b10	Merge pull request #919 from mateiz/jets3t Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-11 10:21:48 -07:00
Patrick Wendell	b9128d34bf	Merge pull request #922 from pwendell/port-change Change default port number from 3030 to 4030.	2013-09-11 10:03:06 -07:00
Patrick Wendell	bddf135670	Change port from 3030 to 4040	2013-09-11 10:01:38 -07:00
David McCauley	5dd875c5b5	SPARK-894 - Not all WebUI fields delivered VIA JSON	2013-09-11 10:46:37 +01:00
Mike	293c758cc0	Remove MemoryStore$Entry.dropPending, unused as of `42e0a68082`.	2013-09-10 00:24:35 -07:00
Matei Zaharia	f117dc6d0d	Add explicit jets3t dependency, which is excluded in hadoop-client	2013-09-10 06:39:25 +00:00
Matei Zaharia	c81377b9ed	Merge pull request #915 from ooyala/master Get rid of / improve ugly NPE when Utils.deleteRecursively() fails	2013-09-09 20:16:19 -07:00
Evan Chan	fdb8b0eec3	Style fix: put body of if within curly braces	2013-09-09 14:29:32 -07:00
Matei Zaharia	a85758c200	Merge pull request #907 from stephenh/document_coalesce_shuffle Add better docs for coalesce.	2013-09-09 13:45:40 -07:00
Evan Chan	27726079e4	Print out more friendly error if listFiles() fails listFiles() could return null if the I/O fails, and this currently results in an ugly NPE which is hard to diagnose.	2013-09-09 12:58:12 -07:00
Y.CORP.YAHOO.COM\tgraves	2186d93285	Add metrics-ganglia to core pom file	2013-09-09 12:37:33 -05:00
Stephen Haberman	59003d387d	Use a set since shuffle could change order.	2013-09-09 11:45:03 -05:00
Stephen Haberman	6471bfec73	Reword 'evenly distributed' to 'distributed with a hash partitioner'.	2013-09-09 11:44:15 -05:00
Matei Zaharia	bf984e2745	Merge pull request #890 from mridulm/master Fix hash bug	2013-09-08 23:50:24 -07:00
Reynold Xin	e9d4f44a7a	Merge pull request #909 from mateiz/exec-id-fix Fix an instance where full standalone mode executor IDs were passed to	2013-09-08 23:36:48 -07:00
Matei Zaharia	7d3204b056	Merge pull request #905 from mateiz/docs2 Job scheduling and cluster mode docs	2013-09-08 21:39:12 -07:00
Patrick Wendell	f68848d95d	Merge pull request #906 from pwendell/ganglia-sink Clean-up of Metrics Code/Docs and Add Ganglia Sink	2013-09-08 18:32:16 -07:00
Matei Zaharia	f9b7f58de2	Fix an instance where full standalone mode executor IDs were passed to StandaloneSchedulerBackend instead of the smaller IDs used within Spark (that lack the application name). This was reported by ClearStory in https://github.com/clearstorydata/spark/pull/9. Also fixed some messages that said slave instead of executor.	2013-09-08 18:27:50 -07:00
Matei Zaharia	170b3869ee	Fix unit test failure due to changed default	2013-09-08 17:51:27 -07:00
Patrick Wendell	b4e382c210	Adding sc name in metrics source	2013-09-08 16:06:49 -07:00
Patrick Wendell	c190b48bf5	Adding more docs and some code cleanup	2013-09-08 13:46:28 -07:00
Stephen Haberman	df5fd35273	Add better docs for coalesce. Include the useful tip that if shuffle=true, coalesce can actually increase the number of partitions. This makes coalesce more like a generic `RDD.repartition` operation. (Ideally this `RDD.repartition` could automatically choose either a coalesce or a shuffle if numPartitions was either less than or greater than, respectively, the current number of partitions.)	2013-09-08 15:39:04 -05:00
Matei Zaharia	04cfb3aa9d	Merge pull request #898 from ilikerps/660 SPARK-660: Add StorageLevel support in Python	2013-09-08 10:33:20 -07:00
Patrick Wendell	8de8ee5d3c	Ganglia sink	2013-09-08 10:08:18 -07:00
Matei Zaharia	651a96adf7	More fair scheduler docs and property names. Also changed uses of "job" terminology to "application" when they referred to an entire Spark program, to avoid confusion.	2013-09-08 00:29:11 -07:00
Matei Zaharia	98fb69822c	Work in progress: - Add job scheduling docs - Rename some fair scheduler properties - Organize intro page better - Link to Apache wiki for "contributing to Spark"	2013-09-08 00:29:11 -07:00
Aaron Davidson	c1cc8c4da2	Export StorageLevel and refactor	2013-09-07 14:41:31 -07:00
Aaron Davidson	8001687af5	Remove reflection, hard-code StorageLevels The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell would have to call a private method of SparkContext. Having StorageLevel available in sc also doesn't seem like the end of the world. There may be a better solution, though. As for creating the StorageLevel object itself, this seems to be the best way in Python 2 for creating singleton, enum-like objects: http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python	2013-09-07 09:34:07 -07:00
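
Commit 93c4253275 notes that DynamicVariable is backed by an InheritableThreadLocal, so local properties could end up shared across threads, and ffa5f8e11d deals with properties passing from parent to child thread. The sketch below shows only the underlying JDK behavior, not Spark's actual code: a child thread constructed after the parent sets an InheritableThreadLocal sees the parent's value, while a plain ThreadLocal starts empty. Because the default `childValue` hands the child the same object reference, a mutable `Properties` object ends up shared. The class name, method names, and the `pool` key are hypothetical.

```java
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; illustrates JDK semantics only, not Spark internals.
public class ThreadLocalInheritanceDemo {

    // Constructs the child thread on the calling (parent) thread, so any
    // InheritableThreadLocal values are copied at construction time.
    private static void runInChild(Runnable body) {
        Thread child = new Thread(body);
        child.start();
        try {
            child.join(); // join() also makes the child's writes visible to the parent
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }

    // Returns the value a freshly created child thread reads from `local`.
    public static String valueSeenByChild(ThreadLocal<String> local) {
        local.set("parent-properties");
        AtomicReference<String> seen = new AtomicReference<>();
        runInChild(() -> seen.set(local.get()));
        local.remove();
        return seen.get();
    }

    // The child mutates the inherited Properties; the parent observes the change
    // because both threads hold a reference to the same object.
    public static String parentSeesAfterChildMutation() {
        InheritableThreadLocal<Properties> local = new InheritableThreadLocal<>();
        local.set(new Properties());
        runInChild(() -> local.get().setProperty("pool", "child-pool")); // hypothetical key
        String result = local.get().getProperty("pool");
        local.remove();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(valueSeenByChild(new InheritableThreadLocal<>())); // parent-properties
        System.out.println(valueSeenByChild(new ThreadLocal<>()));            // null
        System.out.println(parentSeesAfterChildMutation());                   // child-pool
    }
}
```

Switching to a plain ThreadLocal, as 93c4253275 describes, removes the implicit sharing; any deliberate parent-to-child propagation then has to copy the properties explicitly.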

2110 commits