Track and report task result serialisation time.
- DirectTaskResult now holds a `valueBytes: ByteBuffer` instead of a `value: T`.
- DirectTaskResult now has a member function `value()` that deserialises `valueBytes` into a `T`.
- The Executor serialises the value into a `ByteBuffer` and passes it to DirectTaskResult's constructor.
- The Executor tracks the time taken to do so and records it in a new field in TaskMetrics.
- StagePage now reports the serialisation time from TaskMetrics alongside the other metrics it already reported; a rough sketch of the resulting shape follows the list.
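A minimal sketch of the described shape, assuming hypothetical field and parameter names (the real class, serializer plumbing, and TaskMetrics field may differ):

```scala
import java.nio.ByteBuffer

// Hedged sketch only: `serializationTimeMs` and `deserialize` are illustrative names,
// not necessarily the ones used in Spark.
class DirectTaskResult[T](val valueBytes: ByteBuffer,
                          val serializationTimeMs: Long) {
  // Deserialise on demand instead of shipping a live `T`.
  def value()(implicit deserialize: ByteBuffer => T): T = deserialize(valueBytes)
}

object ExecutorSide {
  // Roughly what the executor does: time the serialisation and record the elapsed time.
  def wrap[T](result: T, serialize: T => ByteBuffer): DirectTaskResult[T] = {
    val start = System.currentTimeMillis()
    val bytes = serialize(result)
    val elapsed = System.currentTimeMillis() - start // would be recorded in TaskMetrics
    new DirectTaskResult(bytes, elapsed)
  }
}
```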
Previously, we would only clean the in-memory metadata for consolidated
shuffle files.
Additionally, fixes a bug where the Metadata Cleaner was ignoring type-
specific TTLs.
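For illustration only, a sketch of the kind of type-specific TTL lookup implied here; the per-type property key shown is an assumption, not a quote of the actual configuration names:

```scala
// Hedged sketch: resolve a per-type TTL (in seconds), falling back to the global value
// rather than ignoring the type-specific setting.
def cleanerTtlSeconds(cleanerType: String): Int = {
  val globalTtl = System.getProperty("spark.cleaner.ttl", "-1").toInt
  System.getProperty(s"spark.cleaner.ttl.$cleanerType", globalTtl.toString).toInt
}
```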
Add collectPartition to JavaRDD interface.
This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py.
Thanks @concretevitamin for the original change and tests.
Change the implementation to use `runJob` instead of `PartitionPruningRDD`. Also update the unit tests and the Python `take` implementation to use the new interface.
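As a rough illustration of the runJob-based approach (not the actual JavaRDD code; the helper below is hypothetical, and it targets the current `SparkContext.runJob` signature, which older versions extended with an `allowLocal` flag):

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hedged sketch: collect a single partition by running a job on just that partition,
// instead of constructing a PartitionPruningRDD.
def collectPartition[T: ClassTag](rdd: RDD[T], partitionId: Int): Array[T] =
  rdd.context.runJob(rdd, (iter: Iterator[T]) => iter.toArray, Seq(partitionId)).head
```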
[SPARK-959] Explicitly depend on org.eclipse.jetty.orbit jar
Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See [bug](https://spark-project.atlassian.net/browse/SPARK-959) for more details.
Note that this may not be the best solution, as I do not understand the root cause of why this only happens for some people. However, it is reported to work.
(This is probably also the beginning of the slow death of our recently prettified dependencies. Form follows function.)
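For reference, a sketch of the kind of explicit sbt dependency declaration this refers to; the version string and artifact override are illustrative, not a quote of SparkBuild.scala:

```scala
// Hedged sketch of an explicit dependency on the jetty orbit servlet jar in an sbt build.
// The "artifacts" override makes Ivy fetch the plain .jar rather than the .orbit
// packaging it would otherwise try (and fail) to download.
libraryDependencies += "org.eclipse.jetty.orbit" % "javax.servlet" % "2.5.0.v201103041518" artifacts (
  Artifact("javax.servlet", "jar", "jar")
)
```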
Increase spark.akka.askTimeout default to 30 seconds
In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase.
There is no particular reason for this value to be small as a timeout should only occur in an exceptional situation.
Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe).
Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.
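A minimal sketch of what centralising the lookup could look like, assuming the system-property based configuration of this era (the method name and return type are illustrative):

```scala
import scala.concurrent.duration._

object AkkaUtils {
  // Hedged sketch: single place to read spark.akka.askTimeout, defaulting to 30 seconds.
  def askTimeout: FiniteDuration =
    System.getProperty("spark.akka.askTimeout", "30").toInt.seconds
}
```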
Fix Cygwin support in several scripts.
This allows the spark-shell, spark-class, run-example, make-distribution.sh,
and ./bin/start-* scripts to work under Cygwin. Note that this doesn't
support PySpark under Cygwin, since that requires many additional `cygpath`
calls from within Python and will be non-trivial to implement.
This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).