ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Hossein Falaki	acb0323053	minor improvements	2013-12-31 15:34:26 -08:00
Hossein Falaki	d6cded7155	Added Java unit tests for countApproxDistinct and countApproxDistinctByKey	2013-12-30 19:32:05 -08:00
Hossein Falaki	c3073b6cf2	Added Java API for countApproxDistinct	2013-12-30 19:31:06 -08:00
Hossein Falaki	ed06500d30	Added Java API for countApproxDistinctByKey	2013-12-30 19:30:42 -08:00
Hossein Falaki	a7de8e9b1c	Renamed countDistinct and countDistinctByKey methods to include Approx	2013-12-30 19:28:03 -08:00
Hossein Falaki	d50ccc5ca9	Using origin version	2013-12-30 15:08:34 -08:00
Matei Zaharia	23a9ae6be3	Merge pull request #277 from tdas/scheduler-update Refactored the streaming scheduler and added StreamingListener interface - Refactored the streaming scheduler for cleaner code. Specifically, the JobManager was renamed to JobScheduler, as it does the actual scheduling of Spark jobs to the SparkContext. The earlier Scheduler was renamed to JobGenerator, as it actually generates the jobs from the DStreams. The JobScheduler starts the JobGenerator. Also, moved all the scheduler related code from spark.streaming to spark.streaming.scheduler package. - Implemented the StreamingListener interface, similar to SparkListener. The streaming version of StatusReportListener prints the batch processing time statistics (for now). Added StreamingListenerSuite to test it. - Refactored streaming TestSuiteBase for deduping code in the other streaming testsuites.	2013-12-24 00:08:48 -05:00
Reynold Xin	11107c9de5	Merge pull request #244 from leftnoteasy/master Added SPARK-968 implementation for review	2013-12-23 10:38:20 -08:00
wangda.tan	2f689ba97b	SPARK-968, added executor address showing in aggregated metrics by executors table	2013-12-23 15:03:45 +08:00
wangda.tan	c979eecdf6	added changes according to comments from rxin	2013-12-22 21:43:15 +08:00
Patrick Wendell	0bc57c5767	Merge pull request #280 from aarondav/minor Minor cleanup for standalone scheduler See commit messages	2013-12-20 11:56:54 -08:00
Patrick Wendell	eca68d4425	Merge pull request #272 from tmyklebu/master Track and report task result serialisation time. - DirectTaskResult now has a ByteBuffer valueBytes instead of a T value. - DirectTaskResult now has a member function T value() that deserialises valueBytes. - Executor serialises value into a ByteBuffer and passes it to DTR's ctor. - Executor tracks the time taken to do so and puts it in a new field in TaskMetrics. - StagePage now reports serialisation time from TaskMetrics along with the other things it reported.	2013-12-19 18:12:22 -08:00
Aaron Davidson	6613ab663d	Fix compiler warning in SparkZooKeeperSession	2013-12-19 17:56:13 -08:00
Aaron Davidson	4d74b899b7	Remove firstApp from the standalone scheduler Master As a lonely child with no one to care for it... we had to put it down.	2013-12-19 17:53:41 -08:00
Aaron Davidson	1ab031eaff	Extraordinarily minor code/comment cleanup	2013-12-19 17:51:29 -08:00
Reynold Xin	7990c56375	Merge pull request #276 from shivaram/collectPartition Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.	2013-12-19 13:35:09 -08:00
Shivaram Venkataraman	9cc3a6d3c0	Add comment explaining collectPartitions's use	2013-12-19 11:49:17 -08:00
Shivaram Venkataraman	d3234f9726	Make collectPartitions take an array of partitions Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.	2013-12-19 11:40:34 -08:00
Nick Pentreath	a76f53416c	Add toString to Java RDD, and __repr__ to Python RDD	2013-12-19 14:38:20 +02:00
Tathagata Das	ec71b445ad	Minor changes.	2013-12-18 23:39:28 -08:00
Aaron Davidson	293a0af5a1	In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase. There is no particular reason for this value to be small as a timeout should only occur in an exceptional situation. Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe). Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.	2013-12-18 21:42:29 -08:00
Tathagata Das	e93b391d75	Merge branch 'apache-master' into scheduler-update Conflicts: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala	2013-12-18 17:51:14 -08:00
Tathagata Das	b80ec05635	Added StatsReportListener to generate processing time statistics across multiple batches.	2013-12-18 15:35:24 -08:00
Shivaram Venkataraman	af0cd6bd27	Add collectPartition to JavaRDD interface. Also remove takePartition from PythonRDD and use collectPartition in rdd.py.	2013-12-18 11:40:07 -08:00
Tor Myklebust	d3b1af4b6c	Add a serialisation time column to the StagePage.	2013-12-18 14:25:56 -05:00
Tor Myklebust	717c7fddb2	objectSer -> valueSer in a test.	2013-12-17 23:02:21 -05:00
Reynold Xin	9a6864d016	Fixed a performance problem in RDD.top and BoundedPriorityQueue (size in BoundedPriority was actually traversing the entire queue to calculate the size, resulting in bad performance in insertion).	2013-12-17 18:44:39 -08:00
wangda.tan	59e53fa21c	spark-968, changes to avoid an NPE	2013-12-17 17:57:27 +08:00
wangda.tan	36060f4f50	spark-898, changes according to review comments	2013-12-17 17:55:38 +08:00
Patrick Wendell	c1fec89895	Cleanup	2013-12-16 21:56:21 -08:00
Patrick Wendell	c6f95e603e	Attempt with extra repositories	2013-12-16 21:53:51 -08:00
Tor Myklebust	b2f0329511	Missed a spot; had an objectSer here too.	2013-12-17 00:18:46 -05:00
Tor Myklebust	25fa976580	Merge branch 'master' of git://github.com/apache/incubator-spark	2013-12-16 23:48:37 -05:00
Tor Myklebust	963d6f065a	Incorporate pwendell's code review suggestions.	2013-12-16 23:14:52 -05:00
Reynold Xin	883e034aeb	Merge pull request #245 from gregakespret/task-maxfailures-fix Fix for spark.task.maxFailures not enforced correctly. Docs at http://spark.incubator.apache.org/docs/latest/configuration.html say: ``` spark.task.maxFailures Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1. ``` Previous implementation worked incorrectly. When for example `spark.task.maxFailures` was set to 1, the job was aborted only after the second task failure, not after the first one.	2013-12-16 14:16:02 -08:00
Tor Myklebust	882d544856	UI to display serialisation time of a stage.	2013-12-16 13:27:03 -05:00
Tor Myklebust	8a397a959b	Track task value serialisation time in TaskMetrics.	2013-12-16 12:07:39 -05:00
wangda.tan	8ab8c6a526	Merge branch 'master' of git://github.com/apache/incubator-spark	2013-12-16 21:45:43 +08:00
Mark Hamstra	09ed7ddfa0	Use scala.binary.version in POMs	2013-12-15 12:39:58 -08:00
Josh Rosen	2fd781d347	Merge pull request #249 from ngbinh/partitionInJavaSortByKey Expose numPartitions parameter in JavaPairRDD.sortByKey() This change makes Java and Scala API on sortByKey() the same.	2013-12-14 12:59:37 -08:00
Prashant Sharma	1ae3c0fc5e	Added a comment about ActorRef and ActorSelection difference.	2013-12-14 10:44:24 +05:30
Prashant Sharma	a854cc536d	Review comments on the PR for scala 2.10 migration.	2013-12-13 15:19:51 +05:30
Tathagata Das	097e120c0c	Refactored streaming scheduler and added listener interface. - Refactored Scheduler + JobManager to JobGenerator + JobScheduler and added JobSet for cleaner code. Moved scheduler related code to streaming.scheduler package. - Added StreamingListener trait (similar to SparkListener) to enable gathering of streaming stats like processing times and delays. StreamingContext.addListener() adds listeners. - Deduped some code in streaming tests by modifying TestSuiteBase, and added StreamingListenerSuite.	2013-12-12 20:48:02 -08:00
Prashant Sharma	603af51bb5	Merge branch 'master' into akka-bug-fix Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala	2013-12-11 10:21:53 +05:30
Hossein Falaki	49bf47e1b7	Removed superfluous abs call from test cases.	2013-12-10 19:50:50 -08:00
Binh Nguyen	0b494f7db4	Hook directly to Scala API	2013-12-10 11:17:52 -08:00
Binh Nguyen	e85af50767	Leave default value of numPartitions to Scala code.	2013-12-10 11:04:14 -08:00
Grega Kespret	558af87334	Fix tests.	2013-12-10 11:43:42 +01:00
Binh Nguyen	c82d4f079b	Use braces to shorten the line.	2013-12-10 01:04:52 -08:00
Binh Nguyen	5013fb64b2	Expose numPartitions parameter in JavaPairRDD.sortByKey() This change makes Java and Scala API on sortByKey() the same.	2013-12-10 00:38:16 -08:00


2628 commits