ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Prashant Sharma	54862af5ee	Improvements from the review comments and followed Boy Scout Rule.	2013-11-27 14:26:28 +05:30
Prashant Sharma	dca946ff67	Documenting the newly added spark properties.	2013-11-26 20:47:38 +05:30
Prashant Sharma	560e44a8e1	Restored master address for client.	2013-11-26 18:18:05 +05:30
Prashant Sharma	d092a8cc6a	Fixed compile time warnings and formatting post merge.	2013-11-26 15:21:50 +05:30
Prashant Sharma	44fd30d3fb	Merge branch 'master' into scala-2.10-wip Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala project/SparkBuild.scala	2013-11-25 18:10:54 +05:30
Prashant Sharma	489862a657	Remote death watch has a funny bug. https://gist.github.com/ScrapCodes/4805fd84906e40b7b03d	2013-11-25 18:00:02 +05:30
Prashant Sharma	77929cfeed	Fine tuning defaults for akka and restored tracking of disassociated events, for they are delivered when a remote TCP socket is closed. Also gave transport failure heartbeats a larger interval, since they are mostly not needed now that we are using remote death watch instead.	2013-11-25 14:13:21 +05:30
Reynold Xin	62889c419c	Merge pull request #203 from witgo/master Fix Maven build for metrics-graphite	2013-11-25 11:27:45 +08:00
LiGuoqiang	989203604e	Fix Maven build for metrics-graphite	2013-11-25 11:23:11 +08:00
Matei Zaharia	859d62dc2a	Merge pull request #151 from russellcardullo/add-graphite-sink Add graphite sink for metrics This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.	2013-11-24 16:19:51 -08:00
Matei Zaharia	65de73c7f8	Merge pull request #185 from mkolod/random-number-generator XORShift RNG with unit tests and benchmark This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950 To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000) XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.	2013-11-24 15:52:33 -08:00
Reynold Xin	972171b9d9	Merge pull request #197 from aarondav/patrick-fix Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.	2013-11-25 07:50:46 +08:00
Reynold Xin	718cc803f7	Merge pull request #200 from mateiz/hash-fix AppendOnlyMap fixes - Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`) - Some other small optimizations throughout (see commit comments)	2013-11-24 11:02:02 +08:00
Matei Zaharia	9837a60234	Some other optimizations to AppendOnlyMap: - Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique - Remember the grow threshold instead of recomputing it on each insert	2013-11-23 17:38:29 -08:00
Matei Zaharia	7535d7fbcb	Fixes to AppendOnlyMap: - Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil. - Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)	2013-11-23 17:21:37 -08:00
Reynold Xin	51aa9d6e99	Merge pull request #198 from ankurdave/zipPartitions-preservesPartitioning Support preservesPartitioning in RDD.zipPartitions In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.	2013-11-23 19:46:46 +08:00
Ankur Dave	c1507afc6c	Support preservesPartitioning in RDD.zipPartitions	2013-11-23 03:03:31 -08:00
Aaron Davidson	ccea38b759	Fix 'timeWriting' stat for shuffle files Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.	2013-11-21 21:36:08 -08:00
Reynold Xin	086b097e33	Merge pull request #193 from aoiwelle/patch-1 Fix Kryo Serializer buffer documentation inconsistency The documentation here is inconsistent with the coded default and other documentation.	2013-11-22 10:26:39 +08:00
Reynold Xin	f20093c3af	Merge pull request #196 from pwendell/master TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.	2013-11-22 10:12:13 +08:00
Patrick Wendell	53b94ef2f5	TimeTrackingOutputStream should pass on calls to close() and flush(). Without this fix you get a huge number of open files when running shuffles.	2013-11-21 17:20:15 -08:00
Prashant Sharma	95d8dbce91	Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp Conflicts: core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala	2013-11-21 12:34:46 +05:30
Prashant Sharma	199e9cf02d	Merge branch 'scala210-master' of github.com:colorant/incubator-spark into scala-2.10 Conflicts: core/src/main/scala/org/apache/spark/deploy/client/Client.scala core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala	2013-11-21 11:55:48 +05:30
Neal Wiggins	21b5478ed6	Fix Kryo Serializer buffer documentation inconsistency The documentation here is inconsistent with the coded default and other documentation.	2013-11-20 16:19:25 -08:00
Reynold Xin	2fead510f7	Merge branch 'master' of github.com:tbfenet/incubator-spark PartitionPruningRDD is using index from parent I was getting an ArrayIndexOutOfBoundsException after doing a union on a pruned RDD. The index it was using on the partition was the index in the original RDD, not the new pruned RDD.	2013-11-21 07:15:55 +08:00
Matei Zaharia	4b895013cc	Merge pull request #191 from hsaputra/removesemicolonscala Cleanup to remove semicolons (;) from Scala code -) The main reason for this PR is to remove semicolons from single statements of Scala code. -) Remove unused imports as I see them -) Fix ASF comment header in some files (bad copy paste I suppose)	2013-11-20 10:36:10 -08:00
Marek Kolodziej	22724659db	Make XORShiftRandom explicit in KMeans and roll it back for RDD	2013-11-20 07:03:36 -05:00
Marek Kolodziej	bcc6ed30bf	Formatting and scoping (private[spark]) updates	2013-11-19 20:50:38 -05:00
Henry Saputra	43dfac5132	Merge branch 'master' into removesemicolonscala	2013-11-19 16:57:57 -08:00
Henry Saputra	10be58f251	Another set of changes to remove unnecessary semicolon (;) from Scala code. Passed the sbt/sbt compile and test	2013-11-19 16:56:23 -08:00
Matei Zaharia	f568912f85	Merge pull request #181 from BlackNiuza/fix_tasks_number correct number of tasks in ExecutorsUI Index `a` is not `execId` here	2013-11-19 16:11:31 -08:00
Matei Zaharia	aa638ed9c1	Merge pull request #189 from tgravescs/sparkYarnErrorHandling Improve Spark on Yarn Error handling Improve cli error handling and only allow a certain number of worker failures before failing the application. This will help prevent users from doing foolish things and their jobs running forever. For instance using 32 bit java but trying to allocate 8G containers. This loops forever without this change, now it errors out after a certain number of retries. The number of tries is configurable. Also increase the frequency at which we ping the RM to increase the speed at which we get containers if they die. The Yarn MR app defaults to pinging the RM every 1 second, so the default of 5 seconds here is fine. But that is configurable as well in case people want to change it. I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems. I couldn't think of any, and I tested on a standalone cluster as well as on yarn.	2013-11-19 16:05:44 -08:00
Matei Zaharia	55925805fc	Merge pull request #187 from aarondav/example-bcast-test Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally. This issue came up on the mailing lists [here](http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3C2013111909591557147628%40ict.ac.cn%3E).	2013-11-19 16:04:01 -08:00
tgravescs	4093e9393a	Improve Spark on Yarn Error handling	2013-11-19 12:44:00 -06:00
Henry Saputra	9c934b640f	Remove the semicolons at the end of Scala code to make it more pure Scala code. Also remove unused imports as I found them along the way. Remove return statements when returning value in the Scala code. Passing compile and tests.	2013-11-19 10:19:03 -08:00
Matthew Taylor	f639b65eab	PartitionPruningRDD is using index from parent(review changes)	2013-11-19 10:48:48 +00:00
Aaron Davidson	50fd8d98c0	Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally.	2013-11-18 22:51:35 -08:00
Matthew Taylor	13b9bf494b	PartitionPruningRDD is using index from parent	2013-11-19 06:27:33 +00:00
Marek Kolodziej	99cfe89c68	Updates to reflect pull request code review	2013-11-18 22:00:36 -05:00
Marek Kolodziej	09bdfe3b16	XORShift RNG with unit tests and benchmark To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000)	2013-11-18 15:21:43 -05:00
Russell Cardullo	1360f62d15	Cleanup GraphiteSink.scala based on feedback * Reorder imports according to the style guide * Consistently use propertyToOption in all places	2013-11-18 08:53:39 -08:00
shiyun.wxm	eda05fa439	use HashSet.empty[Long] instead of Seq[Long]	2013-11-18 13:31:14 +08:00
Reynold Xin	e2ebc3a9d8	Merge pull request #182 from rxin/vector Slightly enhanced PrimitiveVector: 1. Added trim() method 2. Added size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.	2013-11-17 18:42:18 -08:00
Reynold Xin	26f616d73a	Merge pull request #3 from aarondav/pv-test Add PrimitiveVectorSuite and fix bug in resize()	2013-11-17 18:18:16 -08:00
Aaron Davidson	85763f4942	Add PrimitiveVectorSuite and fix bug in resize()	2013-11-17 18:16:51 -08:00
Reynold Xin	16a2286d6d	Return the vector itself for trim and resize method in PrimitiveVector.	2013-11-17 17:52:02 -08:00
BlackNiuza	ecfbaf2442	rename "a" to "statusId"	2013-11-18 09:51:40 +08:00
Reynold Xin	c30979c7d6	Slightly enhanced PrimitiveVector: 1. Added trim() method 2. Added size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.	2013-11-17 17:09:40 -08:00
BlackNiuza	b60839e56a	correct number of tasks in ExecutorsUI	2013-11-17 21:38:57 +08:00
Matei Zaharia	1b5b358309	Merge pull request #178 from hsaputra/simplecleanupcode Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some unused imports as I found them -) Remove ";" in the import statements -) Remove () at the end of calls to methods like size that do not have side effects.	2013-11-16 11:44:10 -08:00

4682 commits