ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Neal Wiggins	21b5478ed6	Fix Kryo Serializer buffer inconsistency The documentation here is inconsistent with the coded default and other documentation.	2013-11-20 16:19:25 -08:00
Reynold Xin	2fead510f7	Merge branch 'master' of github.com:tbfenet/incubator-spark PartitionPruningRDD is using index from parent I was getting a ArrayIndexOutOfBoundsException exception after doing union on pruned RDD. The index it was using on the partition was the index in the original RDD not the new pruned RDD.	2013-11-21 07:15:55 +08:00
Matei Zaharia	4b895013cc	Merge pull request #191 from hsaputra/removesemicolonscala Cleanup to remove semicolons (;) from Scala code -) The main reason for this PR is to remove semicolons from single statements of Scala code. -) Remove unused imports as I see them -) Fix ASF comment header from some of files (bad copy paste I suppose)	2013-11-20 10:36:10 -08:00
Marek Kolodziej	22724659db	Make XORShiftRandom explicit in KMeans and roll it back for RDD	2013-11-20 07:03:36 -05:00
Marek Kolodziej	bcc6ed30bf	Formatting and scoping (private[spark]) updates	2013-11-19 20:50:38 -05:00
Henry Saputra	43dfac5132	Merge branch 'master' into removesemicolonscala	2013-11-19 16:57:57 -08:00
Henry Saputra	10be58f251	Another set of changes to remove unnecessary semicolon (;) from Scala code. Passed the sbt/sbt compile and test	2013-11-19 16:56:23 -08:00
Matei Zaharia	f568912f85	Merge pull request #181 from BlackNiuza/fix_tasks_number correct number of tasks in ExecutorsUI Index `a` is not `execId` here	2013-11-19 16:11:31 -08:00
Matei Zaharia	aa638ed9c1	Merge pull request #189 from tgravescs/sparkYarnErrorHandling Impove Spark on Yarn Error handling Improve cli error handling and only allow a certain number of worker failures before failing the application. This will help prevent users from doing foolish things and their jobs running forever. For instance using 32 bit java but trying to allocate 8G containers. This loops forever without this change, now it errors out after a certain number of retries. The number of tries is configurable. Also increase the frequency we ping the RM to increase speed at which we get containers if they die. The Yarn MR app defaults to pinging the RM every 1 seconds, so the default of 5 seconds here is fine. But that is configurable as well in case people want to change it. I do want to make sure there aren't any cases that calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems? I couldn't think of any and testing on standalone cluster as well as yarn.	2013-11-19 16:05:44 -08:00
Matei Zaharia	55925805fc	Merge pull request #187 from aarondav/example-bcast-test Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally. This issue came up on the mailing lists [here](http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3C2013111909591557147628%40ict.ac.cn%3E).	2013-11-19 16:04:01 -08:00
tgravescs	4093e9393a	Impove Spark on Yarn Error handling	2013-11-19 12:44:00 -06:00
Henry Saputra	9c934b640f	Remove the semicolons at the end of Scala code to make it more pure Scala code. Also remove unused imports as I found them along the way. Remove return statements when returning value in the Scala code. Passing compile and tests.	2013-11-19 10:19:03 -08:00
Matthew Taylor	f639b65eab	PartitionPruningRDD is using index from parent(review changes)	2013-11-19 10:48:48 +00:00
Aaron Davidson	50fd8d98c0	Enable the Broadcast examples to work in a cluster setting Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally.	2013-11-18 22:51:35 -08:00
Matthew Taylor	13b9bf494b	PartitionPruningRDD is using index from parent	2013-11-19 06:27:33 +00:00
Holden Karau	e163e31c20	Add spaces	2013-11-18 20:13:25 -08:00
Holden Karau	7de180fd13	Remove explicit boxing	2013-11-18 20:05:05 -08:00
Marek Kolodziej	99cfe89c68	Updates to reflect pull request code review	2013-11-18 22:00:36 -05:00
Marek Kolodziej	09bdfe3b16	XORShift RNG with unit tests and benchmark To run unit test, start SBT console and type: compile test-only org.apache.spark.util.XORShiftRandomSuite To run benchmark, type: project core console Once the Scala console starts, type: org.apache.spark.util.XORShiftRandom.benchmark(100000000)	2013-11-18 15:21:43 -05:00
Russell Cardullo	1360f62d15	Cleanup GraphiteSink.scala based on feedback * Reorder imports according to the style guide * Consistently use propertyToOption in all places	2013-11-18 08:53:39 -08:00
Harvey Feng	a98f5a0ebb	Misc style changes in the 'yarn' package.	2013-11-17 22:35:56 -08:00
shiyun.wxm	eda05fa439	use HashSet.empty[Long] instead of Seq[Long]	2013-11-18 13:31:14 +08:00
Reynold Xin	e2ebc3a9d8	Merge pull request #182 from rxin/vector Slightly enhanced PrimitiveVector: 1. Added trim() method 2. Added size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.	2013-11-17 18:42:18 -08:00
Reynold Xin	26f616d73a	Merge pull request #3 from aarondav/pv-test Add PrimitiveVectorSuite and fix bug in resize()	2013-11-17 18:18:16 -08:00
Aaron Davidson	85763f4942	Add PrimitiveVectorSuite and fix bug in resize()	2013-11-17 18:16:51 -08:00
Reynold Xin	16a2286d6d	Return the vector itself for trim and resize method in PrimitiveVector.	2013-11-17 17:52:02 -08:00
BlackNiuza	ecfbaf2442	rename "a" to "statusId"	2013-11-18 09:51:40 +08:00
Reynold Xin	c30979c7d6	Slightly enhanced PrimitiveVector: 1. Added trim() method 2. Added size method. 3. Renamed getUnderlyingArray to array. 4. Minor documentation update.	2013-11-17 17:09:40 -08:00
BlackNiuza	b60839e56a	correct number of tasks in ExecutorsUI	2013-11-17 21:38:57 +08:00
Matei Zaharia	1b5b358309	Merge pull request #178 from hsaputra/simplecleanupcode Simple cleanup on Spark's Scala code Simple cleanup on Spark's Scala code while testing some modules: -) Remove some of unused imports as I found them -) Remove ";" in the imports statements -) Remove () at the end of method calls like size that does not have size effect.	2013-11-16 11:44:10 -08:00
Henry Saputra	c33f802044	Simple cleanup on Spark's Scala code while testing core and yarn modules: -) Remove some of unused imports as I found them -) Remove ";" in the imports statements -) Remove () at the end of method call like size that does not have size effect.	2013-11-15 10:32:20 -08:00
Raymond Liu	f6b2e590b1	Merge pull request #1 from aarondav/scala210-master Various merge corrections	2013-11-14 23:04:55 -08:00
Aaron Davidson	ce1d2af7e4	Use Kafka 2.10 (again)	2013-11-14 23:00:39 -08:00
Matei Zaharia	96e0fb4630	Merge pull request #173 from kayousterhout/scheduler_hang Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched. This isn't currently unit tested but will be once my pull request for merging the cluster and local schedulers goes in -- at which point many more of the unit tests will exercise the code paths through the cluster scheduler (currently the failure test suite uses the local scheduler, which is why we didn't see this bug before).	2013-11-14 22:29:28 -08:00
Aaron Davidson	f629ba95b6	Various merge corrections I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.	2013-11-14 22:13:09 -08:00
Matei Zaharia	dfd40e9f6f	Merge pull request #175 from kayousterhout/no_retry_not_serializable Don't retry tasks when they fail due to a NotSerializableException As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.	2013-11-14 19:44:50 -08:00
Matei Zaharia	ed25105fd9	Merge pull request #174 from ahirreddy/master Write Spark UI url to driver file on HDFS This makes the SIMR code path simpler	2013-11-14 19:43:55 -08:00
Raymond Liu	d4cd32330e	Some fixes for previous master merge commits	2013-11-15 10:22:31 +08:00
Kay Ousterhout	29c88e408e	Don't retry tasks when they fail due to a NotSerializableException	2013-11-14 15:15:19 -08:00
Kay Ousterhout	b4546ba9e6	Fix bug where scheduler could hang after task failure. When a task fails, we need to call reviveOffers() so that the task can be rescheduled on a different machine. In the current code, the state in ClusterTaskSetManager indicating which tasks are pending may be updated after revive offers is called (there's a race condition here), so when revive offers is called, the task set manager does not yet realize that there are failed tasks that need to be relaunched.	2013-11-14 13:55:03 -08:00
Reynold Xin	1a4cfbea33	Merge pull request #169 from kayousterhout/mesos_fix Don't ignore spark.cores.max when using Mesos Coarse mode totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse grained Mesos mode.	2013-11-14 10:32:11 -08:00
Reynold Xin	5a4f483652	Merge pull request #170 from liancheng/hadooprdd-doc-typo Fixed a scaladoc typo in HadoopRDD.scala	2013-11-14 10:30:36 -08:00
Reynold Xin	d76f5203af	Merge pull request #171 from RIA-pierre-borckmans/master Fixed typos in the CDH4 distributions version codes. Nothing important, but annoying when doing a copy/paste...	2013-11-14 10:25:48 -08:00
RIA-pierre-borckmans	bef398e572	Fixed typos in the CDH4 distributions version codes.	2013-11-14 11:33:48 +01:00
Lian, Cheng	cc8995c8f4	Fixed a scaladoc typo in HadoopRDD.scala	2013-11-14 18:17:05 +08:00
Kay Ousterhout	5125cd3466	Don't ignore spark.cores.max when using Mesos Coarse mode	2013-11-13 23:06:17 -08:00
Raymond Liu	a60620b76a	Merge branch 'master' into scala-2.10	2013-11-14 12:44:19 +08:00
Matei Zaharia	2054c61a18	Merge pull request #159 from liancheng/dagscheduler-actor-refine Migrate the daemon thread started by DAGScheduler to Akka actor `DAGScheduler` adopts an event queue and a daemon thread polling the it to process events sent to a `DAGScheduler`. This is a classical actor use case. By migrating this thread to Akka actor, we may benefit from both cleaner code and better performance (context switching cost of Akka actor is much less than that of a native thread). But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code change, I chose to let the actor to delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to make it apparently correct. Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896). At last, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method, and starts the `eventProcessActor` within the constructor.	2013-11-13 16:49:55 -08:00
Matei Zaharia	9290e5bcd2	Merge pull request #165 from NathanHowell/kerberos-master spark-assembly.jar fails to authenticate with YARN ResourceManager The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, a minor fix to it's zip file handling.	2013-11-13 16:48:44 -08:00
Ahir Reddy	0ea1f8b225	Write Spark UI url to driver file on HDFS	2013-11-13 15:23:36 -08:00

... 3 4 5 6 7 ...

4872 commits