Commit graph

4982 commits

Author SHA1 Message Date
Joseph E. Gonzalez 2093a17ff3 Adding triangle count code 2013-11-19 21:35:03 -08:00
Joseph E. Gonzalez 8719ba83c8 Modifying graph loaders to create initial vertex sets more efficiently and load undirected graphs. 2013-11-19 21:35:02 -08:00
Joseph E. Gonzalez 288ae310e7 adding test for collectNeighborIds 2013-11-19 21:03:00 -08:00
Joseph E. Gonzalez 2fc6f5bd47 Switching collectNeighborIds to use mapReduceTriplets directly 2013-11-19 21:03:00 -08:00
Joseph E. Gonzalez b12b2ccde8 Addressing bug in open hash set where getPos on a full open hash set could loop forever. 2013-11-19 21:03:00 -08:00
Dan Crankshaw 96fafdbd4b Removed sleep from pagerank in Analytics. 2013-11-19 20:39:34 -08:00
Marek Kolodziej bcc6ed30bf Formatting and scoping (private[spark]) updates 2013-11-19 20:50:38 -05:00
Henry Saputra 43dfac5132 Merge branch 'master' into removesemicolonscala 2013-11-19 16:57:57 -08:00
Henry Saputra 10be58f251 Another set of changes to remove unnecessary semicolons (;) from Scala code.
Passed the sbt/sbt compile and test
2013-11-19 16:56:23 -08:00
Ankur Dave 74ade9e035 Merge pull request #62 from dcrankshaw/partitioners
Allow user to choose a partitioner at runtime
2013-11-19 16:53:58 -08:00
Dan Crankshaw 34bcf1b32b Re-added slaves file for compatibility with Spark 2013-11-19 16:46:25 -08:00
Dan Crankshaw 37a524d91c Addressed code review comments. 2013-11-19 16:39:39 -08:00
Matei Zaharia f568912f85 Merge pull request #181 from BlackNiuza/fix_tasks_number
correct number of tasks in ExecutorsUI

Index `a` is not `execId` here
2013-11-19 16:11:31 -08:00
Matei Zaharia aa638ed9c1 Merge pull request #189 from tgravescs/sparkYarnErrorHandling
Improve Spark on Yarn Error handling

Improve CLI error handling and allow only a certain number of worker failures before failing the application. This helps prevent misconfigured jobs from running forever, for instance when using 32-bit Java but trying to allocate 8G containers. Without this change that case loops forever; now it errors out after a certain number of retries. The number of retries is configurable. Also increase the frequency at which we ping the RM, so that we get replacement containers faster if they die. The Yarn MR app defaults to pinging the RM every second, so the default of 5 seconds here is fine, and it is configurable as well in case people want to change it.

I do want to make sure there aren't any cases where calling stopExecutors in CoarseGrainedSchedulerBackend would cause problems. I couldn't think of any, and I tested on a standalone cluster as well as on Yarn.
2013-11-19 16:05:44 -08:00
Matei Zaharia 55925805fc Merge pull request #187 from aarondav/example-bcast-test
Enable the Broadcast examples to work in a cluster setting

Since they rely on println to display results, we need to first collect those results to the driver to have them actually display locally.

This issue came up on the mailing lists [here](http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3C2013111909591557147628%40ict.ac.cn%3E).
2013-11-19 16:04:01 -08:00
tgravescs 4093e9393a Improve Spark on Yarn Error handling 2013-11-19 12:44:00 -06:00
Henry Saputra 9c934b640f Remove the semicolons at the end of Scala code to make it more pure Scala code.
Also remove unused imports as I found them along the way.
Remove return statements when returning a value in the Scala code.

Passing compile and tests.
2013-11-19 10:19:03 -08:00
Matthew Taylor f639b65eab PartitionPruningRDD is using index from parent (review changes) 2013-11-19 10:48:48 +00:00
Aaron Davidson 50fd8d98c0 Enable the Broadcast examples to work in a cluster setting
Since they rely on println to display results, we need to first collect
those results to the driver to have them actually display locally.
2013-11-18 22:51:35 -08:00
Matthew Taylor 13b9bf494b PartitionPruningRDD is using index from parent 2013-11-19 06:27:33 +00:00
Dan Crankshaw 639e27a396 Merge branch 'master' into partitioners 2013-11-18 22:18:59 -08:00
Dan Crankshaw 87dbb3cb1e Merge branch 'master' of github.com:dcrankshaw/graphx 2013-11-18 22:18:24 -08:00
Dan Crankshaw 5f3ee53751 Added accessVertexAttr func which somehow got lost in a merge. 2013-11-18 19:34:02 -08:00
Dan Crankshaw 8a460e1811 Added partitioner to GraphImpl constructor args. 2013-11-18 19:32:03 -08:00
Marek Kolodziej 99cfe89c68 Updates to reflect pull request code review 2013-11-18 22:00:36 -05:00
Dan Crankshaw 1022e9bf17 Fixed code review changes. 2013-11-18 18:08:32 -08:00
Marek Kolodziej 09bdfe3b16 XORShift RNG with unit tests and benchmark
To run unit test, start SBT console and type:
compile
test-only org.apache.spark.util.XORShiftRandomSuite
To run benchmark, type:
project core
console
Once the Scala console starts, type:
org.apache.spark.util.XORShiftRandom.benchmark(100000000)
2013-11-18 15:21:43 -05:00
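For readers unfamiliar with the technique named in this commit, the following is a minimal sketch of a 64-bit xorshift generator. It is not the code from the commit: the class name `XorShift64Sketch` and the shift triple (13, 7, 17) are assumptions chosen for illustration, and the real `XORShiftRandom` may use different shifts and a different interface.

```scala
// Illustrative sketch only. The class name and the shift triple (13, 7, 17)
// are assumptions, not taken from the commit.
final class XorShift64Sketch(seed: Long) {
  // An all-zero state would stay zero forever, so reject it up front.
  require(seed != 0L, "xorshift state must be non-zero")

  private var state: Long = seed

  /** Advances the state by one xorshift step and returns the new state. */
  def nextLong(): Long = {
    var x = state
    x ^= x << 13   // mix low bits upward
    x ^= x >>> 7   // mix high bits downward (unsigned shift)
    x ^= x << 17   // mix upward again
    state = x
    x
  }
}
```

Two generators built from the same seed produce the same sequence, which is what makes a generator like this convenient to unit test.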
Russell Cardullo 1360f62d15 Cleanup GraphiteSink.scala based on feedback
* Reorder imports according to the style guide
* Consistently use propertyToOption in all places
2013-11-18 08:53:39 -08:00
shiyun.wxm eda05fa439 use HashSet.empty[Long] instead of Seq[Long] 2013-11-18 13:31:14 +08:00
Dan Crankshaw 2aaa095687 Merge branch 'master' of github.com:amplab/graphx 2013-11-17 19:35:43 -08:00
Reynold Xin e2ebc3a9d8 Merge pull request #182 from rxin/vector
Slightly enhanced PrimitiveVector:

1. Added trim() method
2. Added size method.
3. Renamed getUnderlyingArray to array.
4. Minor documentation update.
2013-11-17 18:42:18 -08:00
Reynold Xin 26f616d73a Merge pull request #3 from aarondav/pv-test
Add PrimitiveVectorSuite and fix bug in resize()
2013-11-17 18:18:16 -08:00
Aaron Davidson 85763f4942 Add PrimitiveVectorSuite and fix bug in resize() 2013-11-17 18:16:51 -08:00
Reynold Xin 16a2286d6d Return the vector itself for trim and resize method in PrimitiveVector. 2013-11-17 17:52:02 -08:00
BlackNiuza ecfbaf2442 rename "a" to "statusId" 2013-11-18 09:51:40 +08:00
Reynold Xin c30979c7d6 Slightly enhanced PrimitiveVector:
1. Added trim() method
2. Added size method.
3. Renamed getUnderlyingArray to array.
4. Minor documentation update.
2013-11-17 17:09:40 -08:00
BlackNiuza b60839e56a correct number of tasks in ExecutorsUI 2013-11-17 21:38:57 +08:00
Matei Zaharia 1b5b358309 Merge pull request #178 from hsaputra/simplecleanupcode
Simple cleanup on Spark's Scala code

Simple cleanup on Spark's Scala code while testing some modules:
-) Remove some unused imports as I found them
-) Remove ";" in the import statements
-) Remove () at the end of calls to methods like size that do not have side effects.
2013-11-16 11:44:10 -08:00
Ankur Dave 62a2a71c37 Merge pull request #65 from amplab/varenc
Use variable encoding for ints, longs, and doubles in the specialized serializers.
2013-11-15 13:12:07 -08:00
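The commit message does not say which encoding scheme the serializers use. As a rough illustration of the general idea, here is a sketch of one common variable-length scheme for integers, which stores 7 bits per byte and uses the high bit to mark that more bytes follow. The object and method names are hypothetical, and the handling of doubles is not covered.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, EOFException, InputStream, OutputStream}

// Illustrative sketch only; names and the 7-bit-group scheme are assumptions.
object VarIntSketch {
  /** Writes `value` as 7-bit groups, lowest group first; a set high bit means "more bytes follow". */
  def writeVarLong(out: OutputStream, value: Long): Unit = {
    var v = value
    while ((v & ~0x7FL) != 0L) {
      out.write(((v & 0x7F) | 0x80).toInt)
      v >>>= 7
    }
    out.write(v.toInt)
  }

  /** Reads a value written by writeVarLong. */
  def readVarLong(in: InputStream): Long = {
    @annotation.tailrec
    def loop(acc: Long, shift: Int): Long = {
      val b = in.read()
      if (b < 0) throw new EOFException("truncated variable-length value")
      val next = acc | ((b & 0x7F).toLong << shift)
      if ((b & 0x80) == 0) next else loop(next, shift + 7)
    }
    loop(0L, 0)
  }

  /** Convenience helper: encodes a single value to a byte array. */
  def encode(value: Long): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    writeVarLong(out, value)
    out.toByteArray
  }

  /** Convenience helper: decodes a single value from a byte array. */
  def decode(bytes: Array[Byte]): Long = readVarLong(new ByteArrayInputStream(bytes))
}
```

With a scheme like this, small non-negative values take a single byte, at the cost of up to ten bytes for values that use the top bits of a 64-bit long.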
Henry Saputra c33f802044 Simple cleanup on Spark's Scala code while testing core and yarn modules:
-) Remove some unused imports as I found them
-) Remove ";" in the import statements
-) Remove () at the end of calls to methods like size that do not have side effects.
2013-11-15 10:32:20 -08:00
Matei Zaharia 96e0fb4630 Merge pull request #173 from kayousterhout/scheduler_hang
Fix bug where scheduler could hang after task failure.

When a task fails, we need to call reviveOffers() so that the
task can be rescheduled on a different machine. In the current code,
the state in ClusterTaskSetManager indicating which tasks are
pending may be updated after revive offers is called (there's a
race condition here), so when revive offers is called, the task set
manager does not yet realize that there are failed tasks that need
to be relaunched.

This isn't currently unit tested but will be once my pull request for
merging the cluster and local schedulers goes in -- at which point
many more of the unit tests will exercise the code paths through
the cluster scheduler (currently the failure test suite uses the local
scheduler, which is why we didn't see this bug before).
2013-11-14 22:29:28 -08:00
Matei Zaharia dfd40e9f6f Merge pull request #175 from kayousterhout/no_retry_not_serializable
Don't retry tasks when they fail due to a NotSerializableException

As with my previous pull request, this will be unit tested once the Cluster and Local schedulers get merged.
2013-11-14 19:44:50 -08:00
Matei Zaharia ed25105fd9 Merge pull request #174 from ahirreddy/master
Write Spark UI url to driver file on HDFS

This makes the SIMR code path simpler
2013-11-14 19:43:55 -08:00
Kay Ousterhout 29c88e408e Don't retry tasks when they fail due to a NotSerializableException 2013-11-14 15:15:19 -08:00
Kay Ousterhout b4546ba9e6 Fix bug where scheduler could hang after task failure.
When a task fails, we need to call reviveOffers() so that the
task can be rescheduled on a different machine. In the current code,
the state in ClusterTaskSetManager indicating which tasks are
pending may be updated after revive offers is called (there's a
race condition here), so when revive offers is called, the task set
manager does not yet realize that there are failed tasks that need
to be relaunched.
2013-11-14 13:55:03 -08:00
Reynold Xin 1a4cfbea33 Merge pull request #169 from kayousterhout/mesos_fix
Don't ignore spark.cores.max when using Mesos Coarse mode

totalCoresAcquired is decremented but never incremented, causing Spark to effectively ignore spark.cores.max in coarse grained Mesos mode.
2013-11-14 10:32:11 -08:00
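To make the bug described above concrete, here is a toy model of a core budget, not Spark's Mesos backend. Only the counter name `totalCoresAcquired` comes from the commit message; the class and method names are hypothetical. The cap is enforced only if the counter is incremented when cores are accepted as well as decremented when they are given back.

```scala
// Toy model for illustration; everything except the name totalCoresAcquired is hypothetical.
final class CoreBudgetSketch(maxCores: Int) {
  private var totalCoresAcquired = 0

  /** Accepts up to `offered` cores without exceeding maxCores; returns how many were taken. */
  def acquire(offered: Int): Int = {
    val taken = math.max(0, math.min(offered, maxCores - totalCoresAcquired))
    totalCoresAcquired += taken // without this increment the cap would never be reached
    taken
  }

  /** Gives cores back to the budget, never dropping the counter below zero. */
  def release(cores: Int): Unit = {
    totalCoresAcquired = math.max(0, totalCoresAcquired - cores)
  }

  /** Number of cores currently counted against the cap. */
  def acquired: Int = totalCoresAcquired
}
```

If the increment in `acquire` is removed, `acquired` never rises above zero, so every offer is accepted in full and the cap is effectively ignored.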
Reynold Xin 5a4f483652 Merge pull request #170 from liancheng/hadooprdd-doc-typo
Fixed a scaladoc typo in HadoopRDD.scala
2013-11-14 10:30:36 -08:00
Reynold Xin d76f5203af Merge pull request #171 from RIA-pierre-borckmans/master
Fixed typos in the CDH4 distributions version codes.

Nothing important, but annoying when doing a copy/paste...
2013-11-14 10:25:48 -08:00
RIA-pierre-borckmans bef398e572 Fixed typos in the CDH4 distributions version codes. 2013-11-14 11:33:48 +01:00
Lian, Cheng cc8995c8f4 Fixed a scaladoc typo in HadoopRDD.scala 2013-11-14 18:17:05 +08:00