Commit graph

4985 commits

Author SHA1 Message Date
Reynold Xin d76f5203af Merge pull request #171 from RIA-pierre-borckmans/master
Fixed typos in the CDH4 distributions version codes.

Nothing important, but annoying when doing a copy/paste...
2013-11-14 10:25:48 -08:00
RIA-pierre-borckmans bef398e572 Fixed typos in the CDH4 distributions version codes. 2013-11-14 11:33:48 +01:00
Lian, Cheng cc8995c8f4 Fixed a scaladoc typo in HadoopRDD.scala 2013-11-14 18:17:05 +08:00
Reynold Xin 2c39d809d6 Merge pull request #69 from jegonzal/MissingVertices
Addressing issue in Graph creation
2013-11-13 23:28:01 -08:00
Kay Ousterhout 5125cd3466 Don't ignore spark.cores.max when using Mesos Coarse mode 2013-11-13 23:06:17 -08:00
Joey 33b2deafe6 Merge pull request #1 from ankurdave/MissingVertices
During graph creation, create eTable earlier
2013-11-13 17:55:58 -08:00
Ankur Dave 3558e8bda1 During graph creation, create eTable earlier 2013-11-13 17:07:23 -08:00
Matei Zaharia 2054c61a18 Merge pull request #159 from liancheng/dagscheduler-actor-refine
Migrate the daemon thread started by DAGScheduler to Akka actor

`DAGScheduler` uses an event queue and a daemon thread that polls it to process events sent to the `DAGScheduler`.  This is a classic actor use case.  By migrating this thread to an Akka actor, we may benefit from both cleaner code and better performance (the context-switching cost of an Akka actor is much lower than that of a native thread).

But things become a little complicated when taking existing test code into consideration.

Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler` and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`.  To minimize code changes, I chose to let the actor delegate messages to `processEvent`.  This may not follow conventional actor usage, but I tried to make it obviously correct.

Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null.  But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`.  What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?).  I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold.  That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896).

Finally, since `DAGScheduler` instances are always `start()`ed right after creation, I removed the `start()` method and now start the `eventProcessActor` within the constructor.
2013-11-13 16:49:55 -08:00
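The event-loop pattern described in this pull request (a single consumer draining a mailbox and delegating each message to `processEvent`) can be sketched in Python for illustration. Spark itself is Scala and the real change uses an Akka actor rather than a thread. The class `EventProcessor`, its `post`/`stop` methods, and the event strings are hypothetical. The sketch shows two properties the description relies on: events are handled one at a time in arrival order, and processing starts in the constructor, so no separate `start()` call is needed.

```python
import queue
import threading


class EventProcessor:
    """Actor-style mailbox: events are posted and handled sequentially.

    Hypothetical stand-in for the pattern described above; the actual change
    used an Akka actor, not a Python thread.
    """

    _STOP = object()  # sentinel that tells the loop to exit

    def __init__(self):
        self.processed = []
        self._mailbox = queue.Queue()
        # Started in the constructor, so callers never need a start() method.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def post(self, event):
        """Enqueue an event instead of calling process_event directly."""
        self._mailbox.put(event)

    def process_event(self, event):
        """Handle one event; tests can still call this directly."""
        self.processed.append(event)

    def stop(self):
        """Ask the loop to finish pending events, then wait for it."""
        self._mailbox.put(self._STOP)
        self._worker.join()

    def _run(self):
        while True:
            event = self._mailbox.get()
            if event is self._STOP:
                break
            # Delegate every message to process_event, one at a time.
            self.process_event(event)
```

`queue.Queue` is thread-safe, so `post` may be called from several threads, and `process_event` stays directly callable, which is what lets tightly coupled tests keep invoking it.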
Matei Zaharia 9290e5bcd2 Merge pull request #165 from NathanHowell/kerberos-master
spark-assembly.jar fails to authenticate with YARN ResourceManager

The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, which contains a minor fix to its zip file handling.
2013-11-13 16:48:44 -08:00
Dan Crankshaw fa8a53619b Added conf file to gitignore. 2013-11-14 00:17:00 +00:00
Dan Crankshaw 958d7213a5 Merge branch 'master' of https://github.com/amplab/graphx 2013-11-13 23:31:14 +00:00
Ahir Reddy 0ea1f8b225 Write Spark UI url to driver file on HDFS 2013-11-13 15:23:36 -08:00
Joseph E. Gonzalez 5a9b07ead2 Fixing documentation 2013-11-13 10:45:25 -08:00
Joseph E. Gonzalez 266eb01ce8 Addressing issue in Graph creation where a graph created with a vertex set that does not span all of the vertices in the edges will crash on triplet construction. 2013-11-13 10:45:25 -08:00
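A minimal Python sketch of one way to avoid the crash described in this commit: before building triplets, fill in a default attribute for every edge endpoint that the supplied vertex set does not cover. The function name `build_triplets`, the dict/tuple representation, and the default-attribute approach are assumptions for illustration; the log does not say exactly how GraphX resolved it.

```python
def build_triplets(vertices, edges, default_attr):
    """Join each edge with its endpoint attributes.

    vertices: dict mapping vertex id -> attribute (may be incomplete)
    edges: list of (src_id, dst_id, edge_attr)
    default_attr: attribute assumed for endpoints missing from `vertices`
    Returns a list of (src_id, src_attr, dst_id, dst_attr, edge_attr).
    """
    # Copy so the caller's dict is untouched, then cover every endpoint.
    attrs = dict(vertices)
    for src, dst, _ in edges:
        attrs.setdefault(src, default_attr)
        attrs.setdefault(dst, default_attr)
    # Every lookup now succeeds, so a missing vertex cannot raise KeyError.
    return [(src, attrs[src], dst, attrs[dst], e) for src, dst, e in edges]
```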
Reynold Xin a81fcb749d Merge pull request #68 from jegonzal/BitSetSetUntilBug
Addressing bug in BitSet.setUntil(ind)
2013-11-13 10:41:01 -08:00
Joseph E. Gonzalez f0ef75c7a4 Addressing bug in BitSet.setUntil(ind) where if invoked with a multiple of 64 could lead to an index out of bounds error. 2013-11-13 10:35:23 -08:00
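The commit message does not show the faulty code, but the symptom (an out-of-bounds index when the argument is a multiple of 64) is consistent with touching a partial trailing word that does not exist. The Python sketch below uses a hypothetical `BitSet` backed by 64-bit words and guards that step; it illustrates the boundary condition and is not the GraphX implementation.

```python
class BitSet:
    """Fixed-size bit set stored as a list of 64-bit words (sketch only)."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self.words = [0] * ((num_bits + 63) // 64)

    def set_until(self, ind):
        """Set bits 0 .. ind-1."""
        word_index = ind >> 6  # number of completely filled words
        for i in range(word_index):
            self.words[i] = (1 << 64) - 1
        # Only touch a partial word if bits are left over. When ind is a
        # multiple of 64 there are none, and word_index can equal
        # len(self.words), so indexing it unconditionally would fail.
        if ind & 63:
            self.words[word_index] |= (1 << (ind & 63)) - 1

    def get(self, i):
        return (self.words[i >> 6] >> (i & 63)) & 1 == 1

    def cardinality(self):
        return sum(bin(w).count("1") for w in self.words)
```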
Matei Zaharia 39af914b27 Merge pull request #166 from ahirreddy/simr-spark-ui
SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients
2013-11-13 08:39:05 -08:00
Matei Zaharia f49ea28d25 Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase
Allow spark on yarn to be run from HDFS.

Allows the spark.jar, app.jar, and log4j.properties to be put into HDFS.  Allows you to specify the files on a different HDFS cluster, in which case they are copied over. It makes sure permissions are correct and puts files into the public distributed cache so they can be reused amongst users when their permissions allow it.  Also adds a bit of error handling for missing arguments.
2013-11-12 19:13:39 -08:00
Reynold Xin 882d069189 Fixed the bug in variable encoding for longs. 2013-11-12 18:50:03 -08:00
Matei Zaharia 87f2f4e5c2 Merge pull request #153 from ankurdave/stop-spot-cluster
Enable stopping and starting a spot cluster

Clusters launched using `--spot-price` contain an on-demand master and spot slaves. Because EC2 does not support stopping spot instances, the spark-ec2 script previously could only destroy such clusters.

This pull request makes it possible to stop and restart a spot cluster.
* The `stop` command works as expected for a spot cluster: the master is stopped and the slaves are terminated.
* To start a stopped spot cluster, the user must invoke `launch --use-existing-master`. This launches fresh spot slaves but resumes the existing master.
2013-11-12 16:26:09 -08:00
Matei Zaharia b8bf04a085 Merge pull request #160 from xiajunluan/JIRA-923
Fix bug JIRA-923

Fix column sort issue in UI for JIRA-923.
https://spark-project.atlassian.net/browse/SPARK-923

Conflicts:
	core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala
	core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala
2013-11-12 16:19:50 -08:00
Ahir Reddy ccb099e804 SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients 2013-11-12 15:58:41 -08:00
Reynold Xin 1e5c17812d Use variable encoding for ints, longs, and doubles in the specialized serializers. 2013-11-12 15:30:27 -08:00
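Variable-length encoding stores small numbers in fewer bytes by writing 7 payload bits per byte and using the high bit as a continuation flag. The Python sketch below is one common form of it for 64-bit signed values (a zigzag mapping followed by LEB128-style groups). Whether Spark's specialized serializers zigzag-encode, and what the later long-encoding bug actually was, are not stated in this log, so treat the function names and byte layout as assumptions.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, as a JVM Long would


def zigzag_encode(n):
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z):
    return (z >> 1) ^ -(z & 1)


def write_varlong(n):
    """Encode a signed 64-bit value as 7-bit groups, least significant first."""
    z = zigzag_encode(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7  # z is non-negative, so this always reaches zero
        if z:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varlong(data):
    """Decode bytes from write_varlong; returns (value, bytes_consumed)."""
    z = 0
    shift = 0
    for i, byte in enumerate(data):
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return zigzag_decode(z), i + 1
        shift += 7
    raise ValueError("truncated varint")
```

One pitfall that is specific to longs on the JVM: shifting a negative `Long` with `>>` sign-extends and never reaches zero, so a loop like the one above must use `>>>` or operate on a non-negative (for example zigzag-mapped) value. This is a general caution, not a claim about what was fixed here.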
Nathan Howell 48eac0bcbf Upgrade to sbt-assembly 0.9.2 2013-11-12 13:29:25 -08:00
Nathan Howell 23146a6705 spark-assembly.jar fails to authenticate with YARN ResourceManager
sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statements:

    DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
    DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
    ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
    WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
    ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

This previously would just contain a single class:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo

    org.apache.hadoop.security.AnnotatedSecurityInfo

And now has the full list of classes:

$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
  inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo

    org.apache.hadoop.security.AnnotatedSecurityInfo
    org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
    org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
    org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
    org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
    org.apache.hadoop.yarn.security.SchedulerSecurityInfo
    org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
    org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
2013-11-12 13:27:50 -08:00
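The difference between the two merge behaviors in the Kerberos fix can be sketched in Python: a first-wins strategy keeps only the first jar's `META-INF/services/...SecurityInfo` body, which is why the old assembly listed a single class, whereas merging keeps every provider line. The function `merge_service_files` is hypothetical and is not sbt-assembly's implementation. It concatenates the bodies, drops `#` comments and blank lines (the `java.util.ServiceLoader` provider-file format treats `#` as a comment character and ignores blank lines), and removes duplicates while preserving first-seen order.

```python
def merge_service_files(contents):
    """Merge several META-INF/services file bodies into one.

    contents: list of file bodies, in the order the jars are visited.
    Returns one body listing every distinct provider class once.
    """
    seen = set()
    merged = []
    for body in contents:
        for line in body.splitlines():
            # Strip comments and surrounding whitespace, as ServiceLoader does.
            name = line.split("#", 1)[0].strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return "\n".join(merged) + "\n"
```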
Matei Zaharia dfd1ebc2d1 Merge pull request #164 from tdas/kafka-fix
Made block generator thread safe to fix Kafka bug.

This is a very important bug fix. Data could be, and was being, lost when receiving from Kafka because of this.
2013-11-12 09:10:05 -08:00
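The log does not describe the race itself, so the following is a hypothetical Python sketch of the general idea behind making a block generator thread safe: the receiver thread appending records and the timer thread swapping out the current buffer take the same lock, so no record can be appended to a buffer that has already been handed off as a block. The class and method names are illustrative, not Spark Streaming's API.

```python
import threading


class BlockGenerator:
    """Hypothetical sketch: buffer incoming records and cut them into blocks."""

    def __init__(self):
        self._lock = threading.Lock()  # guards _buffer and blocks
        self._buffer = []
        self.blocks = []

    def add_data(self, record):
        """Called by the receiver thread for every incoming record."""
        with self._lock:
            self._buffer.append(record)

    def cut_block(self):
        """Called periodically by a timer thread.

        Swapping the buffer and publishing the block happen under the same
        lock as add_data, so each record ends up either in a published block
        or in the fresh buffer, never in a buffer already handed off.
        """
        with self._lock:
            block, self._buffer = self._buffer, []
            if block:
                self.blocks.append(block)
```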
Tathagata Das 7ccbbdacb9 Made block generator thread safe to fix Kafka bug. 2013-11-12 00:10:45 -08:00
Dan Crankshaw a13460bb64 Updated documentation 2013-11-11 23:42:02 -08:00
Dan Crankshaw 7c573a8b43 Added PartitionStrategy option 2013-11-11 23:42:01 -08:00
Dan Crankshaw 8d8056da14 Fixed issue with canonical edge partitioner. 2013-11-11 23:40:23 -08:00
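A "canonical" edge partitioner is generally taken to mean one whose choice of partition does not depend on edge direction. The log does not say what the issue was, so this Python sketch only illustrates that property: the endpoints are put in a fixed order before hashing, so `(a, b)` and `(b, a)` always map to the same partition. The function name and the mixing constant are hypothetical choices.

```python
def canonical_partition(src, dst, num_parts):
    """Return a partition index in [0, num_parts) that ignores edge direction."""
    lo, hi = (src, dst) if src <= dst else (dst, src)  # canonical order
    multiplier = 1125899906842597  # arbitrary large odd constant (hypothetical)
    # Python's % with a positive modulus is always non-negative.
    return (lo * multiplier + hi) % num_parts
```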
Dan Crankshaw 4a670ef0ba Merge branch 'master' of github.com:amplab/graphx 2013-11-11 21:42:08 -08:00
Dan Crankshaw d19f2e8f3e Removed slaves from git 2013-11-12 05:21:34 +00:00
Joey 143c01dbd6 Update README.md
Changing image references to master branch.
2013-11-11 19:37:16 -08:00
Ankur Dave bc9f7eacb9 Enable stopping and starting a spot cluster 2013-11-11 17:50:31 -08:00
Reynold Xin 2e8d45032d Merge pull request #63 from jegonzal/VertexSetCleanup
Cleanup of VertexSetRDD
2013-11-11 17:34:09 -08:00
Joseph E. Gonzalez 577092080c Cleaning up documentation of VertexSetRDD.scala 2013-11-11 17:29:22 -08:00
Reynold Xin b8e294a21b Merge pull request #61 from ankurdave/pid2vid
Shuffle replicated vertex attributes efficiently in columnar format
2013-11-11 16:25:42 -08:00
Reynold Xin 3d7277ccbe Merge pull request #55 from ankurdave/aggregateNeighbors-variants
Specialize mapReduceTriplets for accessing subsets of vertex attributes
2013-11-11 15:49:28 -08:00
Matei Zaharia 23b53efccd Merge pull request #156 from haoyuan/master
add tachyon module
2013-11-11 12:30:02 -08:00
tgravescs 17bb9a27b2 Add mockito to the sbt build 2013-11-11 10:01:23 -06:00
Andrew xia e13da05424 fix format error 2013-11-11 19:15:45 +08:00
Andrew xia 37d2f3749e cut lines to fewer than 100 characters 2013-11-11 15:49:32 +08:00
Andrew xia b3208063af Fix bug JIRA-923 2013-11-11 15:39:10 +08:00
Ankur Dave bee1015620 Handle ClassNotFoundException from ByteCodeUtils
ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws
a ClassNotFoundException when called with a closure defined in the
console. This commit catches the exception and conservatively assumes
the closure references all edge attributes.
2013-11-10 23:00:37 -08:00
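The conservative fallback described in this commit can be sketched in Python: if analyzing a closure fails because its class cannot be loaded, report that every attribute is referenced, which should only cost efficiency (more data shipped), not correctness. `ClassNotFound` stands in for Java's `ClassNotFoundException`, and `analyze` is a hypothetical placeholder for `ByteCodeUtils.invokedMethod()`; the attribute labels are modeled on GraphX triplet fields and used here only for illustration.

```python
# Labels modeled on GraphX triplet fields; illustrative only.
ALL_ATTRIBUTES = frozenset({"srcAttr", "dstAttr", "attr"})


class ClassNotFound(Exception):
    """Stand-in for java.lang.ClassNotFoundException."""


def referenced_attributes(closure, analyze):
    """Return the set of triplet attributes `closure` appears to read.

    `analyze` is a hypothetical bytecode analyzer. Closures defined in an
    interactive console may have no loadable class, so if analysis raises
    ClassNotFound we conservatively assume every attribute is referenced.
    """
    try:
        return frozenset(analyze(closure))
    except ClassNotFound:
        return ALL_ATTRIBUTES
```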
Lian, Cheng e2a43b3dcc Made some changes according to suggestions from @aarondav 2013-11-11 12:21:54 +08:00
Haoyuan Li 6f455553c9 expose UI port only 2013-11-10 16:00:09 -08:00
Dan Crankshaw 60db25bded Fixed merge conflicts. 2013-11-10 15:45:55 -08:00
Ankur Dave d1ff1b7222 Build pid2vid structures only once, in Vid2Pid 2013-11-10 14:47:39 -08:00
Ankur Dave 502c511711 Use pid2vid for creating VTableReplicatedValues 2013-11-10 14:36:14 -08:00
Ankur Dave 53d24a973e Fix typo 2013-11-10 14:24:38 -08:00