sbt-assembly is setup to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, this manifests itself in the "info:null" debug log statement:
DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
This previously would just contain a single class:
$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo
And now has the full list of classes:
$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfoArchive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
org.apache.hadoop.security.AnnotatedSecurityInfo
org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
org.apache.hadoop.yarn.security.SchedulerSecurityInfo
org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws
a ClassNotFoundException when called with a closure defined in the
console. This commit catches the exception and conservatively assumes
the closure references all edge attributes.
3 Kryo related changes.
1. Call Kryo setReferences before calling user specified Kryo registrator. This is done so the user specified registrator can override the default setting.
2. Register more internal classes (MapStatus, BlockManagerId).
3. Slightly refactored the internal class registration to allocate less memory.
Add spark-tools assembly to spark-class'ss classpath
This commit adds an assembly for `spark-tools` and adds it to `spark-class`'s classpath, allowing the JavaAPICompletenessChecker to be run against Spark 0.8+ with
./spark-class org.apache.spark.tools.JavaAPICompletenessChecker
Previously, this tool was run through the `run` script. I chose to add this to `run-example` because I didn't want to duplicate code in a `run-tool` script.
Fix secure hdfs access for spark on yarn
https://github.com/apache/incubator-spark/pull/23 broke secure hdfs access. Not sure if it works with secure hdfs on standalone. Fixing it at least for spark on yarn.
The broadcasting of jobconf change also broke secure hdfs access as it didn't take into account things calling the getPartitions before sparkContext is initialized. The DAGScheduler does this as it tries to getShuffleMapStage.