ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Reynold Xin	d76f5203af	Merge pull request #171 from RIA-pierre-borckmans/master Fixed typos in the CDH4 distributions version codes. Nothing important, but annoying when doing a copy/paste...	2013-11-14 10:25:48 -08:00
RIA-pierre-borckmans	bef398e572	Fixed typos in the CDH4 distributions version codes.	2013-11-14 11:33:48 +01:00
Lian, Cheng	cc8995c8f4	Fixed a scaladoc typo in HadoopRDD.scala	2013-11-14 18:17:05 +08:00
Reynold Xin	2c39d809d6	Merge pull request #69 from jegonzal/MissingVertices Addressing issue in Graph creation	2013-11-13 23:28:01 -08:00
Kay Ousterhout	5125cd3466	Don't ignore spark.cores.max when using Mesos Coarse mode	2013-11-13 23:06:17 -08:00
Joey	33b2deafe6	Merge pull request #1 from ankurdave/MissingVertices During graph creation, create eTable earlier	2013-11-13 17:55:58 -08:00
Ankur Dave	3558e8bda1	During graph creation, create eTable earlier	2013-11-13 17:07:23 -08:00
Matei Zaharia	2054c61a18	Merge pull request #159 from liancheng/dagscheduler-actor-refine Migrate the daemon thread started by DAGScheduler to Akka actor `DAGScheduler` adopts an event queue and a daemon thread polling it to process events sent to a `DAGScheduler`. This is a classical actor use case. By migrating this thread to Akka actor, we may benefit from both cleaner code and better performance (context switching cost of Akka actor is much less than that of a native thread). But things become a little complicated when taking existing test code into consideration. Code in `DAGSchedulerSuite` is somewhat tightly coupled with `DAGScheduler`, and directly calls `DAGScheduler.processEvent` instead of posting event messages to `DAGScheduler`. To minimize code change, I chose to let the actor delegate messages to `processEvent`. Maybe this doesn't follow conventional actor usage, but I tried to make it apparently correct. Another tricky part is that, since `DAGScheduler` depends on the `ActorSystem` provided by its field `env`, `env` cannot be null. But the `dagScheduler` field created in `DAGSchedulerSuite.before` was given a null `env`. What's more, `BlockManager.blockIdsToBlockManagers` checks whether `env` is null to determine whether to run the production code or the test code (bad smell here, huh?). I went through all callers of `BlockManager.blockIdsToBlockManagers`, and made sure that if `env != null` holds, then `blockManagerMaster == null` must also hold. That's the logic behind `BlockManager.scala` [line 896](https://github.com/liancheng/incubator-spark/compare/dagscheduler-actor-refine?expand=1#diff-2b643ea78c1add0381754b1f47eec132L896). At last, since `DAGScheduler` instances are always `start()`ed after creation, I removed the `start()` method, and start the `eventProcessActor` within the constructor.	2013-11-13 16:49:55 -08:00
Matei Zaharia	9290e5bcd2	Merge pull request #165 from NathanHowell/kerberos-master spark-assembly.jar fails to authenticate with YARN ResourceManager The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, a minor fix to its zip file handling.	2013-11-13 16:48:44 -08:00
Dan Crankshaw	fa8a53619b	Added conf file to gitignore.	2013-11-14 00:17:00 +00:00
Dan Crankshaw	958d7213a5	Merge branch 'master' of https://github.com/amplab/graphx	2013-11-13 23:31:14 +00:00
Ahir Reddy	0ea1f8b225	Write Spark UI url to driver file on HDFS	2013-11-13 15:23:36 -08:00
Joseph E. Gonzalez	5a9b07ead2	Fixing documentation	2013-11-13 10:45:25 -08:00
Joseph E. Gonzalez	266eb01ce8	Addressing issue in Graph creation where a graph created with a vertex set that does not span all of the vertices in the edges will crash on triplet construction.	2013-11-13 10:45:25 -08:00
Reynold Xin	a81fcb749d	Merge pull request #68 from jegonzal/BitSetSetUntilBug Addressing bug in BitSet.setUntil(ind)	2013-11-13 10:41:01 -08:00
Joseph E. Gonzalez	f0ef75c7a4	Addressing bug in BitSet.setUntil(ind) where invoking it with a multiple of 64 could lead to an index out of bounds error.	2013-11-13 10:35:23 -08:00
Matei Zaharia	39af914b27	Merge pull request #166 from ahirreddy/simr-spark-ui SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients	2013-11-13 08:39:05 -08:00
Matei Zaharia	f49ea28d25	Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. Allows you to specify the files on a different hdfs cluster and it will copy them over. It makes sure permissions are correct and makes sure to put things into public distributed cache so they can be reused amongst users if their permissions are appropriate. Also add a bit of error handling for missing arguments.	2013-11-12 19:13:39 -08:00
Reynold Xin	882d069189	Fixed the bug in variable encoding for longs.	2013-11-12 18:50:03 -08:00
Matei Zaharia	87f2f4e5c2	Merge pull request #153 from ankurdave/stop-spot-cluster Enable stopping and starting a spot cluster Clusters launched using `--spot-price` contain an on-demand master and spot slaves. Because EC2 does not support stopping spot instances, the spark-ec2 script previously could only destroy such clusters. This pull request makes it possible to stop and restart a spot cluster. * The `stop` command works as expected for a spot cluster: the master is stopped and the slaves are terminated. * To start a stopped spot cluster, the user must invoke `launch --use-existing-master`. This launches fresh spot slaves but resumes the existing master.	2013-11-12 16:26:09 -08:00
Matei Zaharia	b8bf04a085	Merge pull request #160 from xiajunluan/JIRA-923 Fix bug JIRA-923 Fix column sort issue in UI for JIRA-923. https://spark-project.atlassian.net/browse/SPARK-923 Conflicts: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala core/src/main/scala/org/apache/spark/ui/jobs/StageTable.scala	2013-11-12 16:19:50 -08:00
Ahir Reddy	ccb099e804	SIMR Backend Scheduler will now write Spark UI URL to HDFS, which is to be retrieved by SIMR clients	2013-11-12 15:58:41 -08:00
Reynold Xin	1e5c17812d	Use variable encoding for ints, longs, and doubles in the specialized serializers.	2013-11-12 15:30:27 -08:00
Nathan Howell	48eac0bcbf	Upgrade to sbt-assembly 0.9.2	2013-11-12 13:29:25 -08:00
Nathan Howell	23146a6705	spark-assembly.jar fails to authenticate with YARN ResourceManager sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statement: DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583) WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] This previously would just contain a single class: $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo org.apache.hadoop.security.AnnotatedSecurityInfo And now has the full list of classes: $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo org.apache.hadoop.security.AnnotatedSecurityInfo org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo org.apache.hadoop.yarn.security.SchedulerSecurityInfo org.apache.hadoop.yarn.security.admin.AdminSecurityInfo org.apache.hadoop.yarn.server.RMNMSecurityInfoClass	2013-11-12 13:27:50 -08:00
Matei Zaharia	dfd1ebc2d1	Merge pull request #164 from tdas/kafka-fix Made block generator thread safe to fix Kafka bug. This is a very important bug fix. Data could be, and was being, lost in Kafka due to this.	2013-11-12 09:10:05 -08:00
Tathagata Das	7ccbbdacb9	Made block generator thread safe to fix Kafka bug.	2013-11-12 00:10:45 -08:00
Dan Crankshaw	a13460bb64	Updated documentation	2013-11-11 23:42:02 -08:00
Dan Crankshaw	7c573a8b43	Added PartitionStrategy option	2013-11-11 23:42:01 -08:00
Dan Crankshaw	8d8056da14	Fixed issue with canonical edge partitioner.	2013-11-11 23:40:23 -08:00
Dan Crankshaw	4a670ef0ba	Merge branch 'master' of github.com:amplab/graphx	2013-11-11 21:42:08 -08:00
Dan Crankshaw	d19f2e8f3e	Removed slaves from git	2013-11-12 05:21:34 +00:00
Joey	143c01dbd6	Update README.md Changing image references to master branch.	2013-11-11 19:37:16 -08:00
Ankur Dave	bc9f7eacb9	Enable stopping and starting a spot cluster	2013-11-11 17:50:31 -08:00
Reynold Xin	2e8d45032d	Merge pull request #63 from jegonzal/VertexSetCleanup Cleanup of VertexSetRDD	2013-11-11 17:34:09 -08:00
Joseph E. Gonzalez	577092080c	Cleaning up documentation of VertexSetRDD.scala	2013-11-11 17:29:22 -08:00
Reynold Xin	b8e294a21b	Merge pull request #61 from ankurdave/pid2vid Shuffle replicated vertex attributes efficiently in columnar format	2013-11-11 16:25:42 -08:00
Reynold Xin	3d7277ccbe	Merge pull request #55 from ankurdave/aggregateNeighbors-variants Specialize mapReduceTriplets for accessing subsets of vertex attributes	2013-11-11 15:49:28 -08:00
Matei Zaharia	23b53efccd	Merge pull request #156 from haoyuan/master add tachyon module	2013-11-11 12:30:02 -08:00
tgravescs	17bb9a27b2	Add mockito to the sbt build	2013-11-11 10:01:23 -06:00
Andrew xia	e13da05424	fix format error	2013-11-11 19:15:45 +08:00
Andrew xia	37d2f3749e	cut lines to less than 100	2013-11-11 15:49:32 +08:00
Andrew xia	b3208063af	Fix bug JIRA-923	2013-11-11 15:39:10 +08:00
Ankur Dave	bee1015620	Handle ClassNotFoundException from ByteCodeUtils ByteCodeUtils.invokedMethod(), which we use in mapReduceTriplets, throws a ClassNotFoundException when called with a closure defined in the console. This commit catches the exception and conservatively assumes the closure references all edge attributes.	2013-11-10 23:00:37 -08:00
Lian, Cheng	e2a43b3dcc	Made some changes according to suggestions from @aarondav	2013-11-11 12:21:54 +08:00
Haoyuan Li	6f455553c9	expose UI port only	2013-11-10 16:00:09 -08:00
Dan Crankshaw	60db25bded	Fixed merge conflicts.	2013-11-10 15:45:55 -08:00
Ankur Dave	d1ff1b7222	Build pid2vid structures only once, in Vid2Pid	2013-11-10 14:47:39 -08:00
Ankur Dave	502c511711	Use pid2vid for creating VTableReplicatedValues	2013-11-10 14:36:14 -08:00
Ankur Dave	53d24a973e	Fix typo	2013-11-10 14:24:38 -08:00

4985 commits