Commit graph

4967 commits

Author SHA1 Message Date
Josh Rosen a37ff0f1db Add spark-tools assembly to spark-class classpath.
This allows the JavaAPICompletenessChecker to be
run with Spark 0.8+.
2013-11-09 13:42:45 -08:00
Matei Zaharia 72a601ec31 Merge pull request #152 from rxin/repl
Propagate SparkContext local properties from the spark-repl caller thread to the repl execution thread.
2013-11-09 11:55:16 -08:00
soulmachine 28115fa8cb replace the thread with an Akka scheduler 2013-11-09 22:38:27 +08:00
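
A minimal sketch of what the change above describes, assuming a standalone Akka 2.2-era ActorSystem and a made-up 10 ms polling interval rather than the real scheduler wiring:

    import akka.actor.ActorSystem
    import scala.concurrent.duration._

    object AkkaSchedulerSketch {
      def main(args: Array[String]): Unit = {
        val system = ActorSystem("scheduler-sketch")
        import system.dispatcher // execution context for the scheduled task

        // Replaces a hand-rolled `while (true) { poll(); Thread.sleep(...) }` daemon thread.
        val task = system.scheduler.schedule(0.millis, 10.millis) {
          // poll the event queue here
        }

        Thread.sleep(100)
        task.cancel()
        system.shutdown() // Akka 2.2-era shutdown; newer Akka uses terminate()
      }
    }
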
Lian, Cheng 765ebca04f Remove unnecessary null checking 2013-11-09 21:13:03 +08:00
Lian, Cheng 2539c06745 Replaced the daemon thread started by DAGScheduler with an actor 2013-11-09 19:05:18 +08:00
Reynold Xin 319299941d Propagate the SparkContext local property from the thread that calls the spark-repl to the actual execution thread. 2013-11-09 00:32:14 -08:00
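
A hedged sketch of the idea behind the property-propagation commits above, using only the public `setLocalProperty`/`getLocalProperty` API and an illustrative property key; the actual repl change copies the whole property set internally:

    import org.apache.spark.SparkContext

    object LocalPropertySketch {
      // Capture a property on the caller (spark-repl) thread and re-apply it on the
      // thread that actually runs the interpreted code, so properties such as a job
      // group id survive the thread hop.
      def runOnExecutionThread(sc: SparkContext)(body: => Unit): Unit = {
        val jobGroup = sc.getLocalProperty("spark.jobGroup.id") // read on the caller thread
        val worker = new Thread {
          override def run(): Unit = {
            if (jobGroup != null) sc.setLocalProperty("spark.jobGroup.id", jobGroup)
            body
          }
        }
        worker.start()
        worker.join()
      }
    }
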
Russell Cardullo ef85a51f85 Add Graphite sink for metrics
This adds a metrics sink for Graphite. The sink must
be configured with the host and port of a Graphite node,
and may optionally be configured with a prefix that will
be prepended to all metrics sent to Graphite.
2013-11-08 16:36:03 -08:00
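
A minimal sketch of how a sink could read the host, port, and optional prefix described above; the property names and class are illustrative, not the sink added in this commit:

    import java.util.Properties

    class GraphiteSinkConfigSketch(props: Properties) {
      // Required: the Graphite node to send metrics to.
      val host: String = Option(props.getProperty("host"))
        .getOrElse(throw new IllegalArgumentException("Graphite sink requires a 'host' property"))
      val port: Int = Option(props.getProperty("port"))
        .getOrElse(throw new IllegalArgumentException("Graphite sink requires a 'port' property"))
        .toInt
      // Optional: prepended to every metric name sent to Graphite.
      val prefix: String = Option(props.getProperty("prefix")).getOrElse("")
    }
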
Joseph E. Gonzalez 6083e4350f Adding unit tests to reproduce error. 2013-11-08 15:39:30 -08:00
Aaron Davidson dd63c548c2 Use SPARK_HOME instead of user.dir in ExecutorRunnerTest 2013-11-08 12:51:05 -08:00
tgravescs 13a19505e4 Don't call the doAs if user is unknown or the same user that is already running 2013-11-08 12:04:09 -06:00
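
A hedged sketch of the guard described in this and the two secure-HDFS commits below, using Hadoop's UserGroupInformation; the helper name and plumbing are assumptions, not the actual Spark code: wrap the action in doAs only when a different, known user was requested, which also avoids breaking secure HDFS access.

    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.security.UserGroupInformation

    object RunAsUserSketch {
      def runAsUser[T](requestedUser: String)(body: => T): T = {
        val currentUser = UserGroupInformation.getCurrentUser.getShortUserName
        if (requestedUser == null || requestedUser.isEmpty || requestedUser == currentUser) {
          body // unknown user or the same user: skip doAs entirely
        } else {
          val ugi = UserGroupInformation.createRemoteUser(requestedUser)
          ugi.doAs(new PrivilegedExceptionAction[T] {
            override def run(): T = body
          })
        }
      }
    }
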
tgravescs f95cb04e40 Remove the runAsUser as it breaks secure hdfs access 2013-11-08 10:07:15 -06:00
tgravescs 5f9ed51719 Fix access to Secure HDFS 2013-11-08 08:41:57 -06:00
Joseph E. Gonzalez 161784d0e6 Fixing tests 2013-11-07 20:40:21 -08:00
Joseph E. Gonzalez e523f0d2fb merged and debugged 2013-11-07 20:19:49 -08:00
Joseph E. Gonzalez 908e606473 Additional optimizations 2013-11-07 19:47:30 -08:00
Reynold Xin bac7be30cd Made more specialized messages. 2013-11-07 19:39:48 -08:00
Reynold Xin 64ad3b18d9 Merge branch 'master' into rxin
Conflicts:
	graph/src/main/scala/org/apache/spark/graph/impl/GraphImpl.scala
2013-11-07 19:23:42 -08:00
Reynold Xin 2406bf33e4 Use custom serializer for aggregation messages when the data type is int/double. 2013-11-07 19:18:58 -08:00
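
An illustrative sketch of the specialization idea in the custom-serializer commit above (GraphX's actual serializer classes differ): when the aggregated value is a primitive Int, the vertex id and payload are written directly instead of going through generic object serialization.

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

    object IntMessageSerializerSketch {
      // 12 bytes per message: 8 for the destination vertex id, 4 for the Int payload,
      // with no per-object headers.
      def write(out: DataOutputStream, vertexId: Long, value: Int): Unit = {
        out.writeLong(vertexId)
        out.writeInt(value)
      }

      def read(in: DataInputStream): (Long, Int) = (in.readLong(), in.readInt())

      def main(args: Array[String]): Unit = {
        val buf = new ByteArrayOutputStream()
        write(new DataOutputStream(buf), vertexId = 42L, value = 7)
        println(read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray))))
      }
    }
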
Ankur Dave 6ee05be1c8 Merge pull request #49 from jegonzal/graphxshell
GraphX Console with Logo Text
2013-11-07 19:12:41 -08:00
Ankur Dave a9f96b54e4 Merge pull request #56 from jegonzal/PregelAPIChanges
Changing Pregel API to use mapReduceTriplets instead of aggregateNeighbors
2013-11-07 18:56:56 -08:00
Joseph E. Gonzalez e9308e0e75 Changing Pregel API to operate directly on edge triplets in SendMessage rather than (Vid, EdgeTriplet) pairs. 2013-11-07 18:04:06 -08:00
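
A simplified, self-contained sketch of the map/reduce-over-triplets pattern named in the Pregel commits above; the tuple stand-in and signature are assumptions, not the GraphX API of this commit:

    object MapReduceTripletsSketch {
      // A triplet stand-in: (srcId, dstId, edge attribute).
      type Triplet = (Long, Long, Int)

      // sendMsg sees the whole edge triplet and emits messages keyed by vertex id;
      // mergeMsg combines messages bound for the same vertex.
      def mapReduceTriplets[A](
          triplets: Seq[Triplet],
          sendMsg: Triplet => Iterator[(Long, A)],
          mergeMsg: (A, A) => A): Map[Long, A] =
        triplets.iterator
          .flatMap(sendMsg)
          .toSeq
          .groupBy(_._1)
          .map { case (vid, msgs) => vid -> msgs.map(_._2).reduce(mergeMsg) }

      def main(args: Array[String]): Unit = {
        val triplets = Seq((1L, 2L, 5), (3L, 2L, 7))
        // Count the in-degree of each destination vertex.
        println(mapReduceTriplets[Int](triplets, t => Iterator((t._2, 1)), _ + _))
      }
    }
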
Reynold Xin 5907137d11 Merge pull request #54 from amplab/rxin
Converted for loops to while loops in EdgePartition.
2013-11-07 16:58:31 -08:00
Reynold Xin 6fadff2b92 Converted for loops to while loops in EdgePartition. 2013-11-07 16:54:33 -08:00
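
Why the conversion above matters, as a minimal comparison: a Range-based for comprehension desugars to a `foreach` call with a closure, while the hand-written while loop compiles down to a plain counter and jump, which was measurably faster on hot paths such as edge scans in that era's Scala.

    object LoopSketch {
      def sumFor(values: Array[Int]): Long = {
        var total = 0L
        for (i <- 0 until values.length) total += values(i) // desugars to Range.foreach(closure)
        total
      }

      def sumWhile(values: Array[Int]): Long = {
        var total = 0L
        var i = 0
        while (i < values.length) { total += values(i); i += 1 } // plain index loop, no closure
        total
      }

      def main(args: Array[String]): Unit = {
        val data = Array.fill(1000)(1)
        println((sumFor(data), sumWhile(data))) // (1000,1000)
      }
    }
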
Reynold Xin edf41647f4 Merge pull request #53 from amplab/rxin
Added GraphX to classpath.
2013-11-07 16:22:43 -08:00
Reynold Xin 95f1f5315e Added GraphX to classpath. 2013-11-07 16:22:05 -08:00
Reynold Xin c379e10455 Merge pull request #51 from jegonzal/VertexSetRDD
Reverting to Array based (materialized) output in VertexSetRDD
2013-11-07 16:01:47 -08:00
Reynold Xin 3d4ad84b63 Merge pull request #148 from squito/include_appId
Include appId in executor cmd line args

Add the appId back into the executor cmd line args.

I also made a pretty lame regression test, just to make sure it doesn't get dropped in the future. Not sure it will run on the build server, though, because `ExecutorRunner.buildCommandSeq()` expects to be able to run the scripts in `bin`.
2013-11-07 11:08:27 -08:00
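
A hypothetical regression check in the spirit of the test described above; the command layout and assertion are illustrative, while the real test exercises `ExecutorRunner.buildCommandSeq()`:

    object AppIdRegressionSketch {
      def main(args: Array[String]): Unit = {
        val appId = "app-20131107-0001"
        // Stand-in for the command an ExecutorRunner would build for an executor.
        val cmd: Seq[String] = Seq("java", "org.example.Executor", "--executor-id", "0", appId)
        assert(cmd.contains(appId), s"appId $appId was dropped from the executor command line")
        println("appId still present in the executor command line")
      }
    }
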
Imran Rashid ca66f5d5a2 fix formatting 2013-11-07 07:23:59 -06:00
Imran Rashid 8d3cdda9a2 very basic regression test to make sure appId doesn't get dropped in the future 2013-11-07 01:35:48 -06:00
Reynold Xin be7e8da98a Merge pull request #23 from jerryshao/multi-user
Add Spark multi-user support for standalone mode and Mesos

This PR adds multi-user support for Spark in both standalone mode and Mesos (coarse- and fine-grained) mode. The user who submits the app can be specified through the environment variable `SPARK_USER`, or the default one is used. Executors will communicate with Hadoop using the specified user name.

I also fixed a bug in JobLogger that appeared when a different user wrote the job log to a specified folder without the right file permissions.

I separated the previous [PR750](https://github.com/mesos/spark/pull/750) into two PRs; in this PR I only solve the multi-user support problem. I will try to solve the security auth problem in a subsequent PR, because security auth is a complicated problem, especially for long-running apps like Shark Server (both the Kerberos TGT and the HDFS delegation token should be renewed or re-created throughout the app's run time).
2013-11-06 23:22:47 -08:00
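
A short sketch of the user-selection rule described in the PR above; the helper name is hypothetical, while `SPARK_USER` is the environment variable the PR describes and `user.name` is the standard JVM property for the current user:

    object SparkUserSketch {
      // Prefer the submitter named in SPARK_USER, otherwise fall back to the user
      // running the JVM; executors would talk to Hadoop as this user.
      def effectiveSparkUser(): String =
        sys.env.getOrElse("SPARK_USER", System.getProperty("user.name"))

      def main(args: Array[String]): Unit =
        println(s"Running as: ${effectiveSparkUser()}")
    }
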
Imran Rashid 36e832bff0 include the appid in the cmd line arguments to Executors 2013-11-07 01:11:49 -06:00
Dan Crankshaw 384befb208 Merge branch 'master' of github.com:amplab/graphx 2013-11-06 19:50:55 -08:00
jerryshao 12dc385a49 Add Spark multi-user support for standalone mode and Mesos 2013-11-07 11:18:09 +08:00
Reynold Xin aadeda5e76 Merge pull request #144 from liancheng/runjob-clean
Removed unused return value in SparkContext.runJob

Return type of this `runJob` version is `Unit`:

    def runJob[T, U: ClassManifest](
        rdd: RDD[T],
        func: (TaskContext, Iterator[T]) => U,
        partitions: Seq[Int],
        allowLocal: Boolean,
        resultHandler: (Int, U) => Unit) {
        ...
    }

It's obviously unnecessary to "return" `result`.
2013-11-06 13:27:47 -08:00
Reynold Xin 951024feea Merge pull request #145 from aarondav/sls-fix
Attempt to fix SparkListenerSuite breakage

Could not reproduce locally, but this test could've been flaky if the build machine was too fast, due to a typo. (Index 0 is intentionally slowed down to ensure the total time is >= 1 ms.)

This should be merged into branch-0.8 as well.
2013-11-06 09:36:14 -08:00
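
An illustrative sketch of the timing trick mentioned above; the real suite differs, and the local SparkContext here is only for the sketch: partition 0 is deliberately slowed so the stage's measured run time is at least 1 ms even on a very fast build machine.

    import org.apache.spark.SparkContext

    object ListenerTimingSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext("local", "listener-timing-sketch")
        val count = sc.parallelize(1 to 100, 4).mapPartitionsWithIndex { (index, iter) =>
          if (index == 0) Thread.sleep(1) // keep the total stage time >= 1 ms
          iter
        }.count()
        println(count)
        sc.stop()
      }
    }
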
Aaron Davidson 80e98d2bd7 Attempt to fix SparkListenerSuite breakage
Could not reproduce locally, but this test could've been flaky if the
build machine was too fast.
2013-11-06 08:03:35 -08:00
Lian, Cheng a0c4565183 Removed unused return value in SparkContext.runJob 2013-11-06 23:18:59 +08:00
Reynold Xin bf4e6131cc Merge pull request #143 from rxin/scheduler-hang
Ignore a task status update if the executor doesn't exist anymore.

Otherwise, if the scheduler receives a task status update message after the executor has been removed, the scheduler would hang.

It is pretty hard to add unit tests for these right now because it is hard to mock the cluster scheduler. We should do that once @kayousterhout finishes merging the local scheduler and the cluster scheduler.
2013-11-05 23:14:09 -08:00
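
A hedged sketch of the guard described above; the class, fields, and state type are illustrative, not the cluster scheduler's actual code:

    import scala.collection.mutable

    class StatusUpdateGuardSketch {
      private val activeExecutors = mutable.Set[String]()

      def executorAdded(execId: String): Unit = activeExecutors += execId
      def executorRemoved(execId: String): Unit = activeExecutors -= execId

      def statusUpdate(execId: String, taskId: Long, state: String): Unit = {
        if (!activeExecutors.contains(execId)) {
          // The executor is already gone: drop the update instead of processing it,
          // otherwise the bookkeeping below could wedge the scheduler.
          println(s"Ignoring update for task $taskId from removed executor $execId")
        } else {
          // ... normal handling of the task state change ...
        }
      }
    }
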
Reynold Xin a02eed6811 Ignore a task status update if the executor doesn't exist anymore. 2013-11-05 18:46:38 -08:00
Reynold Xin 9f7b9bb1cd Merge pull request #142 from liancheng/dagscheduler-pattern-matching
Using case class deep match to simplify code in DAGScheduler.processEvent

Since all `XxxEvent`s pushed into `DAGScheduler.eventQueue` are case classes, deep pattern matching is a more convenient way to extract event object components.
2013-11-05 10:42:19 -08:00
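
A sketch of the pattern-matching style referred to above, with made-up event types rather than DAGScheduler's real ones: the case clause destructures the event's fields directly instead of matching on the type and then reading fields one by one.

    object ProcessEventSketch {
      sealed trait SchedulerEvent
      case class JobSubmitted(jobId: Int, partitions: Seq[Int]) extends SchedulerEvent
      case class TaskFinished(jobId: Int, taskId: Long) extends SchedulerEvent

      def processEvent(event: SchedulerEvent): String = event match {
        // Instead of `case e: JobSubmitted => ... e.jobId ... e.partitions ...`
        case JobSubmitted(jobId, partitions) => s"job $jobId with ${partitions.size} partitions"
        case TaskFinished(jobId, taskId)     => s"task $taskId of job $jobId finished"
      }

      def main(args: Array[String]): Unit =
        println(processEvent(JobSubmitted(1, Seq(0, 1, 2))))
    }
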
Joseph E. Gonzalez 8ac15e8e43 Merge branch 'master' of https://github.com/amplab/graphx into graphxshell 2013-11-05 01:37:12 -08:00
Joseph E. Gonzalez 3e504938c2 merging upstream changes 2013-11-05 01:36:48 -08:00
Joey ca44b5134a Merge pull request #50 from amplab/mergemerge
Merge Spark master into graphx
2013-11-05 01:32:55 -08:00
Lian, Cheng 8b4c994e8c Using compact case class pattern matching syntax to simplify code in DAGScheduler.processEvent 2013-11-05 17:18:42 +08:00
Joseph E. Gonzalez 2dc9ec2387 Reverting to Array based (materialized) output of all VertexSetRDD operations. 2013-11-05 01:15:12 -08:00
Reynold Xin 551a43fd3d Merge branch 'master' of github.com:apache/incubator-spark into mergemerge
Conflicts:
	README.md
	core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
	core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
	core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
2013-11-04 21:02:36 -08:00
Reynold Xin 81065321c0 Merge pull request #139 from aarondav/shuffle-next
Never store shuffle blocks in BlockManager

After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use
within BlockManager (they had a no-arg constructor!). This patch completely eliminates
them, saving us around 100-150 bytes per shuffle block.
The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding
state saved by the MapOutputTracker.

Note: This should *not* be merged directly into 0.8.0 -- see #138
2013-11-04 20:47:14 -08:00
Joseph E. Gonzalez 3c37928fab This commit adds a new graphx-shell which is essentially the same as
the spark shell, but with GraphX packages automatically imported
and with Kryo serialization enabled for GraphX types.

In addition, the graphx-shell has a nifty new logo.

To make these changes minimally invasive in SparkILoop.scala,
I added some additional environment variables:

   SPARK_BANNER_TEXT: if set, this string is displayed instead
   of the Spark logo

   SPARK_SHELL_INIT_BLOCK: if set, this expression is evaluated in the
   Spark shell after the SparkContext is created.
2013-11-04 20:10:15 -08:00
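
A sketch of how the two environment variables above could be consumed; `interpret` stands in for the REPL's own evaluator, and the real wiring inside SparkILoop.scala is more involved:

    object GraphxShellInitSketch {
      private val defaultBanner = "Welcome to Spark"

      // Stand-in for the REPL's expression evaluator.
      private def interpret(code: String): Unit = println(s"would evaluate: $code")

      def main(args: Array[String]): Unit = {
        // SPARK_BANNER_TEXT: shown instead of the standard Spark logo when set.
        println(sys.env.getOrElse("SPARK_BANNER_TEXT", defaultBanner))

        // SPARK_SHELL_INIT_BLOCK: evaluated after the SparkContext is created,
        // e.g. to import GraphX packages and enable Kryo serialization for GraphX types.
        sys.env.get("SPARK_SHELL_INIT_BLOCK").foreach(interpret)
      }
    }
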
Aaron Davidson 93c90844cb Never store shuffle blocks in BlockManager
After the BlockId refactor (PR #114), it became very clear that ShuffleBlocks are of no use
within BlockManager (they had a no-arg constructor!). This patch completely eliminates
them, saving us around 100-150 bytes per shuffle block.
The total, system-wide overhead per shuffle block is now a flat 8 bytes, excluding
state saved by the MapOutputTracker.
2013-11-04 18:43:42 -08:00
Reynold Xin 0b26a392df Merge pull request #128 from shimingfei/joblogger-doc
add javadoc to JobLogger, and some small fixes

against SPARK-941

Add javadoc to JobLogger, output more info for RDDs, and modify recordStageDepGraph to avoid outputting duplicate stage dependency information.

(cherry picked from commit 518cf22eb2)
Signed-off-by: Reynold Xin <rxin@apache.org>
2013-11-04 18:22:06 -08:00