Commit graph

5416 commits

Author SHA1 Message Date
Prashant Sharma bc311bb826 Restored the previously removed test 2014-01-03 14:52:37 +05:30
Prashant Sharma 94f2fffa23 fixed review comments 2014-01-03 14:43:37 +05:30
Luca Rosellini 87248bddac Merge pull request #1 from apache/master
Merge latest Spark changes
2014-01-03 00:45:31 -08:00
liguoqiang 8ddbd531a4 merge upstream/master 2014-01-03 16:06:34 +08:00
liguoqiang b27b75f1c5 Modify spark on yarn to create SparkConf process 2014-01-03 15:34:24 +08:00
Patrick Wendell 30b9db0abe Merge pull request #285 from colorant/yarn-refactor
Yarn refactor
2014-01-02 23:15:55 -08:00
liguoqiang 010e72c079 Modify spark on yarn to create SparkConf process 2014-01-03 15:01:38 +08:00
Prashant Sharma b4bb80002b Merge branch 'master' into spark-1002-remove-jars 2014-01-03 12:12:04 +05:30
Raymond Liu f442afc22e fix docs for yarn 2014-01-03 14:14:35 +08:00
Andrew Or df413e996f Merge remote-tracking branch 'spark/master'
Conflicts:
	core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
2014-01-02 20:51:23 -08:00
Raymond Liu 18b3633e54 minor fix for loginfo 2014-01-03 12:14:38 +08:00
Raymond Liu c59029402d move duplicate pom config into parent pom 2014-01-03 12:14:38 +08:00
Raymond Liu ebdfa6bb97 Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2 2014-01-03 12:14:38 +08:00
Raymond Liu a47ebf7228 Add yarn/common/src/test dir to the build script 2014-01-03 12:14:38 +08:00
Raymond Liu ddc5054b35 Fix yarn/README.md 2014-01-03 12:14:38 +08:00
Raymond Liu 79b6b4ddc2 Clean up unused files for yarn 2014-01-03 12:14:38 +08:00
Raymond Liu 7c96faee74 Fix pom to build yarn/2.x with yarn/common into one jar 2014-01-03 12:14:38 +08:00
Raymond Liu d1a6f7aabc Use unmanaged source dir to include common yarn code 2014-01-03 12:14:37 +08:00
Raymond Liu c5422e02b8 Merge yarn/scheduler and yarn/common code into one directory 2014-01-03 12:14:37 +08:00
Raymond Liu ad60710010 Need to send dummy hello message to actually establish akka connection. 2014-01-03 12:14:37 +08:00
Raymond Liu dd6d347f4f A few cleanups for yarn 2.0 code 2014-01-03 12:14:37 +08:00
Raymond Liu 7815a3ace9 Update maven build documentation 2014-01-03 12:12:38 +08:00
Raymond Liu be343d2a56 Fix yarn/README.md and update docs/running-on-yarn.md 2014-01-03 12:12:38 +08:00
Raymond Liu 67cd752e74 Add README for yarn modules 2014-01-03 12:12:38 +08:00
Raymond Liu e867e31145 Some code cleanup for Yarn 2.2 2014-01-03 12:12:37 +08:00
Raymond Liu 8818661721 Fix pom file for scala binary version 2014-01-03 12:12:37 +08:00
Raymond Liu 96e25e567c Fix yarn/assemble pom file 2014-01-03 12:12:37 +08:00
Raymond Liu aec96dd108 Change profile name new-yarn to hadoop2.2-yarn 2014-01-03 12:12:37 +08:00
Raymond Liu d1528c7c8c Fix pom for yarn code reorganization commit 2014-01-03 12:12:37 +08:00
Raymond Liu 3dc379ce5a Reorganize yarn-related code into subprojects to remove duplicate files. 2014-01-03 12:12:37 +08:00
Tathagata Das a1b8dd53e3 Added StreamingContext.getOrCreate for automatic recovery, and added a RecoverableNetworkWordCount example to use it. 2014-01-02 19:07:22 -08:00
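A minimal sketch of the getOrCreate recovery pattern this commit introduces: rebuild the StreamingContext from its checkpoint if one exists, otherwise create it fresh. The checkpoint path, host, and port below are illustrative, not taken from the commit.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableSketch {
  val checkpointDir = "/tmp/streaming-checkpoint" // hypothetical path

  // Builds a fresh context and registers the checkpoint directory.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("recoverable").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint(checkpointDir)
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
    words.map(_ -> 1).reduceByKey(_ + _).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On a clean start this calls createContext(); after a driver crash
    // it rebuilds the context (and the DStream graph) from the checkpoint.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```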
Patrick Wendell 498a5f0a1c Merge pull request #323 from tgravescs/sparkconf_yarn_fix
fix spark on yarn after the sparkConf changes

This fixes it so that spark on yarn now compiles and works after the sparkConf changes.

There are also other issues I discovered along the way that are broken:
- mvn builds for yarn don't assemble correctly
- unset SPARK_EXAMPLES_JAR isn't handled properly anymore
- I'm pretty sure spark.conf doesn't actually work as it's not distributed with yarn

Those things can be fixed in a separate PR unless others disagree.
2014-01-02 19:06:40 -08:00
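For context, a minimal sketch of the then-new SparkConf style that these YARN fixes chase: configuration moves out of bare system properties into a SparkConf object that every component must build. The settings are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkConfSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("yarn-app")
      .setMaster("local[2]")              // would be a YARN master in a real deployment
      .set("spark.executor.memory", "2g") // formerly -Dspark.executor.memory=2g
    val sc = new SparkContext(conf)
    println(sc.getConf.get("spark.executor.memory"))
    sc.stop()
  }
}
```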
Hossein Falaki 81989e2664 Commented out the last part of the collaborative filtering examples that leads to errors 2014-01-02 16:22:13 -08:00
Hossein Falaki c189c8362c Added Scala and Python examples for mllib 2014-01-02 15:22:20 -08:00
Reynold Xin 0475ca8f81 Merge pull request #320 from kayousterhout/erroneous_failed_msg
Remove erroneous FAILED state for killed tasks.

Currently, when tasks are killed, the Executor first sends a
status update for the task with a "KILLED" state, and then
sends a second status update with a "FAILED" state saying that
the task failed due to an exception. The second FAILED state is
misleading/unnecessary, and occurs due to a NonLocalReturnControl
exception that gets thrown due to the way we kill tasks. This
commit eliminates that problem.

I'm not at all sure that this is the best way to fix this problem,
so alternate suggestions welcome. @rxin guessing you're the right
person to look at this.
2014-01-02 15:17:08 -08:00
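For background on the NonLocalReturnControl mentioned above: a `return` inside a Scala closure is compiled to a thrown scala.runtime.NonLocalReturnControl, which is how an "exception" can surface when a task is interrupted mid-closure. A standalone sketch of the language mechanism, not Spark's executor code:

```scala
object NonLocalReturnSketch {
  // `return` inside the foreach closure below does not simply exit the
  // lambda: the compiler implements it by throwing
  // scala.runtime.NonLocalReturnControl and catching it at the boundary
  // of firstEven. If that throw ever escapes (e.g. the task is being
  // torn down at the wrong moment), it looks like a real task failure.
  def firstEven(xs: Seq[Int]): Option[Int] = {
    xs.foreach { x => if (x % 2 == 0) return Some(x) }
    None
  }

  def main(args: Array[String]): Unit =
    println(firstEven(Seq(1, 3, 4, 5))) // Some(4), via the control throwable
}
```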
Thomas Graves fced7885cb fix yarn-client 2014-01-02 17:11:16 -06:00
Thomas Graves c6de982be6 Fix yarn build after sparkConf changes 2014-01-02 16:50:35 -06:00
Aaron Davidson 8831923219 TempBlockId takes UUID and is explicitly non-serializable 2014-01-02 13:52:35 -08:00
Patrick Wendell 588a1695f4 Merge pull request #297 from tdas/window-improvement
Improvements to DStream window ops and refactoring of Spark's CheckpointSuite

- Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located.
- Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads.
- Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary.
- Added mapSideCombine option to combineByKeyAndWindow.
2014-01-02 13:20:54 -08:00
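A minimal sketch of what partitioner-aware union buys. This is illustrative: in later Spark releases sc.union dispatches to PartitionerAwareUnionRDD automatically when every non-empty input shares the same partitioner; at the time of this PR the new RDD was used inside the windowing ops.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PartitionerAwareUnionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("union-sketch").setMaster("local[2]"))
    val part = new HashPartitioner(4)

    // Two "per-batch" pair RDDs, co-partitioned by the same partitioner.
    val batch1 = sc.parallelize(Seq("a" -> 1, "b" -> 2)).partitionBy(part)
    val batch2 = sc.parallelize(Seq("a" -> 3, "c" -> 4)).partitionBy(part)

    // Partition i of the union is built from partition i of each parent,
    // so the partitioner survives the union...
    val unified = sc.union(Seq(batch1, batch2))
    println(unified.partitioner) // Some(HashPartitioner) when dispatch applies

    // ...which makes the windowed reduce below shuffle-free.
    unified.reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
```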
Matei Zaharia 7bafb68d77 Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-spark 2014-01-02 15:57:28 -05:00
Reynold Xin 5e67cdc801 Merge pull request #319 from kayousterhout/remove_error_method
Removed redundant TaskSetManager.error() function.

This function was leftover from a while ago, and now just
passes all calls through to the abort() function, so this
commit deletes it.
2014-01-02 12:56:28 -08:00
Matei Zaharia ca67909cd4 Merge pull request #311 from tmyklebu/master
SPARK-991: Report information gleaned from a Python stacktrace in the UI

Scala:

- Added setCallSite/clearCallSite to SparkContext and JavaSparkContext.
  These functions mutate a LocalProperty called "externalCallSite."
- Add a wrapper, getCallSite, that checks for an externalCallSite and, if
  none is found, calls the usual Utils.formatSparkCallSite.
- Change everything that calls Utils.formatSparkCallSite to call
  getCallSite instead. Except getCallSite.
- Add setCallSite/clearCallSite wrappers to JavaSparkContext.

Python:

- Add a gruesome hack to rdd.py that inspects the traceback and guesses
  what you want to see in the UI.
- Add a RAII wrapper around said gruesome hack that calls
  setCallSite/clearCallSite as appropriate.
- Wire said RAII wrapper up around three calls into the Scala code.
  I'm not sure that I hit all the spots with the RAII wrapper. I'm also
  not sure that my gruesome hack does exactly what we want.

One could also approach this change by refactoring
runJob/submitJob/runApproximateJob to take a call site, then threading
that parameter through everything that needs to know it.

One might object to the pointless-looking wrappers in JavaSparkContext.
Unfortunately, I can't directly access the SparkContext from
Python---or, if I can, I don't know how---so I need to wrap everything
that matters in JavaSparkContext.

Conflicts:
	core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
2014-01-02 15:54:54 -05:00
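A sketch of the call-site pattern described above, using the setCallSite/clearCallSite names from the commit message. The label is made up; this mirrors what the Python RAII wrapper does around its calls into Scala.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CallSiteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("callsite-sketch").setMaster("local[2]"))

    // Label the next jobs with an external call site so the UI reports
    // the user's code location rather than a Scala-internal frame.
    sc.setCallSite("reduceByKey at my_script.py:42") // hypothetical label
    try {
      val counts = sc.parallelize(Seq("a" -> 1, "a" -> 2, "b" -> 3))
        .reduceByKey(_ + _)
      counts.collect().foreach(println) // UI stages show the label above
    } finally {
      sc.clearCallSite() // back to normal Scala call-site reporting
    }
    sc.stop()
  }
}
```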
Kay Ousterhout a1b438d94d Remove erroneous FAILED state for killed tasks.
Currently, when tasks are killed, the Executor first sends a
status update for the task with a "KILLED" state, and then
sends a second status update with a "FAILED" state saying that
the task failed due to an exception. The second FAILED state is
misleading/unnecessary, and occurs due to a NonLocalReturnControl
exception that gets thrown due to the way we kill tasks. This
commit eliminates that problem.
2014-01-02 12:34:46 -08:00
Kay Ousterhout 5a3c00c958 Removed redundant TaskSetManager.error() function.
This function was leftover from a while ago, and now just
passes all calls through to the abort() function, so this
commit deletes it.
2014-01-02 11:13:58 -08:00
Prashant Sharma 59e8009b8d a few leftover documentation changes 2014-01-02 21:48:44 +05:30
Sean Owen 66d501276b Suggested small changes to Java code for slightly more standard style, encapsulation and in some cases performance 2014-01-02 16:17:57 +00:00
Prashant Sharma a3f90a2ecf pyspark -> bin/pyspark 2014-01-02 18:50:12 +05:30
Prashant Sharma 94b7a7fe37 run-example -> bin/run-example 2014-01-02 18:41:21 +05:30
Prashant Sharma b810a85cdd spark-shell -> bin/spark-shell 2014-01-02 18:37:40 +05:30
Prashant Sharma 980afd280a Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
2014-01-02 17:55:21 +05:30