6326 commits

Author SHA1 Message Date
Reynold Xin 749f842827 Merge pull request #489 from ash211/patch-6
Clarify spark.default.parallelism

It's the task count across the cluster, not per worker, per machine, per core, or anything else.
2014-01-21 14:53:49 -08:00
Andrew Ash 069bb94206 Clarify spark.default.parallelism
It's the task count across the cluster, not per worker, per machine, per core, or anything else.
2014-01-21 14:49:35 -08:00
Reynold Xin f8544981a6 Merge pull request #469 from ajtulloch/use-local-spark-context-in-tests-for-mllib
[MLlib] Use a LocalSparkContext trait in test suites

Replaces the 9 instances of

```scala
class XXXSuite extends FunSuite with BeforeAndAfterAll {
  @transient private var sc: SparkContext = _

  override def beforeAll() {
    sc = new SparkContext("local", "test")
  }

  override def afterAll() {
    sc.stop()
    System.clearProperty("spark.driver.port")
  }
```

with

```scala
class XXXSuite extends FunSuite with LocalSparkContext {
```
2014-01-21 10:49:54 -08:00
Andrew Tulloch 3a067b4a76 Fixed import order 2014-01-21 13:36:53 +00:00
Sandy Ryza adf42611f1 Incorporate Tom's comments - update doc and code to reflect that core requests may not always be honored 2014-01-21 00:38:02 -08:00
Patrick Wendell 77b986f661 Merge pull request #480 from pwendell/0.9-fixes
Handful of 0.9 fixes

This patch addresses a few fixes for Spark 0.9.0 based on the last release candidate.

@mridulm gets credit for reporting most of the issues here. Many of the fixes here are based on his work in #477 and follow-up discussion with him.
2014-01-21 00:09:42 -08:00
Patrick Wendell a9bcc980b6 Style clean-up 2014-01-21 00:05:28 -08:00
Patrick Wendell c67d3d8beb Merge pull request #484 from tdas/run-example-fix
Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.

The bin/run-example script was not passing Java properties set through SPARK_JAVA_OPTS to the example. This is important for examples like Twitter**, as the Twitter authentication information must be set through Java properties. Hence this adds the same JAVA_OPTS code to run-example as is in the bin/spark-class script.

Also added SPARK_MEM, in case someone wants to run the example with different amounts of memory. This can be removed if it is not in tune with the intended semantics of the run-example scripts.

@matei Please check this soon; I want this to go into 0.9-rc4.
2014-01-20 23:34:35 -08:00
Tathagata Das 65869f843d Removed SPARK_MEM from run-examples. 2014-01-20 23:15:28 -08:00
Patrick Wendell a917a87e02 Adding small code comment 2014-01-20 23:11:45 -08:00
Reynold Xin 6b4eed779b Merge pull request #449 from CrazyJvm/master
SPARK-1028 : fix "set MASTER automatically fails" bug.

spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];". We will certainly set SPARK_MASTER_IP explicitly, but we probably will not set SPARK_MASTER_PORT and will just rely on the Spark default port 7077. So if SPARK_MASTER_PORT is not set, the condition is never true. I think we should just use the default port if users do not set the port explicitly.
2014-01-20 22:35:45 -08:00
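
The default-port fallback described in the SPARK-1028 message above can be sketched as follows. This is an illustration in Scala rather than the shell script itself: `MasterUrl` and `resolve` are hypothetical names, and the assumption that the script builds a `spark://host:port` URL from the two variables is ours, not stated in the commit message.

```scala
// Hypothetical helper, not part of Spark: mirrors the intended logic of the
// shell condition in Scala so the fallback behaviour is easy to see.
object MasterUrl {
  // Default standalone master port named in the commit message.
  val DefaultPort = "7077"

  /** Returns Some("spark://host:port") when an IP is set, falling back to the
    * default port when no port is given. Returns None when no IP is set. */
  def resolve(ip: Option[String], port: Option[String]): Option[String] =
    ip.map(_.trim).filter(_.nonEmpty).map { host =>
      val p = port.map(_.trim).filter(_.nonEmpty).getOrElse(DefaultPort)
      s"spark://$host:$p"
    }
}
```

A caller could pass the environment directly, for example `MasterUrl.resolve(sys.env.get("SPARK_MASTER_IP"), sys.env.get("SPARK_MASTER_PORT"))`. The point of the sketch is that only the IP is required; the original condition required both variables, so an unset port silently disabled the automatic setting.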
Patrick Wendell 0367981d47 Merge pull request #482 from tdas/streaming-example-fix
Added StreamingContext.awaitTermination to streaming examples

StreamingContext.start() currently starts a non-daemon thread, which prevents a Spark Streaming program from terminating even after the main function has exited. Since a streaming program is expected to run until explicitly killed, this was acceptable when Spark Streaming applications were launched from the command line. In Yarn-standalone mode, however, the driver is effectively terminated when the main function exits, so the Spark Streaming examples did not work on Yarn.

This addition ensures that the examples work on Yarn and also ensures that everyone learns that StreamingContext.awaitTermination() is necessary for Spark Streaming programs to wait.

The true bug fix, making sure all threads started by Spark Streaming are daemon threads, is left for post-0.9.
2014-01-20 22:25:50 -08:00
Reynold Xin 7373ffb5e7 Merge pull request #483 from pwendell/gitignore
Restricting /lib to top level directory in .gitignore

This patch was proposed by Sean Mackrory.
2014-01-20 21:44:29 -08:00
Tathagata Das e0b741d07a Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM. 2014-01-20 20:48:59 -08:00
Patrick Wendell e437069dce Restricting /lib to top level directory in .gitignore
This patch was proposed by Sean Mackrory.
2014-01-20 20:39:30 -08:00
Tathagata Das 2e95174c45 Added StreamingContext.awaitTermination to streaming examples. 2014-01-20 20:25:04 -08:00
Patrick Wendell d46df96de3 Avoid matching attempt files in the checkpoint 2014-01-20 20:03:23 -08:00
Patrick Wendell de526ad527 Remove shuffle files if they are still present on a machine. 2014-01-20 19:11:22 -08:00
Patrick Wendell f84400e86c Fixing speculation bug 2014-01-20 19:05:03 -08:00
Patrick Wendell c324ac10ee Force use of LZF when spilling data 2014-01-20 19:00:48 -08:00
Patrick Wendell 1b299142a8 Bug fix for reporting of spill output 2014-01-20 18:34:00 -08:00
Patrick Wendell 54867e9566 Minor fixes 2014-01-20 18:33:21 -08:00
Patrick Wendell cdb003e376 Removing docs on akka options 2014-01-20 16:40:58 -08:00
Sandy Ryza 3e85b87d90 SPARK-1033. Ask for cores in Yarn container requests 2014-01-20 14:42:32 -08:00
CodingCat 29f4b6a2d9 fix for SPARK-1027
change TestClient & Worker to Some("xxx")

kill manager if it is started

remove unnecessary .get when fetching "SPARK_HOME" values
2014-01-20 02:50:30 -05:00
CodingCat f9a95d6736 executor creation failed should not make the worker restart 2014-01-20 02:50:30 -05:00
Patrick Wendell 792d9084e2 Merge pull request #470 from tgravescs/fix_spark_examples_yarn
Only log error on missing jar to allow spark examples to jar.

Right now, to run the Spark examples on Yarn you have to use the --addJars option and put the jar in HDFS. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
2014-01-19 11:33:11 -08:00
Patrick Wendell 256a3553c4 Merge pull request #458 from tdas/docs-update
Updated java API docs for streaming, along with very minor changes in the code examples.

Docs updated for:
Scala: StreamingContext, DStream, PairDStreamFunctions
Java: JavaStreamingContext, JavaDStream, JavaPairDStream

Example updated:
JavaQueueStream: Do not use deprecated method
ActorWordCount: Use the public interface the right way.
2014-01-19 10:29:54 -08:00
Thomas Graves dd56b2125e update comment 2014-01-19 12:21:39 -06:00
Thomas Graves ceb79a3931 Only log error on missing jar to allow spark examples to jar. 2014-01-19 12:16:58 -06:00
Andrew Tulloch 720836a761 LocalSparkContext for MLlib 2014-01-19 17:51:00 +00:00
Yinan Li 584323c6b1 Addressed comments from Reynold
Signed-off-by: Yinan Li <liyinan926@gmail.com>
2014-01-18 21:28:17 -08:00
Patrick Wendell fe8a3546f4 Merge pull request #459 from srowen/UpdaterL2Regularization
Correct L2 regularized weight update with canonical form

Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
2014-01-18 16:29:23 -08:00
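
For context on the L2 correction above, the textbook gradient-descent step for a loss with penalty (λ/2)‖w‖² is w ← w − η(g + λw), where g is the loss gradient and η the step size. A minimal sketch of that standard form follows. `L2Update` and `step` are hypothetical names, plain arrays stand in for whatever vector type MLlib uses, and this is not claimed to be the code in pull request #459.

```scala
// Illustrative only: the standard L2-regularized gradient step,
// not the MLlib Updater API.
object L2Update {
  /** One step of w <- w - stepSize * (gradient + regParam * w).
    * Equivalent to shrinking w by (1 - stepSize * regParam) and then
    * subtracting stepSize * gradient. */
  def step(
      weights: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      regParam: Double): Array[Double] = {
    require(weights.length == gradient.length, "weights and gradient must have the same length")
    weights.zip(gradient).map { case (w, g) => w - stepSize * (g + regParam * w) }
  }
}
```

With `regParam = 0` this reduces to the unregularized step, which is a quick sanity check on any implementation of the update.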
Patrick Wendell 73dfd42fba Merge pull request #437 from mridulm/master
Minor api usability changes

- Expose checkpoint directory - since it is autogenerated now
- null check for jars
- Expose SparkHadoopUtil, so that configuration creation is abstracted even from user code, avoiding duplication of functionality already in Spark.
2014-01-18 16:23:56 -08:00
Patrick Wendell 4c16f79ce4 Merge pull request #426 from mateiz/py-ml-tests
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)

We disabled these earlier because Jenkins didn't have these versions.
2014-01-18 16:21:43 -08:00
Patrick Wendell bf5699543b Merge pull request #462 from mateiz/conf-file-fix
Remove Typesafe Config usage and conf files to fix nested property names

With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce9a)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
2014-01-18 16:20:00 -08:00
Yinan Li fd833e7ab1 Allow files added through SparkContext.addFile() to be overwritten
This is useful for the cases when a file needs to be refreshed and downloaded
by the executors periodically.

Signed-off-by: Yinan Li <liyinan926@gmail.com>
2014-01-18 15:26:59 -08:00
Patrick Wendell aa981e4e97 Merge pull request #461 from pwendell/master
Use renamed shuffle spill config in CoGroupedRDD.scala

This one got missed when it was renamed.
2014-01-18 12:49:21 -08:00
Patrick Wendell 5316bcac3c Use renamed shuffle spill config in CoGroupedRDD.scala 2014-01-18 11:58:42 -08:00
Sean Owen e91ad3f164 Correct L2 regularized weight update with canonical form 2014-01-18 12:53:01 +00:00
Reza Zadeh 85b95d039d rename to MatrixSVD 2014-01-17 14:40:51 -08:00
Reza Zadeh fa3299835b rename to MatrixSVD 2014-01-17 14:39:30 -08:00
Reza Zadeh caf97a25a2 Merge remote-tracking branch 'upstream/master' into sparsesvd 2014-01-17 14:34:03 -08:00
Reza Zadeh 4e96757793 make example 0-indexed 2014-01-17 14:33:03 -08:00
Reza Zadeh 5c639d70df 0index docs 2014-01-17 14:31:39 -08:00
Reza Zadeh c9b4845bc1 prettify 2014-01-17 14:14:29 -08:00
Reza Zadeh dbec69bbf4 add rename computeSVD 2014-01-17 13:59:05 -08:00
Reza Zadeh eb2d8c431f replace this.type with SVD 2014-01-17 13:57:27 -08:00
Reza Zadeh cb13b15a60 use 0-indexing 2014-01-17 13:55:42 -08:00
Reza Zadeh d28bf41827 changes from PR 2014-01-17 13:39:40 -08:00