6300 commits

Author SHA1 Message Date
CodingCat 29f4b6a2d9 fix for SPARK-1027
change TestClient & Worker to Some("xxx")

kill manager if it is started

remove unnecessary .get when fetch "SPARK_HOME" values
2014-01-20 02:50:30 -05:00
CodingCat f9a95d6736 executor creation failed should not make the worker restart 2014-01-20 02:50:30 -05:00
Patrick Wendell 792d9084e2 Merge pull request #470 from tgravescs/fix_spark_examples_yarn
Only log error on missing jar to allow spark examples to jar.

Right now, to run the Spark examples on YARN you have to use the --addJars option and put the jar in HDFS. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
2014-01-19 11:33:11 -08:00
Patrick Wendell 256a3553c4 Merge pull request #458 from tdas/docs-update
Updated java API docs for streaming, along with very minor changes in the code examples.

Docs updated for:
Scala: StreamingContext, DStream, PairDStreamFunctions
Java: JavaStreamingContext, JavaDStream, JavaPairDStream

Example updated:
JavaQueueStream: Do not use deprecated method
ActorWordCount: Use the public interface the right way.
2014-01-19 10:29:54 -08:00
Thomas Graves dd56b2125e update comment 2014-01-19 12:21:39 -06:00
Thomas Graves ceb79a3931 Only log error on missing jar to allow spark examples to jar. 2014-01-19 12:16:58 -06:00
Andrew Tulloch 720836a761 LocalSparkContext for MLlib 2014-01-19 17:51:00 +00:00
Patrick Wendell fe8a3546f4 Merge pull request #459 from srowen/UpdaterL2Regularization
Correct L2 regularized weight update with canonical form

Per thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected. See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
2014-01-18 16:29:23 -08:00
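
For readers following the L2 regularization fix above, the canonical form of the update can be sketched as below. This is a minimal illustration, not the code from the pull request: the function name `l2_update` and the parameters `step_size` and `reg_param` are hypothetical, and the objective is assumed to be loss(w) + (reg_param / 2) * ||w||^2.

```python
def l2_update(weights, gradient, step_size, reg_param):
    """One gradient step on loss(w) + (reg_param / 2) * ||w||^2.

    The penalty term contributes reg_param * w to the gradient, so
    w' = w - step_size * (gradient + reg_param * w)
       = (1 - step_size * reg_param) * w - step_size * gradient.
    """
    # Shrink factor applied to the old weights by the L2 penalty.
    shrink = 1.0 - step_size * reg_param
    return [shrink * w - step_size * g for w, g in zip(weights, gradient)]


# Hypothetical inputs, chosen only to exercise the formula.
new_weights = l2_update([1.0, 2.0], [0.5, -0.5], step_size=0.1, reg_param=0.1)
```

The weights are scaled by (1 - step_size * reg_param) before the loss gradient is subtracted, which is what "canonical form" is taken to mean here.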
Patrick Wendell 73dfd42fba Merge pull request #437 from mridulm/master
Minor api usability changes

- Expose checkpoint directory - since it is autogenerated now
- null check for jars
- Expose SparkHadoopUtil : so that configuration creation is abstracted even from user code to avoid duplication of functionality already in spark.
2014-01-18 16:23:56 -08:00
Patrick Wendell 4c16f79ce4 Merge pull request #426 from mateiz/py-ml-tests
Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)

We disabled these earlier because Jenkins didn't have these versions.
2014-01-18 16:21:43 -08:00
Patrick Wendell bf5699543b Merge pull request #462 from mateiz/conf-file-fix
Remove Typesafe Config usage and conf files to fix nested property names

With Typesafe Config we had the subtle problem of no longer allowing
nested property names, which are used for a few of our properties:
http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

This PR is for branch 0.9 but should be added into master too.
(cherry picked from commit 34e911ce9a)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
2014-01-18 16:20:00 -08:00
Patrick Wendell aa981e4e97 Merge pull request #461 from pwendell/master
Use renamed shuffle spill config in CoGroupedRDD.scala

This one got missed when it was renamed.
2014-01-18 12:49:21 -08:00
Patrick Wendell 5316bcac3c Use renamed shuffle spill config in CoGroupedRDD.scala 2014-01-18 11:58:42 -08:00
Sean Owen e91ad3f164 Correct L2 regularized weight update with canonical form 2014-01-18 12:53:01 +00:00
Reza Zadeh 85b95d039d rename to MatrixSVD 2014-01-17 14:40:51 -08:00
Reza Zadeh fa3299835b rename to MatrixSVD 2014-01-17 14:39:30 -08:00
Reza Zadeh caf97a25a2 Merge remote-tracking branch 'upstream/master' into sparsesvd 2014-01-17 14:34:03 -08:00
Reza Zadeh 4e96757793 make example 0-indexed 2014-01-17 14:33:03 -08:00
Reza Zadeh 5c639d70df 0index docs 2014-01-17 14:31:39 -08:00
Reza Zadeh c9b4845bc1 prettify 2014-01-17 14:14:29 -08:00
Reza Zadeh dbec69bbf4 add rename computeSVD 2014-01-17 13:59:05 -08:00
Reza Zadeh eb2d8c431f replace this.type with SVD 2014-01-17 13:57:27 -08:00
Reza Zadeh cb13b15a60 use 0-indexing 2014-01-17 13:55:42 -08:00
Reza Zadeh d28bf41827 changes from PR 2014-01-17 13:39:40 -08:00
Mridul Muralidharan b690e11d9c Address review comment 2014-01-17 18:28:55 +05:30
Patrick Wendell d749d472b3 Merge pull request #451 from Qiuzhuang/master
Fixed Windows spark shell launch script error.

JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
2014-01-16 23:18:15 -08:00
Patrick Wendell d4fd89e3c8 Merge pull request #438 from ScrapCodes/clone-records-java-api
Clone records java api
2014-01-16 23:17:30 -08:00
Prashant Sharma fcb4fc653d adding clone records field to equivalent java apis 2014-01-17 11:16:03 +05:30
Tathagata Das 11e6534d92 Updated java API docs for streaming, along with very minor changes in the code examples. 2014-01-16 14:44:02 -08:00
Mridul Muralidharan edd82c58a2 Use method, not variable 2014-01-16 17:26:42 +05:30
Mridul Muralidharan 1a0da89277 Address review comments 2014-01-16 17:23:25 +05:30
Qiuzhuang Lian 4e510b0b0c Fixed Windows spark shell launch script error.
JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
2014-01-16 16:09:10 +08:00
Reynold Xin c06a307ca2 Merge pull request #445 from kayousterhout/exec_lost
Fail rather than hanging if a task crashes the JVM.

Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED.  As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. Eventually,
this makes the job hang, because the Standalone Scheduler removes
the application after 10 workers have failed, and then the app is left
in a state where it's disconnected from the master and waiting to reconnect.
This commit fixes that problem by marking tasks as FAILED rather than
killed when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.
2014-01-15 23:47:25 -08:00
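
The retry behaviour described in the commit message above can be sketched as a small simulation. This is an illustrative model only, not Spark's TaskSetManager: the function `run_task`, the state strings, and the return values are hypothetical, and `max_failures=3` simply follows the value mentioned in the message.

```python
def run_task(outcomes, max_failures=3):
    """Replay a task's attempt outcomes; return (final_state, attempts_used).

    Only FAILED attempts count toward max_failures. KILLED attempts are
    retried without being counted, which is the behaviour the commit
    message identifies as the cause of the endless retry.
    """
    failures = 0
    attempts = 0
    for outcome in outcomes:
        attempts += 1
        if outcome == "SUCCESS":
            return "SUCCESS", attempts
        if outcome == "FAILED":
            failures += 1
            if failures >= max_failures:
                return "JOB_FAILED", attempts
        # "KILLED" falls through: the attempt is retried but never counted.
    return "STILL_RETRYING", attempts


# Executor loss reported as KILLED: the failure budget is never consumed.
before = run_task(["KILLED"] * 10)
# Executor loss reported as FAILED: the job fails after max_failures attempts.
after = run_task(["FAILED"] * 10)
```

Under this model, ten KILLED attempts leave the task still retrying, while the same ten attempts reported as FAILED stop the job at the third one.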
Kay Ousterhout 718a13c179 Updated unit test comment 2014-01-15 23:46:14 -08:00
Reynold Xin 84595ea3e2 Merge pull request #414 from soulmachine/code-style
Code clean up for mllib

* Removed unnecessary parentheses
* Removed unused imports
* Simplified `filter...size()` to `count ...`
* Removed obsoleted parameters' comments
2014-01-15 20:15:29 -08:00
CrazyJvm 8400536456 fix some format problem. 2014-01-16 11:57:46 +08:00
CrazyJvm 7a0c5b5a23 fix "set MASTER automatically fails" bug.
spark-shell is meant to set MASTER automatically if the option is not provided when the shell starts, but there is a problem.
The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];". We will certainly set SPARK_MASTER_IP explicitly, but we will probably leave SPARK_MASTER_PORT unset and rely on the Spark default port, 7077. So if SPARK_MASTER_PORT is not set, the condition is never true. I think we should just use the default port when users do not set the port explicitly.
2014-01-16 11:45:02 +08:00
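
The fix proposed in the commit message above (fall back to the default port when only the IP is set) can be sketched as follows. This is a Python model of the intended logic, not the actual spark-shell script: the helper `resolve_master` and the constant `DEFAULT_MASTER_PORT` are hypothetical names, and the environment is passed in as a plain dict for illustration.

```python
DEFAULT_MASTER_PORT = "7077"  # Spark default port named in the commit message


def resolve_master(env):
    """Return a spark:// master URL if SPARK_MASTER_IP is set, else None.

    Hypothetical helper: the real logic lives in the spark-shell script.
    """
    ip = env.get("SPARK_MASTER_IP", "")
    if not ip:
        # No IP configured, so there is nothing to derive MASTER from.
        return None
    # Use the default port when the user did not set one explicitly.
    port = env.get("SPARK_MASTER_PORT", "") or DEFAULT_MASTER_PORT
    return f"spark://{ip}:{port}"


# Only the IP is set, which is the case the original condition missed.
master = resolve_master({"SPARK_MASTER_IP": "10.0.0.1"})
```

The original condition required both variables to be non-empty; here only the IP is required and the port is optional.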
Reynold Xin 0675ca50f3 Merge pull request #439 from CrazyJvm/master
SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide

remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
2014-01-15 16:09:03 -08:00
Kay Ousterhout a268d63411 Fail rather than hanging if a task crashes the JVM.
Prior to this commit, if a task crashes the JVM, the task (and
all other tasks running on that executor) is marked as KILLED rather
than FAILED.  As a result, the TaskSetManager will retry the task
indefinitely rather than failing the job after maxFailures. This
commit fixes that problem by marking tasks as FAILED rather than
killed when an executor is lost.

The downside of this commit is that if task A fails because another
task running on the same executor caused the VM to crash, the failure
will incorrectly be counted as a failure of task A. This should not
be an issue because we typically set maxFailures to 3, and it is
unlikely that a task will be co-located with a JVM-crashing task
multiple times.
2014-01-15 16:03:40 -08:00
Patrick Wendell 4f0c361b0e Merge pull request #444 from mateiz/py-version
Clarify that Python 2.7 is only needed for MLlib
2014-01-15 14:25:45 -08:00
Matei Zaharia 2ffdaefbcb Clarify that Python 2.7 is only needed for MLlib 2014-01-15 14:20:39 -08:00
Patrick Wendell 59f475c79f Merge pull request #442 from pwendell/standalone
Workers should use working directory as spark home if it's not specified

If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
2014-01-15 13:55:14 -08:00
Patrick Wendell 2a05403a7c Merge pull request #443 from tdas/filestream-fix
Made some classes private[streaming] and deprecated a method in JavaStreamingContext.

Classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API. They are not used by the core functionality and were there only as support classes for an obscure example. One of the classes, `RawTextSender`, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In the future, I will probably remove these classes completely. For the time being, I am just converting them to private[streaming].

The underlying JavaSparkContext in JavaStreamingContext was accessed through `JavaStreamingContext.sc`. This is now deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.
2014-01-15 13:54:45 -08:00
Tathagata Das 9e6375349e Made some classes private[streaming] and deprecated a method in JavaStreamingContext. 2014-01-15 12:15:46 -08:00
Patrick Wendell 5fecd2516d Merge pull request #441 from pwendell/graphx-build
GraphX shouldn't list Spark as provided.

I noticed this when building an application against GraphX to audit the released artifacts.
2014-01-15 11:15:07 -08:00
Patrick Wendell 00a3f7eec5 Workers should use working directory as spark home if it's not specified 2014-01-15 11:05:36 -08:00
Patrick Wendell 9259d706be GraphX shouldn't list Spark as provided 2014-01-15 10:46:37 -08:00
Patrick Wendell 494d3c0774 Merge pull request #433 from markhamstra/debFix
Updated Debian packaging
2014-01-15 10:00:50 -08:00
Thomas Graves cef2af9c7d Merge pull request #366 from colorant/yarn-dev
More yarn code refactor

Try to extract the common code in yarn alpha/stable for client and workerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending that trait.

The same could be done for the remaining files in alpha/stable, but those files have much more overlapping code with different API calls here and there within functions, and would need much closer review. It might also split functions into overly small, trivial pieces, so it might not be worth doing this way.

So just do it for these two files first.
2014-01-15 10:06:17 -06:00
CrazyJvm 263933da97 remove "-XX:+UseCompressedStrings" option
remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
2014-01-15 22:26:15 +08:00