Commit graph

16 commits

Neville Li ebcd2d6889 Fix spark-submit path in spark-shell & pyspark
Author: Neville Li <neville@spotify.com>

Closes #812 from nevillelyh/neville/v1.0 and squashes the following commits:

0dc33ed [Neville Li] Fix spark-submit path in pyspark
becec64 [Neville Li] Fix spark-submit path in spark-shell
2014-05-18 13:37:46 -07:00
Andrew Or 4b8ec6fcfd [SPARK-1808] Route bin/pyspark through Spark submit
**Problem.** For `bin/pyspark`, there is currently no way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, that mechanism is supposedly deprecated; instead, `bin/pyspark` needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`.

**Solution.** Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent.

**Details.** `bin/pyspark` handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications, so when `bin/pyspark` is given a python file we can simply pass the file directly to Spark submit and let it handle the rest.

For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path for this; the only change needed was to launch the JVM through `bin/spark-submit` instead of `spark-class`. This requires modifications to Spark submit to handle the pyspark shell as a special case.
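A rough bash sketch of that routing; the names here are illustrative, and the real scripts additionally handle quoting and Windows:

```
# Case (1): a python file -- hand it straight to Spark submit.
if [[ "$1" == *.py ]]; then
  exec "$SPARK_HOME"/bin/spark-submit "$@"
else
  # Case (2): interactive shell -- stash the submit arguments, then start
  # python, which launches the JVM via bin/spark-submit as a sub-process.
  export PYSPARK_SUBMIT_ARGS="$*"
  exec "${PYSPARK_PYTHON:-python}"
fi
```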

This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too.

Author: Andrew Or <andrewor14@gmail.com>

Closes #799 from andrewor14/pyspark-submit and squashes the following commits:

bf37e36 [Andrew Or] Minor changes
01066fa [Andrew Or] bin/pyspark for Windows
c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes)
1866f85 [Andrew Or] Windows is not cooperating
456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set
7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
b7ba0d8 [Andrew Or] Address a few comments (minor)
06eb138 [Andrew Or] Use shlex instead of writing our own parser
05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly
6fba412 [Andrew Or] Deal with quotes + address various comments
fe4c8a7 [Andrew Or] Update --help for bin/pyspark
afe47bf [Andrew Or] Fix spark shell
f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit
a371d26 [Andrew Or] Route bin/pyspark through Spark submit
2014-05-16 22:34:38 -07:00
Patrick Wendell 949e393101 SPARK-1654 and SPARK-1653: Fixes in spark-submit.
Deals with two issues:
1. Spark shell didn't correctly pass quoted arguments to spark-submit (a sketch follows the list).
```./bin/spark-shell --driver-java-options "-Dfoo=f -Dbar=b"```
2. Spark submit used deprecated environment variables (SPARK_CLASSPATH)
   which triggered warnings. Now we use new, more narrowly scoped
   variables.
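A hedged sketch of fix (1): build the command from `"$@"` so each original argument reaches spark-submit as a single word.

```
# Broken shape: a flat string re-splits on spaces, so
#   --driver-java-options "-Dfoo=f -Dbar=b"
# arrives at spark-submit as two separate arguments:
#   "$SPARK_HOME"/bin/spark-submit $SUBMIT_ARGS
# Fixed shape (sketch): "$@" preserves the original word boundaries.
exec "$SPARK_HOME"/bin/spark-submit "$@"
```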

Author: Patrick Wendell <pwendell@gmail.com>

Closes #576 from pwendell/spark-submit and squashes the following commits:

67004c9 [Patrick Wendell] SPARK-1654 and SPARK-1653: Fixes in spark-submit.
2014-04-28 17:29:22 -07:00
Patrick Wendell dc3b640a0a SPARK-1619 Launch spark-shell with spark-submit
This simplifies the shell a bunch and passes all arguments through to spark-submit.

There is a tiny incompatibility from 0.9.1, which is that you can no longer pass `-c`, only `--cores`. However, spark-submit will give a good error message in this case, I don't think many people used `-c`, and it's a trivial change for users.
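As a hedged sketch, the simplified shell reduces to little more than a forwarding `exec` (`FWDIR` is an illustrative name for the Spark root; the class is the Scala repl's entry point):

```
# All option parsing now lives in spark-submit; the shell just forwards.
FWDIR="$(cd "$(dirname "$0")"/..; pwd)"
exec "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
```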

Author: Patrick Wendell <pwendell@gmail.com>

Closes #542 from pwendell/spark-shell and squashes the following commits:

9eb3e6f [Patrick Wendell] Updating Spark docs
b552459 [Patrick Wendell] Andrew's feedback
97720fa [Patrick Wendell] Review feedback
aa2900b [Patrick Wendell] SPARK-1619 Launch spark-shell with spark-submit
2014-04-24 23:59:16 -07:00
Aaron Davidson 0307db0f55 SPARK-1099: Introduce local[*] mode to infer number of cores
This is the default mode for running spark-shell and pyspark. It is intended to let users running Spark for the first time see the performance benefits of using multiple cores, while not breaking backwards compatibility for users who use "local" mode and expect exactly 1 core.
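A minimal sketch of the default, assuming the shell script's `MASTER` variable: `local[*]` sizes the local scheduler from `Runtime.getRuntime().availableProcessors()` rather than a single core.

```
# If the user supplied no master, infer one core per available processor.
if [ -z "$MASTER" ]; then
  MASTER="local[*]"
fi
export MASTER
```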

Author: Aaron Davidson <aaron@databricks.com>

Closes #182 from aarondav/110 and squashes the following commits:

a88294c [Aaron Davidson] Rebased changes for new spark-shell
a9f393e [Aaron Davidson] SPARK-1099: Introduce local[*] mode to infer number of cores
2014-04-07 13:06:30 -07:00
Aaron Davidson 01cf4c402b SPARK-1404: Always upgrade spark-env.sh vars to environment vars
This was broken when spark-env.sh was made idempotent: the idempotence check is an environment variable, but the variables set in spark-env.sh may never have been exported.

Tested in zsh, bash, and sh.
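A minimal sketch of the upgrade, assuming a POSIX shell (hence the zsh/bash/sh testing): `set -a` auto-exports every assignment made while the file is sourced, so plain `VAR=value` lines become real environment variables.

```
# spark-env.sh location; the directory resolution here is illustrative.
SPARK_ENV="${SPARK_CONF_DIR:-conf}/spark-env.sh"
if [ -f "$SPARK_ENV" ]; then
  set -a                       # auto-export every assignment while sourcing
  . "$SPARK_ENV"
  set +a
fi
```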

Author: Aaron Davidson <aaron@databricks.com>

Closes #310 from aarondav/SPARK-1404 and squashes the following commits:

c3406a5 [Aaron Davidson] Add extra export in spark-shell
6a0e340 [Aaron Davidson] SPARK-1404: Always upgrade spark-env.sh vars to environment vars
2014-04-04 09:50:24 -07:00
Bernardo Gomez Palacio fda86d8b46 [SPARK-1186] : Enrich the Spark Shell to support additional arguments.
Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:
    -h  --help              : Print this help information.
    -c  --cores             : The maximum number of cores to be used by the Spark Shell.
    -em --executor-memory   : The memory used by each executor of the Spark Shell, the number
                              is followed by m for megabytes or g for gigabytes, e.g. "1g".
    -dm --driver-memory     : The memory used by the Spark Shell, the number is followed
                              by m for megabytes or g for gigabytes, e.g. "1g".
    -m  --master            : A full string that describes the Spark Master, defaults to "local"
                              e.g. "spark://localhost:7077".
    --log-conf              : Enables logging of the supplied SparkConf as INFO at start of the
                              Spark Context.

e.g.
    spark-shell -m spark://localhost:7077 -c 4 -dm 512m -em 2g
```
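A hedged sketch of the option loop behind this usage text; the function and variable names are illustrative rather than the script's exact ones:

```
# usage() is assumed: a function printing the help text shown above.
while [ $# -gt 0 ]; do
  case "$1" in
    -h|--help)             usage; exit 0 ;;
    -c|--cores)            CORES="$2"; shift ;;
    -em|--executor-memory) EXECUTOR_MEM="$2"; shift ;;
    -dm|--driver-memory)   DRIVER_MEM="$2"; shift ;;
    -m|--master)           MASTER="$2"; shift ;;
    --log-conf)            LOG_CONF=true ;;
    *) echo "Unknown option: $1" >&2; usage; exit 1 ;;
  esac
  shift
done
```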

**Note**: this commit reflects the changes applied to _master_ based on [5d98cfc1].

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
                        https://spark-project.atlassian.net/browse/SPARK-1186

Author      : bernardo.gomezpalacio@gmail.com

Author: Bernardo Gomez Palacio <bernardo.gomezpalacio@gmail.com>

Closes #116 from berngp/feature/enrich-spark-shell and squashes the following commits:

c5f455f [Bernardo Gomez Palacio] [SPARK-1186] : Enrich the Spark Shell to support additional arguments.
2014-03-29 19:49:22 -07:00
Aaron Davidson 007a733434 SPARK-1286: Make usage of spark-env.sh idempotent
Various spark scripts load spark-env.sh. Loading it more than once causes append-style variables (SPARK_CLASSPATH, SPARK_REPL_OPTS) to grow on each load, and it makes the precedence order for options specified in spark-env.sh less clear.

One use-case for the latter is that we want to set options from the command line of spark-shell, but these options would be overridden by a subsequent loading of spark-env.sh. If we load spark-env.sh first and then set our command-line options, we can guarantee the correct precedence order.

Note that we use SPARK_CONF_DIR if available to support the sbin/ scripts, which always set this variable from sbin/spark-config.sh. Otherwise, we default to the ../conf/ as usual.
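A minimal sketch of the idempotent loader: a guard variable makes repeated sourcing a no-op, so append-style variables cannot grow twice (the path resolution here is illustrative).

```
if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1
  # Prefer SPARK_CONF_DIR (set by sbin/spark-config.sh), else ../conf/.
  SPARK_CONF_DIR="${SPARK_CONF_DIR:-"$(dirname "$0")/../conf"}"
  if [ -f "$SPARK_CONF_DIR/spark-env.sh" ]; then
    . "$SPARK_CONF_DIR/spark-env.sh"
  fi
fi
```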

Author: Aaron Davidson <aaron@databricks.com>

Closes #184 from aarondav/idem and squashes the following commits:

e291f91 [Aaron Davidson] Use "private" variables in load-spark-env.sh
8da8360 [Aaron Davidson] Add .sh extension to load-spark-env.sh
93a2471 [Aaron Davidson] SPARK-1286: Make usage of spark-env.sh idempotent
2014-03-24 22:24:21 -07:00
Aaron Davidson 52834d761b SPARK-929: Fully deprecate usage of SPARK_MEM
(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)

This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables:
SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY

The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.

SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.

SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched through spark-class, without possibly affecting executor memory.

Other memory considerations:
- The repl's memory can be set through the `--drivermem` command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).

This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.
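A hedged sketch of the resulting fallback chain in spark-class (the default value here is illustrative):

```
DEFAULT_MEM="512m"
if [ -n "$SPARK_MEM" ]; then
  echo "Warning: SPARK_MEM is deprecated; use SPARK_DRIVER_MEMORY instead." >&2
fi
# Specialized variable wins; deprecated SPARK_MEM still works; else default.
OUR_JAVA_MEM="${SPARK_DRIVER_MEMORY:-${SPARK_MEM:-$DEFAULT_MEM}}"
JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
```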

Author: Aaron Davidson <aaron@databricks.com>

Closes #99 from aarondav/sparkmem and squashes the following commits:

9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM
2014-03-09 11:08:39 -07:00
CodingCat e0d49ad229 [SPARK-1090] improvement on spark_shell (help information, configure memory)
https://spark-project.atlassian.net/browse/SPARK-1090

spark-shell should print help information about its parameters and should allow the user to configure executor memory; there is no documentation about how to set --cores/-c in spark-shell.

Also, users should be able to set executor memory through command-line options.

In this PR I also check the format of the options passed by the user.
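A hedged sketch of that format check (the helper and variable names are illustrative): memory values must look like "512m" or "2g" before they are forwarded.

```
check_mem_format() {
  if ! echo "$1" | grep -Eq '^[0-9]+[mMgG]$'; then
    echo "Invalid memory size '$1'; expected e.g. 512m or 2g" >&2
    exit 1
  fi
}
check_mem_format "$DRIVER_MEM"   # DRIVER_MEM is an illustrative variable
```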

Author: CodingCat <zhunansjtu@gmail.com>

Closes #599 from CodingCat/spark_shell_improve and squashes the following commits:

de5aa38 [CodingCat] add parameter to set driver memory
915cbf8 [CodingCat] improvement on spark_shell (help information, configure memory)
2014-02-17 15:12:52 -08:00
CrazyJvm 8400536456 fix some formatting problems. 2014-01-16 11:57:46 +08:00
CrazyJvm 7a0c5b5a23 fix "set MASTER automatically fails" bug.
spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem.
The condition is `if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]]`. We will surely set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT, relying on Spark's default port 7077 instead. So if we do not set SPARK_MASTER_PORT, the condition is never true. I think we should just use the default port if users do not set one explicitly.
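A minimal sketch of the fix under that reasoning: require only SPARK_MASTER_IP, and fall back to the default port when SPARK_MASTER_PORT is unset.

```
if [ -z "$MASTER" ] && [ -n "$SPARK_MASTER_IP" ]; then
  MASTER="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT:-7077}"
  export MASTER
fi
```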
2014-01-16 11:45:02 +08:00
Prashant Sharma 74ba97fcf7 sbin/spark-class* -> bin/spark-class* 2014-01-03 15:08:01 +05:30
Prashant Sharma 980afd280a Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
2014-01-02 17:55:21 +05:30
shane-huang 1d53792a0a add scripts in bin
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 16:13:46 +08:00
shane-huang 1d1a625800 moved user scripts to bin folder
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
2013-09-23 12:46:48 +08:00
Renamed from spark-shell