spark-instrumented-optimizer/bin/pyspark2.cmd

@echo off

rem
rem Licensed to the Apache Software Foundation (ASF) under one or more
rem contributor license agreements.  See the NOTICE file distributed with
rem this work for additional information regarding copyright ownership.
rem The ASF licenses this file to You under the Apache License, Version 2.0
rem (the "License"); you may not use this file except in compliance with
rem the License.  You may obtain a copy of the License at
rem
rem    http://www.apache.org/licenses/LICENSE-2.0
rem
rem Unless required by applicable law or agreed to in writing, software
rem distributed under the License is distributed on an "AS IS" BASIS,
rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
rem See the License for the specific language governing permissions and
rem limitations under the License.
rem

set SCALA_VERSION=2.10

rem Figure out where the Spark framework is installed
set FWDIR=%~dp0..\

rem Export this as SPARK_HOME
set SPARK_HOME=%FWDIR%

rem Test whether the user has built Spark
if exist "%FWDIR%RELEASE" goto skip_build_test
set FOUND_JAR=0
for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar") do (
  set FOUND_JAR=1
)
if [%FOUND_JAR%] == [0] (
  echo Failed to find Spark assembly JAR.
  echo You need to build Spark before running this program.
  goto exit
)
:skip_build_test

rem Load environment variables from conf\spark-env.cmd, if it exists
if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"

rem Figure out which Python to use.
if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python

set PYTHONPATH=%FWDIR%python;%PYTHONPATH%
set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%

set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%
set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py
set PYSPARK_SUBMIT_ARGS=%*

echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%

rem Check whether the argument is a file
for /f %%i in ('echo %1^| findstr /R "\.py"') do (
  set PYTHON_FILE=%%i
)

if [%PYTHON_FILE%] == [] (
  set PYSPARK_SHELL=1
  if [%IPYTHON%] == [1] (
	ipython %IPYTHON_OPTS%
  ) else (
	%PYSPARK_PYTHON%
  ) 
) else (
  echo.
  echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0.
  echo Use ./bin/spark-submit ^<python file^>
  echo.
  "%FWDIR%\bin\spark-submit.cmd" %PYSPARK_SUBMIT_ARGS%
)

:exit
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00			`@echo off`

			`rem`
			`rem Licensed to the Apache Software Foundation (ASF) under one or more`
			`rem contributor license agreements. See the NOTICE file distributed with`
			`rem this work for additional information regarding copyright ownership.`
			`rem The ASF licenses this file to You under the Apache License, Version 2.0`
			`rem (the "License"); you may not use this file except in compliance with`
			`rem the License. You may obtain a copy of the License at`
			`rem`
			`rem http://www.apache.org/licenses/LICENSE-2.0`
			`rem`
			`rem Unless required by applicable law or agreed to in writing, software`
			`rem distributed under the License is distributed on an "AS IS" BASIS,`
			`rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.`
			`rem See the License for the specific language governing permissions and`
			`rem limitations under the License.`
			`rem`

version changed 2.9.3 -> 2.10 in shell script. 2013-09-15 03:17:20 -04:00			`set SCALA_VERSION=2.10`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00
			`rem Figure out where the Spark framework is installed`
add scripts in bin Signed-off-by: shane-huang <shengsheng.huang@intel.com> 2013-09-23 04:13:46 -04:00			`set FWDIR=%~dp0..\`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00
			`rem Export this as SPARK_HOME`
			`set SPARK_HOME=%FWDIR%`

			`rem Test whether the user has built Spark`
			`if exist "%FWDIR%RELEASE" goto skip_build_test`
			`set FOUND_JAR=0`
			`for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assemblyhadoop.jar") do (`
			`set FOUND_JAR=1`
			`)`
[SPARK-1808] Route bin/pyspark through Spark submit Problem. For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit 2014-05-17 01:34:38 -04:00			`if [%FOUND_JAR%] == [0] (`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00			`echo Failed to find Spark assembly JAR.`
[SPARK-3775] Not suitable error message in spark-shell.cmd Modified some sentence of error message in bin\*.cmd. Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp> Closes #2640 from tsudukim/feature/SPARK-3775 and squashes the following commits: 3458afb [Masayoshi TSUZUKI] [SPARK-3775] Not suitable error message in spark-shell.cmd 2014-10-03 16:09:48 -04:00			`echo You need to build Spark before running this program.`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00			`goto exit`
			`)`
			`:skip_build_test`

			`rem Load environment variables from conf\spark-env.cmd, if it exists`
			`if exist "%FWDIR%conf\spark-env.cmd" call "%FWDIR%conf\spark-env.cmd"`

			`rem Figure out which Python to use.`
[SPARK-1808] Route bin/pyspark through Spark submit Problem. For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit 2014-05-17 01:34:38 -04:00			`if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00
			`set PYTHONPATH=%FWDIR%python;%PYTHONPATH%`
[SPARK-2305] [PySpark] Update Py4J to version 0.8.2.1 Author: Josh Rosen <joshrosen@apache.org> Closes #1626 from JoshRosen/SPARK-2305 and squashes the following commits: 03fb283 [Josh Rosen] Update Py4J to version 0.8.2.1. 2014-07-29 22:02:06 -04:00			`set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00
			`set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%`
			`set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py`
[SPARK-1808] Route bin/pyspark through Spark submit Problem. For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit 2014-05-17 01:34:38 -04:00			`set PYSPARK_SUBMIT_ARGS=%*`
Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00
			`echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%`

[SPARK-1808] Route bin/pyspark through Spark submit Problem. For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit 2014-05-17 01:34:38 -04:00			`rem Check whether the argument is a file`
			`for /f %%i in ('echo %1^\| findstr /R "\.py"') do (`
			`set PYTHON_FILE=%%i`
			`)`

			`if [%PYTHON_FILE%] == [] (`
[SPARK-4065] Add check for IPython on Windows This issue employs logic similar to the bash launcher (pyspark) to check if IPTYHON=1, and if so launch ipython with options in IPYTHON_OPTS. This fix assumes that ipython is available in the system Path, and can be invoked with a plain "ipython" command. Author: Michael Griffiths <msjgriffiths@gmail.com> Closes #2910 from msjgriffiths/pyspark-windows and squashes the following commits: ef34678 [Michael Griffiths] Change build message to comply with [SPARK-3775] 361e3d8 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows 9ce72d1 [Michael Griffiths] [SPARK-4065] Add check for IPython on Windows 2014-10-28 15:47:21 -04:00			`set PYSPARK_SHELL=1`
			`if [%IPYTHON%] == [1] (`
			`ipython %IPYTHON_OPTS%`
			`) else (`
			`%PYSPARK_PYTHON%`
			`)`
[SPARK-1808] Route bin/pyspark through Spark submit Problem. For `bin/pyspark`, there is currently no other way to specify Spark configuration properties other than through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`. However, this mechanism is supposedly deprecated. Instead, it needs to pick up configurations explicitly specified in `conf/spark-defaults.conf`. Solution. Have `bin/pyspark` invoke `bin/spark-submit`, like all of its counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This has the additional benefit of making the invocation of all the user facing Spark scripts consistent. Details. `bin/pyspark` inherently handles two cases: (1) running python applications and (2) running the python shell. For (1), Spark submit already handles running python applications. For cases in which `bin/pyspark` is given a python file, we can simply call pass the file directly to Spark submit and let it handle the rest. For case (2), `bin/pyspark` starts a python process as before, which launches the JVM as a sub-process. The existing code already provides a code path to do this. All we needed to change is to use `bin/spark-submit` instead of `spark-class` to launch the JVM. This requires modifications to Spark submit to handle the pyspark shell as a special case. This has been tested locally (OSX and Windows 7), on a standalone cluster, and on a YARN cluster. Running IPython also works as before, except now it takes in Spark submit arguments too. Author: Andrew Or <andrewor14@gmail.com> Closes #799 from andrewor14/pyspark-submit and squashes the following commits: bf37e36 [Andrew Or] Minor changes 01066fa [Andrew Or] bin/pyspark for Windows c8cb3bf [Andrew Or] Handle perverse app names (with escaped quotes) 1866f85 [Andrew Or] Windows is not cooperating 456d844 [Andrew Or] Guard against shlex hanging if PYSPARK_SUBMIT_ARGS is not set 7eebda8 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit b7ba0d8 [Andrew Or] Address a few comments (minor) 06eb138 [Andrew Or] Use shlex instead of writing our own parser 05879fa [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a823661 [Andrew Or] Fix --die-on-broken-pipe not propagated properly 6fba412 [Andrew Or] Deal with quotes + address various comments fe4c8a7 [Andrew Or] Update --help for bin/pyspark afe47bf [Andrew Or] Fix spark shell f04aaa4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into pyspark-submit a371d26 [Andrew Or] Route bin/pyspark through Spark submit 2014-05-17 01:34:38 -04:00			`) else (`
			`echo.`
			`echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0.`
			`echo Use ./bin/spark-submit ^<python file^>`
			`echo.`
			`"%FWDIR%\bin\spark-submit.cmd" %PYSPARK_SUBMIT_ARGS%`
			`)`

Further fixes to get PySpark to work on Windows 2013-09-01 21:19:29 -04:00			`:exit`