pyspark -> bin/pyspark

commit a3f90a2ecf
parent 94b7a7fe37
@@ -21,7 +21,7 @@ Once you've built Spark, the easiest way to start using it is the shell:
 
     ./bin/spark-shell
 
-Or, for the Python API, the Python shell (`./pyspark`).
+Or, for the Python API, the Python shell (`./bin/pyspark`).
 
 Spark also comes with several sample programs in the `examples` directory.
 To run one of them, use `./bin/run-example <class> <params>`. For example:
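The example invocation itself falls outside this hunk. For illustration only (an era-typical command, not the elided text): `./bin/run-example org.apache.spark.examples.SparkPi local`.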
@@ -35,7 +35,7 @@ or `local` to run locally with one thread, or `local[N]` to run locally with N threads. You should start by using
 `local` for testing.
 
 Finally, you can run Spark interactively through modified versions of the Scala shell (`./bin/spark-shell`) or
-Python interpreter (`./pyspark`). These are a great way to learn the framework.
+Python interpreter (`./bin/pyspark`). These are a great way to learn the framework.
 
 # Launching on a Cluster
 
@@ -47,7 +47,7 @@ PySpark will automatically ship these functions to workers, along with any objects they reference.
 Instances of classes will be serialized and shipped to workers by PySpark, but classes themselves cannot be automatically distributed to workers.
 The [Standalone Use](#standalone-use) section describes how to ship code dependencies to workers.
 
-In addition, PySpark fully supports interactive use---simply run `./pyspark` to launch an interactive shell.
+In addition, PySpark fully supports interactive use---simply run `./bin/pyspark` to launch an interactive shell.
 
 
 # Installing and Configuring PySpark
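The guide text in this hunk describes plain functions being pickled and shipped to workers automatically. A minimal sketch of that behavior, assuming the era's `SparkContext(master, jobName)` constructor (the function and job name here are illustrative):

{% highlight python %}
from pyspark import SparkContext

def normalize(s):
    # A plain top-level function; PySpark pickles it (and objects it
    # references) and ships it to the workers running the map tasks.
    return s.strip().lower()

sc = SparkContext("local", "ShippingExample")
print sc.parallelize(["Foo ", " BAR"]).map(normalize).collect()
# ['foo', 'bar']
{% endhighlight %}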
@@ -60,17 +60,17 @@ By default, PySpark requires `python` to be available on the system `PATH` and uses it to run programs.
 
 All of PySpark's library dependencies, including [Py4J](http://py4j.sourceforge.net/), are bundled with PySpark and automatically imported.
 
-Standalone PySpark applications should be run using the `pyspark` script, which automatically configures the Java and Python environment using the settings in `conf/spark-env.sh` or `.cmd`.
+Standalone PySpark applications should be run using the `bin/pyspark` script, which automatically configures the Java and Python environment using the settings in `conf/spark-env.sh` or `.cmd`.
 The script automatically adds the `pyspark` package to the `PYTHONPATH`.
 
 
 # Interactive Use
 
-The `pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
+The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `bin/pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
 
 {% highlight bash %}
 $ sbt/sbt assembly
-$ ./pyspark
+$ ./bin/pyspark
 {% endhighlight %}
 
 The Python shell can be used to explore data interactively and is a simple way to learn the API:
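A note on what the `PYTHONPATH` setup above buys you: once a script is launched through `bin/pyspark`, the `pyspark` package imports cleanly. A minimal sketch (Python 2, as PySpark used at the time; the file name is hypothetical):

{% highlight python %}
# check_env.py -- hypothetical script, run as: ./bin/pyspark check_env.py
# The import works because bin/pyspark put the pyspark package on PYTHONPATH.
import pyspark
print "pyspark imported from: %s" % pyspark.__file__
{% endhighlight %}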
@@ -82,35 +82,35 @@ The Python shell can be used to explore data interactively and is a simple way to learn the API:
 >>> help(pyspark) # Show all pyspark functions
 {% endhighlight %}
 
-By default, the `pyspark` shell creates a SparkContext that runs applications locally on a single core.
+By default, the `bin/pyspark` shell creates a SparkContext that runs applications locally on a single core.
 To connect to a non-local cluster, or use multiple cores, set the `MASTER` environment variable.
-For example, to use the `pyspark` shell with a [standalone Spark cluster](spark-standalone.html):
+For example, to use the `bin/pyspark` shell with a [standalone Spark cluster](spark-standalone.html):
 
 {% highlight bash %}
-$ MASTER=spark://IP:PORT ./pyspark
+$ MASTER=spark://IP:PORT ./bin/pyspark
 {% endhighlight %}
 
 Or, to use four cores on the local machine:
 
 {% highlight bash %}
-$ MASTER=local[4] ./pyspark
+$ MASTER=local[4] ./bin/pyspark
 {% endhighlight %}
 
 
 ## IPython
 
 It is also possible to launch PySpark in [IPython](http://ipython.org), the enhanced Python interpreter.
-To do this, set the `IPYTHON` variable to `1` when running `pyspark`:
+To do this, set the `IPYTHON` variable to `1` when running `bin/pyspark`:
 
 {% highlight bash %}
-$ IPYTHON=1 ./pyspark
+$ IPYTHON=1 ./bin/pyspark
 {% endhighlight %}
 
 Alternatively, you can customize the `ipython` command by setting `IPYTHON_OPTS`. For example, to launch
 the [IPython Notebook](http://ipython.org/notebook.html) with PyLab graphing support:
 
 {% highlight bash %}
-$ IPYTHON_OPTS="notebook --pylab inline" ./pyspark
+$ IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
 {% endhighlight %}
 
 IPython also works on a cluster or on multiple cores if you set the `MASTER` environment variable.
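For reference, setting `MASTER` is roughly equivalent to choosing the first argument of the SparkContext the shell builds for you; a sketch of that equivalence (an approximation, not the actual shell.py code):

{% highlight python %}
import os
from pyspark import SparkContext

# Mirrors what the interactive shell does: fall back to
# local mode when MASTER is unset.
master = os.environ.get("MASTER", "local")
sc = SparkContext(master, "PySparkShell")
{% endhighlight %}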
@@ -118,7 +118,7 @@ IPython also works on a cluster or on multiple cores if you set the `MASTER` environment variable.
 
 # Standalone Programs
 
-PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `pyspark`.
+PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `bin/pyspark`.
 The Quick Start guide includes a [complete example](quick-start.html#a-standalone-app-in-python) of a standalone Python application.
 
 Code dependencies can be deployed by listing them in the `pyFiles` option in the SparkContext constructor:
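The `pyFiles` sentence above ends where the hunk does; for illustration, a constructor call with the option filled in (`my_helpers.py` is a hypothetical dependency file):

{% highlight python %}
from pyspark import SparkContext

# Ship my_helpers.py to every worker so its functions and
# classes can be imported inside tasks.
sc = SparkContext("local", "DependencyExample", pyFiles=["my_helpers.py"])
{% endhighlight %}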
@@ -153,6 +153,6 @@ Many of the methods also contain [doctests](http://docs.python.org/2/library/doctest.html)
 PySpark also includes several sample programs in the [`python/examples` folder](https://github.com/apache/incubator-spark/tree/master/python/examples).
 You can run them by passing the files to `pyspark`; e.g.:
 
-    ./pyspark python/examples/wordcount.py
+    ./bin/pyspark python/examples/wordcount.py
 
 Each program prints usage help when run without arguments.
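For a sense of what `python/examples/wordcount.py` contains, here is a condensed sketch of an era-typical PySpark word count (a reconstruction, not the verbatim example):

{% highlight python %}
import sys
from operator import add
from pyspark import SparkContext

if __name__ == "__main__":
    # Era-typical usage: wordcount.py <master> <file>
    sc = SparkContext(sys.argv[1], "PythonWordCount")
    counts = sc.textFile(sys.argv[2]) \
               .flatMap(lambda line: line.split(' ')) \
               .map(lambda word: (word, 1)) \
               .reduceByKey(add)
    for word, count in counts.collect():
        print "%s: %i" % (word, count)
{% endhighlight %}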
@@ -277,11 +277,11 @@ We can pass Python functions to Spark, which are automatically serialized along with any variables they reference.
 For applications that use custom classes or third-party libraries, we can add those code dependencies to SparkContext to ensure that they will be available on remote machines; this is described in more detail in the [Python programming guide](python-programming-guide.html).
 `SimpleApp` is simple enough that we do not need to specify any code dependencies.
 
-We can run this application using the `pyspark` script:
+We can run this application using the `bin/pyspark` script:
 
 {% highlight bash %}
 $ cd $SPARK_HOME
-$ ./pyspark SimpleApp.py
+$ ./bin/pyspark SimpleApp.py
 ...
 Lines with a: 46, Lines with b: 23
 {% endhighlight %}
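The transcript above runs `SimpleApp.py` without showing it; per the quick start, the app counts lines containing `a` and `b` in a text file. A sketch consistent with the printed output (the input path is a placeholder):

{% highlight python %}
"""SimpleApp.py (sketch)"""
from pyspark import SparkContext

logFile = "README.md"  # should be some text file on your system
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()
print "Lines with a: %i, Lines with b: %i" % (numAs, numBs)
{% endhighlight %}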
@@ -47,7 +47,7 @@ print "Spark context available as sc."
 if add_files != None:
     print "Adding files: [%s]" % ", ".join(add_files)
 
-# The ./pyspark script stores the old PYTHONSTARTUP value in OLD_PYTHONSTARTUP,
+# The ./bin/pyspark script stores the old PYTHONSTARTUP value in OLD_PYTHONSTARTUP,
 # which allows us to execute the user's PYTHONSTARTUP file:
 _pythonstartup = os.environ.get('OLD_PYTHONSTARTUP')
 if _pythonstartup and os.path.isfile(_pythonstartup):
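The hunk ends inside this final `if`. A sketch of the likely body (an assumption in Python 2 idiom, not verbatim source):

{% highlight python %}
# Assumed continuation: re-run the user's original PYTHONSTARTUP script,
# which bin/pyspark saved into OLD_PYTHONSTARTUP before overriding it.
execfile(_pythonstartup)
{% endhighlight %}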