Commit graph

4 commits

Author SHA1 Message Date
Andrew Ray 1995417696 [SPARK-20769][DOC] Incorrect documentation for using Jupyter notebook
## What changes were proposed in this pull request?

SPARK-13973 incorrectly removed the required PYSPARK_DRIVER_PYTHON_OPTS=notebook from documentation to use pyspark with Jupyter notebook. This patch corrects the documentation error.

## How was this patch tested?

Tested invocation locally with
```bash
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspark
```

Author: Andrew Ray <ray.andrew@gmail.com>

Closes #18001 from aray/patch-1.
2017-05-17 10:06:01 +01:00
Steve Loughran 2cf83c4783 [SPARK-7481][BUILD] Add spark-hadoop-cloud module to pull in object store access.
## What changes were proposed in this pull request?

Add a new `spark-hadoop-cloud` module and maven profile to pull in object store support from `hadoop-openstack`, `hadoop-aws` and `hadoop-azure` (Hadoop 2.7+) JARs, along with their dependencies, fixing up the dependencies so that everything works, in particular Jackson.

It restores `s3n://` access to S3, adds its `s3a://` replacement, OpenStack `swift://` and azure `wasb://`.

There's a documentation page, `cloud_integration.md`, which covers the basic details of using Spark with object stores, referring the reader to the supplier's own documentation, with specific warnings on security and the possible mismatch between a store's behavior and that of a filesystem. In particular, users are advised be very cautious when trying to use an object store as the destination of data, and to consult the documentation of the storage supplier and the connector.

(this is the successor to #12004; I can't re-open it)

## How was this patch tested?

Downstream tests exist in [https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples](https://github.com/steveloughran/spark-cloud-examples/tree/master/cloud-examples)

Those verify that the dependencies are sufficient to allow downstream applications to work with s3a, azure wasb and swift storage connectors, and perform basic IO & dataframe operations thereon. All seems well.

Manually clean build & verify that assembly contains the relevant aws-* hadoop-* artifacts on Hadoop 2.6; azure on a hadoop-2.7 profile.

SBT build: `build/sbt -Phadoop-cloud -Phadoop-2.7 package`
maven build `mvn install -Phadoop-cloud -Phadoop-2.7`

This PR *does not* update `dev/deps/spark-deps-hadoop-2.7` or `dev/deps/spark-deps-hadoop-2.6`, because unless the hadoop-cloud profile is enabled, no extra JARs show up in the dependency list. The dependency check in Jenkins isn't setting the property, so the new JARs aren't visible.

Author: Steve Loughran <stevel@apache.org>
Author: Steve Loughran <stevel@hortonworks.com>

Closes #17834 from steveloughran/cloud/SPARK-7481-current.
2017-05-07 10:15:31 +01:00
Yong Tang 8f0490e22b [SPARK-17979][SPARK-14453] Remove deprecated SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS
This fix removes deprecated support for config `SPARK_YARN_USER_ENV`, as is mentioned in SPARK-17979.
This fix also removes deprecated support for the following:
```
SPARK_YARN_USER_ENV
SPARK_JAVA_OPTS
SPARK_CLASSPATH
SPARK_WORKER_INSTANCES
```

Related JIRA:
[SPARK-14453]: https://issues.apache.org/jira/browse/SPARK-14453
[SPARK-12344]: https://issues.apache.org/jira/browse/SPARK-12344
[SPARK-15781]: https://issues.apache.org/jira/browse/SPARK-15781

Existing tests should pass.

Author: Yong Tang <yong.tang.github@outlook.com>

Closes #17212 from yongtang/SPARK-17979.
2017-03-10 13:34:01 -08:00
Wenchen Fan d69aeeaff4 [SPARK-19516][DOC] update public doc to use SparkSession instead of SparkContext
## What changes were proposed in this pull request?

After Spark 2.0, `SparkSession` becomes the new entry point for Spark applications. We should update the public documents to reflect this.

## How was this patch tested?

N/A

Author: Wenchen Fan <wenchen@databricks.com>

Closes #16856 from cloud-fan/doc.
2017-03-07 11:32:36 -08:00
Renamed from docs/programming-guide.md (Browse further)