Spark uses Hadoop client libraries for HDFS and YARN. Starting in version 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify `SPARK_DIST_CLASSPATH` to include Hadoop's package jars. The most convenient place to do this is by adding an entry in `conf/spark-env.sh`.
This page describes how to connect Spark to Hadoop for different types of distributions.
# Apache Hadoop
For Apache distributions, you can use Hadoop's `classpath` command. For instance:
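A sketch of what such an entry in `conf/spark-env.sh` might look like, assuming the `hadoop` binary is available (the paths shown are illustrative placeholders, not real installation paths):

```shell
### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With an explicit path to the 'hadoop' binary (illustrative path)
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory (illustrative path)
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
```

`hadoop classpath` prints the colon-separated list of jars and directories Hadoop itself uses, so exporting it as `SPARK_DIST_CLASSPATH` puts those same jars on Spark's classpath at launch time.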