Marcelo Vanzin 9cbdf31ec1 [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:

49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
2015-06-11 15:29:03 -07:00


---
layout: global
displayTitle: Using Spark's "Hadoop Free" Build
title: Using Spark's "Hadoop Free" Build
---

Spark uses Hadoop client libraries for HDFS and YARN. Starting with Spark 1.4, the project packages "Hadoop free" builds that let you connect a single Spark binary to any Hadoop version more easily. To use these builds, you need to modify `SPARK_DIST_CLASSPATH` to include Hadoop's package jars. The most convenient place to do this is by adding an entry in `conf/spark-env.sh`.

This page describes how to connect Spark to Hadoop for different types of distributions.

# Apache Hadoop

For Apache distributions, you can use Hadoop's `classpath` command. For instance:

{% highlight bash %}

# In conf/spark-env.sh (use only one of the following alternatives)

# If the 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With an explicit path to the 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)

{% endhighlight %}
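If the `hadoop` launcher cannot be run, the one-line entries above report "command not found" and then export an empty `SPARK_DIST_CLASSPATH`, so Spark starts without any Hadoop jars. As a minimal sketch, assuming `conf/spark-env.sh` is sourced by bash, the entry can check for the launcher first and warn instead. The function `set_spark_dist_classpath` is a hypothetical helper, not part of Spark or Hadoop; it only wraps the same `hadoop classpath` call shown above.

```bash
# In conf/spark-env.sh (sketch; set_spark_dist_classpath is a hypothetical helper,
# not something Spark provides).

# Set SPARK_DIST_CLASSPATH from `hadoop classpath`.
# $1: optional path to the 'hadoop' launcher; defaults to the one found on PATH.
# Returns non-zero, leaving SPARK_DIST_CLASSPATH untouched, if no usable launcher is found
# or if the launcher itself fails.
set_spark_dist_classpath() {
  local hadoop_bin="${1:-$(command -v hadoop)}"
  local hadoop_cp

  if [ -z "$hadoop_bin" ] || [ ! -x "$hadoop_bin" ]; then
    echo "spark-env.sh: no executable 'hadoop' found; SPARK_DIST_CLASSPATH not set" >&2
    return 1
  fi

  # Ask Hadoop for its own client classpath, as in the one-line examples.
  hadoop_cp=$("$hadoop_bin" classpath) || return 1
  export SPARK_DIST_CLASSPATH="$hadoop_cp"
}

# Use the 'hadoop' on PATH, or pass an explicit launcher instead, e.g.
#   set_spark_dist_classpath /path/to/hadoop/bin/hadoop
# '|| true' keeps a missing Hadoop from aborting callers that run with 'set -e'.
set_spark_dist_classpath || true
```

Returning non-zero instead of exporting an empty value leaves any `SPARK_DIST_CLASSPATH` already present in the environment untouched. To see what was picked up, `echo "$SPARK_DIST_CLASSPATH" | tr ':' '\n'` prints one classpath entry per line.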