8b25f62bf1
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #6766 from vanzin/SPARK-6511 and squashes the following commits:
49f0f67 [Marcelo Vanzin] [SPARK-6511] [docs] Fix example command in hadoop-provided docs.
(cherry picked from commit 9cbdf31ec1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
27 lines
1 KiB
Markdown
---
layout: global
displayTitle: Using Spark's "Hadoop Free" Build
title: Using Spark's "Hadoop Free" Build
---
Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify `SPARK_DIST_CLASSPATH` to include Hadoop's package jars. The most convenient way to do this is to add an entry to `conf/spark-env.sh`.

This page describes how to connect Spark to Hadoop for different types of distributions.

# Apache Hadoop
For Apache distributions, you can use Hadoop's `classpath` command. For instance:

{% highlight bash %}
### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
{% endhighlight %}
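
The commands above assume the `hadoop` binary can actually be run when `conf/spark-env.sh` is loaded. As a minimal sketch, a more defensive variant could check for the binary first and leave `SPARK_DIST_CLASSPATH` unset when the lookup fails. The helper name `resolve_hadoop_classpath` and the warning text are hypothetical and not part of Spark or Hadoop; only `hadoop classpath` comes from the examples above.

```bash
### in conf/spark-env.sh ###

# Hypothetical helper (not part of Spark or Hadoop): prints the Hadoop
# classpath, or returns non-zero if the 'hadoop' binary cannot be run.
# $1 is an optional path to the binary; it defaults to 'hadoop' on PATH.
resolve_hadoop_classpath() {
  local hadoop_bin="${1:-hadoop}"
  if ! command -v "$hadoop_bin" >/dev/null 2>&1; then
    echo "spark-env.sh: '$hadoop_bin' not found or not executable" >&2
    return 1
  fi
  "$hadoop_bin" classpath
}

# Only export the variable when the lookup succeeded.
if hadoop_cp="$(resolve_hadoop_classpath)"; then
  export SPARK_DIST_CLASSPATH="$hadoop_cp"
fi
```

To use an explicit binary instead of the one on your `PATH`, pass its location as the first argument, for example `resolve_hadoop_classpath /path/to/hadoop/bin/hadoop`.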