[SPARK-13521][BUILD] Remove reference to Tachyon in cluster & release scripts
## What changes were proposed in this pull request?

Spark ships a very limited set of cluster management scripts for Tachyon, even though Tachyon itself provides a much better version of them. Since Spark users can now use Tachyon as a normal file system without extensive configuration, we can remove these management capabilities to simplify Spark's bash scripts. This also reduces coupling between a third-party external system and Spark's release scripts, and eliminates the possibility of failures such as Tachyon being renamed or its tarballs being relocated.

## How was this patch tested?

N/A

Author: Reynold Xin <rxin@databricks.com>

Closes #11400 from rxin/release-script.
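For context, this is roughly what "use Tachyon as a normal file system" means in practice. It is a minimal sketch, not part of this patch: it assumes a Tachyon master reachable at localhost:19998, the Tachyon client jar on the driver and executor classpath, and a hypothetical `/logs/input.txt` path.

```scala
// Tachyon exposes a Hadoop-compatible FileSystem, so Spark can read and write
// tachyon:// URIs like any other path -- no Spark-side cluster scripts needed.
// Assumptions (not from this patch): master at localhost:19998, Tachyon client
// jar on the classpath, and a hypothetical /logs/input.txt file.
import org.apache.spark.{SparkConf, SparkContext}

object TachyonAsFileSystemExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TachyonAsFileSystem"))

    // Read an existing file from Tachyon, just as you would from HDFS or the local FS.
    val lines = sc.textFile("tachyon://localhost:19998/logs/input.txt")

    // Do some work and write the result back to Tachyon.
    val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("tachyon://localhost:19998/logs/word-counts")

    sc.stop()
  }
}
```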
Commit 59e3e10be2 (parent f77dc4e1e2)
@@ -929,30 +929,6 @@ Apart from these, the following properties are also available, and may be useful
     mapping has high overhead for blocks close to or below the page size of the operating system.
   </td>
 </tr>
-<tr>
-  <td><code>spark.externalBlockStore.blockManager</code></td>
-  <td>org.apache.spark.storage.TachyonBlockManager</td>
-  <td>
-    Implementation of external block manager (file system) that store RDDs. The file system's URL is set by
-    <code>spark.externalBlockStore.url</code>.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.externalBlockStore.baseDir</code></td>
-  <td>System.getProperty("java.io.tmpdir")</td>
-  <td>
-    Directories of the external block store that store RDDs. The file system's URL is set by
-    <code>spark.externalBlockStore.url</code> It can also be a comma-separated list of multiple
-    directories on Tachyon file system.
-  </td>
-</tr>
-<tr>
-  <td><code>spark.externalBlockStore.url</code></td>
-  <td>tachyon://localhost:19998 for Tachyon</td>
-  <td>
-    The URL of the underlying external blocker file system in the external block store.
-  </td>
-</tr>
 </table>
 
 #### Networking

@@ -54,8 +54,7 @@ an application to gain back cores on one node when it has work to do. To use thi
 
 Note that none of the modes currently provide memory sharing across applications. If you would like to share
 data this way, we recommend running a single server application that can serve multiple requests by querying
-the same RDDs. In future releases, in-memory storage systems such as [Tachyon](http://tachyon-project.org) will
-provide another approach to share RDDs.
+the same RDDs.
 
 ## Dynamic Resource Allocation
 

@@ -1177,7 +1177,7 @@ that originally created it.
 
 In addition, each persisted RDD can be stored using a different *storage level*, allowing you, for example,
 to persist the dataset on disk, persist it in memory but as serialized Java objects (to save space),
-replicate it across nodes, or store it off-heap in [Tachyon](http://tachyon-project.org/).
+replicate it across nodes.
 These levels are set by passing a
 `StorageLevel` object ([Scala](api/scala/index.html#org.apache.spark.storage.StorageLevel),
 [Java](api/java/index.html?org/apache/spark/storage/StorageLevel.html),

@@ -1218,24 +1218,11 @@ storage levels is:
   <td> MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc. </td>
   <td> Same as the levels above, but replicate each partition on two cluster nodes. </td>
 </tr>
-<tr>
-  <td> OFF_HEAP (experimental) </td>
-  <td> Store RDD in serialized format in <a href="http://tachyon-project.org">Tachyon</a>.
-    Compared to MEMORY_ONLY_SER, OFF_HEAP reduces garbage collection overhead and allows executors
-    to be smaller and to share a pool of memory, making it attractive in environments with
-    large heaps or multiple concurrent applications. Furthermore, as the RDDs reside in Tachyon,
-    the crash of an executor does not lead to losing the in-memory cache. In this mode, the memory
-    in Tachyon is discardable. Thus, Tachyon does not attempt to reconstruct a block that it evicts
-    from memory. If you plan to use Tachyon as the off heap store, Spark is compatible with Tachyon
-    out-of-the-box. Please refer to this <a href="http://tachyon-project.org/master/Running-Spark-on-Tachyon.html">page</a>
-    for the suggested version pairings.
-  </td>
-</tr>
 </table>
 
 **Note:** *In Python, stored objects will always be serialized with the [Pickle](https://docs.python.org/2/library/pickle.html) library,
 so it does not matter whether you choose a serialized level. The available storage levels in Python include `MEMORY_ONLY`, `MEMORY_ONLY_2`,
-`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, `DISK_ONLY_2` and `OFF_HEAP`.*
+`MEMORY_AND_DISK`, `MEMORY_AND_DISK_2`, `DISK_ONLY`, and `DISK_ONLY_2`.*
 
 Spark also automatically persists some intermediate data in shuffle operations (e.g. `reduceByKey`), even without users calling `persist`. This is done to avoid recomputing the entire input if a node fails during the shuffle. We still recommend users call `persist` on the resulting RDD if they plan to reuse it.
 

@@ -1259,11 +1246,6 @@ requests from a web application). *All* the storage levels provide full fault to
 recomputing lost data, but the replicated ones let you continue running tasks on the RDD without
 waiting to recompute a lost partition.
 
-* In environments with high amounts of memory or multiple applications, the experimental `OFF_HEAP`
-mode has several advantages:
-  * It allows multiple executors to share the same pool of memory in Tachyon.
-  * It significantly reduces garbage collection costs.
-  * Cached data is not lost if individual executors crash.
 
 ### Removing Data
 

@@ -32,11 +32,6 @@ set -x
 SPARK_HOME="$(cd "`dirname "$0"`"; pwd)"
 DISTDIR="$SPARK_HOME/dist"
 
-SPARK_TACHYON=false
-TACHYON_VERSION="0.8.2"
-TACHYON_TGZ="tachyon-${TACHYON_VERSION}-bin.tar.gz"
-TACHYON_URL="http://tachyon-project.org/downloads/files/${TACHYON_VERSION}/${TACHYON_TGZ}"
-
 MAKE_TGZ=false
 NAME=none
 MVN="$SPARK_HOME/build/mvn"

@@ -45,7 +40,7 @@ function exit_with_usage {
   echo "make-distribution.sh - tool for making binary distributions of Spark"
   echo ""
   echo "usage:"
-  cl_options="[--name] [--tgz] [--mvn <mvn-command>] [--with-tachyon]"
+  cl_options="[--name] [--tgz] [--mvn <mvn-command>]"
   echo "./make-distribution.sh $cl_options <maven build options>"
   echo "See Spark's \"Building Spark\" doc for correct Maven options."
   echo ""

@@ -69,9 +64,6 @@ while (( "$#" )); do
       echo "Error: '--with-hive' is no longer supported, use Maven options -Phive and -Phive-thriftserver"
       exit_with_usage
       ;;
-    --with-tachyon)
-      SPARK_TACHYON=true
-      ;;
     --tgz)
       MAKE_TGZ=true
       ;;

@@ -150,12 +142,6 @@ else
   echo "Making distribution for Spark $VERSION in $DISTDIR..."
 fi
 
-if [ "$SPARK_TACHYON" == "true" ]; then
-  echo "Tachyon Enabled"
-else
-  echo "Tachyon Disabled"
-fi
-
 # Build uber fat JAR
 cd "$SPARK_HOME"
 

@@ -219,40 +205,6 @@ if [ -d "$SPARK_HOME"/R/lib/SparkR ]; then
   cp "$SPARK_HOME/R/lib/sparkr.zip" "$DISTDIR"/R/lib
 fi
 
-# Download and copy in tachyon, if requested
-if [ "$SPARK_TACHYON" == "true" ]; then
-  TMPD=`mktemp -d 2>/dev/null || mktemp -d -t 'disttmp'`
-
-  pushd "$TMPD" > /dev/null
-  echo "Fetching tachyon tgz"
-
-  TACHYON_DL="${TACHYON_TGZ}.part"
-  if [ $(command -v curl) ]; then
-    curl --silent -k -L "${TACHYON_URL}" > "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
-  elif [ $(command -v wget) ]; then
-    wget --quiet "${TACHYON_URL}" -O "${TACHYON_DL}" && mv "${TACHYON_DL}" "${TACHYON_TGZ}"
-  else
-    printf "You do not have curl or wget installed. please install Tachyon manually.\n"
-    exit -1
-  fi
-
-  tar xzf "${TACHYON_TGZ}"
-  cp "tachyon-${TACHYON_VERSION}/assembly/target/tachyon-assemblies-${TACHYON_VERSION}-jar-with-dependencies.jar" "$DISTDIR/lib"
-  mkdir -p "$DISTDIR/tachyon/src/main/java/tachyon/web"
-  cp -r "tachyon-${TACHYON_VERSION}"/{bin,conf,libexec} "$DISTDIR/tachyon"
-  cp -r "tachyon-${TACHYON_VERSION}"/servers/src/main/java/tachyon/web "$DISTDIR/tachyon/src/main/java/tachyon/web"
-
-  if [[ `uname -a` == Darwin* ]]; then
-    # need to run sed differently on osx
-    nl=$'\n'; sed -i "" -e "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\\$nl export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
-  else
-    sed -i "s|export TACHYON_JAR=\$TACHYON_HOME/target/\(.*\)|# This is set for spark's make-distribution\n export TACHYON_JAR=\$TACHYON_HOME/../lib/\1|" "$DISTDIR/tachyon/libexec/tachyon-config.sh"
-  fi
-
-  popd > /dev/null
-  rm -rf "$TMPD"
-fi
-
 if [ "$MAKE_TGZ" == "true" ]; then
   TARDIR_NAME=spark-$VERSION-bin-$NAME
   TARDIR="$SPARK_HOME/$TARDIR_NAME"

@@ -25,22 +25,11 @@ if [ -z "${SPARK_HOME}" ]; then
   export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
 fi
 
-TACHYON_STR=""
-
-while (( "$#" )); do
-case $1 in
-    --with-tachyon)
-      TACHYON_STR="--with-tachyon"
-      ;;
-  esac
-shift
-done
-
 # Load the Spark configuration
 . "${SPARK_HOME}/sbin/spark-config.sh"
 
 # Start Master
-"${SPARK_HOME}/sbin"/start-master.sh $TACHYON_STR
+"${SPARK_HOME}/sbin"/start-master.sh
 
 # Start Workers
-"${SPARK_HOME}/sbin"/start-slaves.sh $TACHYON_STR
+"${SPARK_HOME}/sbin"/start-slaves.sh

@@ -39,21 +39,6 @@ fi
 
 ORIGINAL_ARGS="$@"
 
-START_TACHYON=false
-
-while (( "$#" )); do
-case $1 in
-    --with-tachyon)
-      if [ ! -e "${SPARK_HOME}"/tachyon/bin/tachyon ]; then
-        echo "Error: --with-tachyon specified, but tachyon not found."
-        exit -1
-      fi
-      START_TACHYON=true
-      ;;
-  esac
-shift
-done
-
 . "${SPARK_HOME}/sbin/spark-config.sh"
 
 . "${SPARK_HOME}/bin/load-spark-env.sh"

@@ -73,9 +58,3 @@ fi
 "${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
   --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
   $ORIGINAL_ARGS
-
-if [ "$START_TACHYON" == "true" ]; then
-  "${SPARK_HOME}"/tachyon/bin/tachyon bootstrap-conf $SPARK_MASTER_IP
-  "${SPARK_HOME}"/tachyon/bin/tachyon format -s
-  "${SPARK_HOME}"/tachyon/bin/tachyon-start.sh master
-fi

@@ -23,21 +23,6 @@ if [ -z "${SPARK_HOME}" ]; then
   export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
 fi
 
-START_TACHYON=false
-
-while (( "$#" )); do
-case $1 in
-    --with-tachyon)
-      if [ ! -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
-        echo "Error: --with-tachyon specified, but tachyon not found."
-        exit -1
-      fi
-      START_TACHYON=true
-      ;;
-  esac
-shift
-done
-
 . "${SPARK_HOME}/sbin/spark-config.sh"
 . "${SPARK_HOME}/bin/load-spark-env.sh"
 

@@ -50,12 +35,5 @@ if [ "$SPARK_MASTER_IP" = "" ]; then
   SPARK_MASTER_IP="`hostname`"
 fi
 
-if [ "$START_TACHYON" == "true" ]; then
-  "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon bootstrap-conf "$SPARK_MASTER_IP"
-
-  # set -t so we can call sudo
-  SPARK_SSH_OPTS="-o StrictHostKeyChecking=no -t" "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/tachyon/bin/tachyon-start.sh" worker SudoMount \; sleep 1
-fi
-
 # Launch the slaves
 "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin/start-slave.sh" "spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT"

@@ -26,7 +26,3 @@ fi
 . "${SPARK_HOME}/sbin/spark-config.sh"
 
 "${SPARK_HOME}/sbin"/spark-daemon.sh stop org.apache.spark.deploy.master.Master 1
-
-if [ -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
-  "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon killAll tachyon.master.Master
-fi

@@ -25,9 +25,4 @@ fi
 
 . "${SPARK_HOME}/bin/load-spark-env.sh"
 
-# do before the below calls as they exec
-if [ -e "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon ]; then
-  "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/../tachyon/bin/tachyon killAll tachyon.worker.Worker
-fi
-
 "${SPARK_HOME}/sbin/slaves.sh" cd "${SPARK_HOME}" \; "${SPARK_HOME}/sbin"/stop-slave.sh