#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# This script loads spark-env.sh if it exists, and ensures it is only loaded once.
# spark-env.sh is loaded from SPARK_CONF_DIR if set, or otherwise from SPARK_HOME's
# conf/ subdirectory.
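#
# For illustration only (nothing in this comment is executed), a spark-env.sh
# picked up from that directory typically just exports site-specific settings;
# the variable values below are assumptions, not shipped defaults:
#
#   export SPARK_LOCAL_IP=192.168.1.10
#   export SPARK_WORKER_MEMORY=4g
#   export SPARK_LOG_DIR=/var/log/spark
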
# Figure out where Spark is installed
if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi
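
# Illustration (assumed layout, e.g. /opt/spark/bin/load-spark-env.sh): for a
# conventional distribution, find-spark-home resolves SPARK_HOME to the parent
# of bin/ (here /opt/spark); for a pip-installed PySpark it resolves to the
# installed pyspark package's location instead.
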
SPARK_ENV_SH="spark-env.sh"
if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1

  export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}"

SPARK_ENV_SH="${SPARK_CONF_DIR}/${SPARK_ENV_SH}"
  if [[ -f "${SPARK_ENV_SH}" ]]; then
    # Promote all variable declarations to environment (exported) variables
    set -a
    # Source the user's spark-env.sh (path quoted to tolerate spaces in SPARK_CONF_DIR)
    . "${SPARK_ENV_SH}"
    set +a
  fi
fi
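
# A sketch of the mechanism above (illustrative names and values): because
# spark-env.sh is sourced between `set -a` and `set +a`, a plain assignment in
# that file, e.g.
#
#   SPARK_WORKER_CORES=4
#
# is exported to child processes just as `export SPARK_WORKER_CORES=4` would be.
# Pointing SPARK_CONF_DIR elsewhere selects a different spark-env.sh, e.g.
#
#   SPARK_CONF_DIR=/etc/spark/conf ./bin/spark-shell
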
# Set SPARK_SCALA_VERSION if it is not already set.

if [ -z "$SPARK_SCALA_VERSION" ]; then
  SCALA_VERSION_1=2.13
  SCALA_VERSION_2=2.12

  ASSEMBLY_DIR_1="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_1}"
  ASSEMBLY_DIR_2="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_2}"
  ENV_VARIABLE_DOC="https://spark.apache.org/docs/latest/configuration.html#environment-variables"
if [[ -d "$ASSEMBLY_DIR_1" && -d "$ASSEMBLY_DIR_2" ]]; then
echo "Presence of build for multiple Scala versions detected ($ASSEMBLY_DIR_1 and $ASSEMBLY_DIR_2)." 1>&2
echo "Remove one of them or, export SPARK_SCALA_VERSION=$SCALA_VERSION_1 in ${SPARK_ENV_SH}." 1>&2
echo "Visit ${ENV_VARIABLE_DOC} for more details about setting environment variables in spark-env.sh." 1>&2
exit 1
fi
if [[ -d "$ASSEMBLY_DIR_1" ]]; then
export SPARK_SCALA_VERSION=${SCALA_VERSION_1}
else
export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
fi
fi
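
# Usage sketch (the version and command are examples, not requirements): the
# detection above runs only when SPARK_SCALA_VERSION is unset, so a source tree
# containing assembly builds for both Scala versions can be disambiguated up front:
#
#   SPARK_SCALA_VERSION=2.13 ./bin/spark-shell
#
# Downstream launcher scripts then use the exported value to pick the matching
# directory, e.g. ${SPARK_HOME}/assembly/target/scala-2.13.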