acb9715779
## What changes were proposed in this pull request?

When running a SparkR job in yarn-cluster mode, SparkR downloads the Spark package from the Apache website even though this is unnecessary:

```
./bin/spark-submit --master yarn-cluster ./examples/src/main/r/dataframe.R
```

The output is:

```
Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:

    as.data.frame, colnames, colnames<-, drop, endsWith, intersect,
    rank, rbind, sample, startsWith, subset, summary, transform, union

Spark not found in SPARK_HOME:
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
......
```

There is no `SPARK_HOME` in yarn-cluster mode, because the R process runs on a remote host of the YARN cluster rather than on the client host. The JVM comes up first and the R process then connects to it, so in such cases we should never have to download Spark: Spark is already running.

## How was this patch tested?

Offline test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15888 from yanboliang/spark-18444.
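A condensed sketch of the guard this change implies, not the actual patch: fall back to `install.spark()` only when a Spark distribution is genuinely needed on the local host, i.e. when `SPARK_HOME` is unset and the master is local; for a remote master such as yarn-cluster, skip the download entirely. The function name `checkSparkInstall` and its exact branching are hypothetical simplifications for illustration.

```r
library(SparkR)  # provides install.spark()

# Hypothetical, simplified sketch of the install decision:
# `sparkHome` and `master` stand in for the values the session
# setup would resolve from the environment and the submit args.
checkSparkInstall <- function(sparkHome, master) {
  if (!is.na(file.info(sparkHome)$isdir)) {
    # SPARK_HOME points at an existing directory: use it as-is.
    sparkHome
  } else if (grepl("^local(\\[.*\\])?$", master)) {
    # A local master runs Spark on this host, so download the
    # package (or reuse the cached copy) via install.spark().
    install.spark()
  } else {
    # Remote master (e.g. yarn-cluster): the cluster manager has
    # already started the JVM and R merely connects to it, so no
    # download is needed.
    NULL
  }
}
```

The key design point is the final branch: when the backend JVM already exists on the cluster, downloading a distribution just to satisfy `SPARK_HOME` would cost time and bandwidth for an installation that is never used.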
- jarTest.R
- packageInAJarTest.R
- test_binary_function.R
- test_binaryFile.R
- test_broadcast.R
- test_client.R
- test_context.R
- test_includePackage.R
- test_jvm_api.R
- test_mllib.R
- test_parallelize_collect.R
- test_rdd.R
- test_Serde.R
- test_shuffle.R
- test_sparkR.R
- test_sparkSQL.R
- test_take.R
- test_textFile.R
- test_utils.R
- test_Windows.R