e9f2e34261
### What changes were proposed in this pull request?
This PR proposes to ask users if they want to download and install SparkR when they install SparkR from CRAN.
`SPARKR_ASK_INSTALLATION` environment variable was added in case other notebook projects are affected.
### Why are the changes needed?
This is required for CRAN. Currently SparkR is removed: https://cran.r-project.org/web/packages/SparkR/index.html.
See also https://lists.apache.org/thread.html/r02b9046273a518e347dfe85f864d23d63d3502c6c1edd33df17a3b86%40%3Cdev.spark.apache.org%3E
### Does this PR introduce _any_ user-facing change?
Yes, `sparkR.session(...)` will ask if users want to download and install Spark package or not if they are in the plain R shell or `Rscript`.
### How was this patch tested?
**R shell**
Valid input (`n`):
```
> sparkR.session(master="local")
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
```
```
Error in sparkCheckInstall(sparkHome, master, deployMode) :
Please make sure Spark package is installed in this machine.
- If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
- If not, you may run install.spark function to do the job.
```
Invalid input:
```
> sparkR.session(master="local")
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
```
```
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
```
Valid input (`y`):
```
> sparkR.session(master="local")
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
Downloading spark-3.3.0 for Hadoop 2.7 from:
- https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz
trying URL 'https://ftp.riken.jp/net/apache/spark/spark-3.3.0/spark-3.3.0-bin-hadoop2.7.tgz'
...
```
**Rscript**
```
cat tmp.R
```
```
library(SparkR, lib.loc = c(file.path(".", "R", "lib")))
sparkR.session(master="local")
```
```
Rscript tmp.R
```
Valid input (`n`):
```
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): n
```
```
Error in sparkCheckInstall(sparkHome, master, deployMode) :
Please make sure Spark package is installed in this machine.
- If there is one, set the path in sparkHome parameter or environment variable SPARK_HOME.
- If not, you may run install.spark function to do the job.
Calls: sparkR.session -> sparkCheckInstall
```
Invalid input:
```
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): abc
```
```
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n):
```
Valid input (`y`):
```
...
Spark not found in SPARK_HOME:
Will you download and install (or reuse if it exists) Spark package under the cache [/.../Caches/spark]? (y/n): y
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
Preferred mirror site found: https://ftp.riken.jp/net/apache/spark
Downloading spark-3.3.0 for Hadoop 2.7 from:
...
```
`bin/sparkR` and `bin/spark-submit *.R` are not affected (tested).
Closes #33887 from HyukjinKwon/SPARK-36631.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit
|
||
---|---|---|
.. | ||
inst | ||
R | ||
src-native | ||
tests | ||
vignettes | ||
.lintr | ||
.Rbuildignore | ||
DESCRIPTION | ||
NAMESPACE |