[SPARK-17445][DOCS] Reference an ASF page as the main place to find third-party packages

## What changes were proposed in this pull request?

Point references to spark-packages.org to https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects

This will be accompanied by a parallel change to the spark-website repo, and additional changes to this wiki.

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #15075 from srowen/SPARK-17445.
Committed by Sean Owen on 2016-09-14 10:10:16 +01:00
commit dc0a4c9161 (parent 4cea9da2ae)
9 changed files with 18 additions and 19 deletions


@@ -6,7 +6,7 @@ It lists steps that are required before creating a PR. In particular, consider:
 - Is the change important and ready enough to ask the community to spend time reviewing?
 - Have you searched for existing, related JIRAs and pull requests?
-- Is this a new feature that can stand alone as a package on http://spark-packages.org ?
+- Is this a new feature that can stand alone as a [third party project](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects) ?
 - Is the change being proposed clearly explained and motivated?

 When you contribute code, you affirm that the contribution is your original work and that you


@@ -100,7 +100,7 @@ sparkR.stop <- function() {
 #' @param sparkEnvir Named list of environment variables to set on worker nodes
 #' @param sparkExecutorEnv Named list of environment variables to be used when launching executors
 #' @param sparkJars Character vector of jar files to pass to the worker nodes
-#' @param sparkPackages Character vector of packages from spark-packages.org
+#' @param sparkPackages Character vector of package coordinates
 #' @seealso \link{sparkR.session}
 #' @rdname sparkR.init-deprecated
 #' @export
@@ -327,7 +327,7 @@ sparkRHive.init <- function(jsc = NULL) {
 #' @param sparkHome Spark Home directory.
 #' @param sparkConfig named list of Spark configuration to set on worker nodes.
 #' @param sparkJars character vector of jar files to pass to the worker nodes.
-#' @param sparkPackages character vector of packages from spark-packages.org
+#' @param sparkPackages character vector of package coordinates
 #' @param enableHiveSupport enable support for Hive, fallback if not built with Hive support; once
 #'        set, this cannot be turned off on an existing session
 #' @param ... named Spark properties passed to the method.
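Note: the reworded `sparkPackages` doc above refers to Maven-style `groupId:artifactId:version` coordinates. A minimal sketch of passing one to `sparkR.session` (the coordinate below is a placeholder, not a published package):

```r
library(SparkR)

# "sparkPackages" takes Maven coordinates of the form groupId:artifactId:version;
# the coordinate here is illustrative only.
sparkR.session(
  master = "local[*]",
  appName = "packageCoordinatesExample",
  sparkPackages = "com.example:spark-example-connector_2.11:1.0.0"
)

sparkR.session.stop()
```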


@@ -114,7 +114,7 @@
           <li class="divider"></li>
           <li><a href="building-spark.html">Building Spark</a></li>
           <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark</a></li>
-          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects">Supplemental Projects</a></li>
+          <li><a href="https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects">Third Party Projects</a></li>
         </ul>
       </li>
     </ul>


@@ -120,7 +120,7 @@ options for deployment:
 * [OpenStack Swift](storage-openstack-swift.html)
 * [Building Spark](building-spark.html): build Spark using the Maven system
 * [Contributing to Spark](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark)
-* [Supplemental Projects](https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects): related third party Spark projects
+* [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects): related third party Spark projects

 **External Resources:**


@@ -110,7 +110,8 @@ head(df)
 SparkR supports operating on a variety of data sources through the `SparkDataFrame` interface. This section describes the general methods for loading and saving data using Data Sources. You can check the Spark SQL programming guide for more [specific options](sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.

-The general method for creating SparkDataFrames from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically. SparkR supports reading JSON, CSV and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
+The general method for creating SparkDataFrames from data sources is `read.df`. This method takes in the path for the file to load and the type of data source, and the currently active SparkSession will be used automatically.
+SparkR supports reading JSON, CSV and Parquet files natively, and through packages available from sources like [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects), you can find data source connectors for popular file formats like Avro. These packages can either be added by
 specifying `--packages` with `spark-submit` or `sparkR` commands, or if initializing SparkSession with `sparkPackages` parameter when in an interactive R shell or from RStudio.

 <div data-lang="r" markdown="1">
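As a sketch of the workflow the paragraph above describes (the Avro coordinate and file path are illustrative and depend on which connector versions are published for your Spark/Scala build):

```r
library(SparkR)

# Either start the shell with the package on the classpath, e.g.
#   ./bin/sparkR --packages com.databricks:spark-avro_2.11:3.0.0
# or pass the coordinate when creating the session from R or RStudio:
sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")

# read.df takes the file path and the data source name (short name or full class name).
people <- read.df("/path/to/people.avro", source = "com.databricks.spark.avro")
head(people)

sparkR.session.stop()
```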


@@ -2382,7 +2382,7 @@ additional effort may be necessary to achieve exactly-once semantics. There are
     - [Kafka Integration Guide](streaming-kafka-integration.html)
     - [Kinesis Integration Guide](streaming-kinesis-integration.html)
     - [Custom Receiver Guide](streaming-custom-receivers.html)
-* Third-party DStream data sources can be found in [Spark Packages](https://spark-packages.org/)
+* Third-party DStream data sources can be found in [Third Party Projects](https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects)
 * API documentation
     - Scala docs
         * [StreamingContext](api/scala/index.html#org.apache.spark.streaming.StreamingContext) and


@@ -142,12 +142,13 @@ case class DataSource(
       } else if (provider.toLowerCase == "avro" ||
         provider == "com.databricks.spark.avro") {
         throw new AnalysisException(
-          s"Failed to find data source: ${provider.toLowerCase}. Please use Spark " +
-            "package http://spark-packages.org/package/databricks/spark-avro")
+          s"Failed to find data source: ${provider.toLowerCase}. Please find an Avro " +
+            "package at " +
+            "https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects")
       } else {
         throw new ClassNotFoundException(
           s"Failed to find data source: $provider. Please find packages at " +
-            "http://spark-packages.org",
+            "https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects",
           error)
       }
     }


@@ -1645,21 +1645,18 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     e = intercept[AnalysisException] {
       sql(s"select id from `com.databricks.spark.avro`.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: com.databricks.spark.avro. " +
-      "Please use Spark package http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: com.databricks.spark.avro."))

     // data source type is case insensitive
     e = intercept[AnalysisException] {
       sql(s"select id from Avro.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: avro. Please use Spark package " +
-      "http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: avro."))

     e = intercept[AnalysisException] {
       sql(s"select id from avro.`file_path`")
     }
-    assert(e.message.contains("Failed to find data source: avro. Please use Spark package " +
-      "http://spark-packages.org/package/databricks/spark-avro"))
+    assert(e.message.contains("Failed to find data source: avro."))

     e = intercept[AnalysisException] {
       sql(s"select id from `org.apache.spark.sql.sources.HadoopFsRelationProvider`.`file_path`")


@@ -74,16 +74,16 @@ class ResolvedDataSourceSuite extends SparkFunSuite {
     val error1 = intercept[AnalysisException] {
       getProvidingClass("avro")
     }
-    assert(error1.getMessage.contains("spark-packages"))
+    assert(error1.getMessage.contains("Failed to find data source: avro."))

     val error2 = intercept[AnalysisException] {
       getProvidingClass("com.databricks.spark.avro")
     }
-    assert(error2.getMessage.contains("spark-packages"))
+    assert(error2.getMessage.contains("Failed to find data source: com.databricks.spark.avro."))

     val error3 = intercept[ClassNotFoundException] {
       getProvidingClass("asfdwefasdfasdf")
     }
-    assert(error3.getMessage.contains("spark-packages"))
+    assert(error3.getMessage.contains("Failed to find data source: asfdwefasdfasdf."))
   }
 }