---
layout: global
title: JSON Files
displayTitle: JSON Files
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

<div class="codetabs">

<div data-lang="scala" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]` or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` option to `true`.

{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
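For illustration, a valid input in the default (JSON Lines) mode looks like the following, with one complete, self-contained JSON object per line (this mirrors the `people.json` file shipped with the Spark examples):

```
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
```

By contrast, a single object pretty-printed across several lines is a regular multi-line JSON document and requires the `multiLine` option.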
</div>

<div data-lang="java" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>` or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` option to `true`.

{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
</div>

<div data-lang="python" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame.
This conversion can be done using `SparkSession.read.json` on a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set the `multiLine` parameter to `True`.

{% include_example json_dataset python/sql/datasource.py %}
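The JSON Lines requirement can be demonstrated without Spark at all; the following is a minimal sketch using only the Python standard library (the sample strings are illustrative, not from the Spark examples):

```python
import json

# Plain-Python illustration of why the default reader expects JSON Lines:
# every line must parse as a complete, self-contained JSON object.
json_lines = '{"name": "Michael"}\n{"name": "Andy", "age": 30}'
records = [json.loads(line) for line in json_lines.splitlines()]
print(records[1]["age"])  # each line is an independent record -> 30

# A regular multi-line JSON document does not satisfy this: its first line
# alone ("{") is not valid JSON, which is the case multiLine=True handles.
multi_line = '{\n  "name": "Andy",\n  "age": 30\n}'
try:
    json.loads(multi_line.splitlines()[0])
    first_line_parses = True
except json.JSONDecodeError:
    first_line_parses = False
print(first_line_parses)  # -> False
```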
</div>

<div data-lang="r" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame using
the `read.json()` function, which loads data from a directory of JSON files where each line of the
files is a JSON object.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).

For a regular multi-line JSON file, set a named parameter `multiLine` to `TRUE`.

{% include_example json_dataset r/RSparkSQLExample.R %}
</div>

<div data-lang="sql" markdown="1">

{% highlight sql %}

CREATE TEMPORARY VIEW jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people.json"
)

SELECT * FROM jsonTable

{% endhighlight %}
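Other JSON data source options can be passed the same way through the `OPTIONS` clause. A hedged sketch for the multi-line case (the view name, the `multiLine` option value, and the path here are illustrative, not taken from the original page):

{% highlight sql %}

-- Hypothetical example: read a regular multi-line JSON file via SQL.
CREATE TEMPORARY VIEW multiLineJsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "path/to/multiline.json",  -- hypothetical path
  multiLine "true"
)

{% endhighlight %}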

</div>

</div>