---
layout: global
title: JSON Files
displayTitle: JSON Files
---
<div class="codetabs">
<div data-lang="scala" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]` or a
JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
For a regular multi-line JSON file, set the `multiLine` option to `true`.
{% include_example json_dataset scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
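The `multiLine` option mentioned above can be set directly on the reader. A minimal sketch, assuming an existing `SparkSession` named `spark` and a hypothetical multi-line JSON file:

{% highlight scala %}
// Read a pretty-printed (multi-line) JSON file; this path is hypothetical
val multiLineDF = spark.read
  .option("multiLine", true)
  .json("examples/src/main/resources/people_multiline.json")
multiLineDF.printSchema()
{% endhighlight %}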
</div>
<div data-lang="java" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>` or a
JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
For a regular multi-line JSON file, set the `multiLine` option to `true`.
{% include_example json_dataset java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
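As above, the `multiLine` option can be set on the reader. A minimal sketch, assuming an existing `SparkSession` named `spark` and a hypothetical multi-line JSON file:

{% highlight java %}
// Read a pretty-printed (multi-line) JSON file; this path is hypothetical
Dataset<Row> multiLineDF = spark.read()
  .option("multiLine", true)
  .json("examples/src/main/resources/people_multiline.json");
multiLineDF.printSchema();
{% endhighlight %}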
</div>
<div data-lang="python" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame.
This conversion can be done using `SparkSession.read.json` on a JSON file.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
For a regular multi-line JSON file, set the `multiLine` parameter to `True`.
{% include_example json_dataset python/sql/datasource.py %}
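The `multiLine` option works the same way in Python. A minimal sketch, assuming an existing `SparkSession` named `spark` and a hypothetical multi-line JSON file:

{% highlight python %}
# Read a pretty-printed (multi-line) JSON file; this path is hypothetical
multiline_df = spark.read.option("multiLine", True).json(
    "examples/src/main/resources/people_multiline.json")
multiline_df.printSchema()
{% endhighlight %}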
</div>
<div data-lang="r" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame using
the `read.json()` function, which loads data from a directory of JSON files where each line of the
files is a JSON object.
Note that the file that is offered as _a json file_ is not a typical JSON file. Each
line must contain a separate, self-contained valid JSON object. For more information, please see
[JSON Lines text format, also called newline-delimited JSON](http://jsonlines.org/).
For a regular multi-line JSON file, set the named parameter `multiLine` to `TRUE`.
{% include_example json_dataset r/RSparkSQLExample.R %}
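A minimal sketch of the `multiLine` named parameter, with a hypothetical multi-line JSON file:

{% highlight r %}
# Read a pretty-printed (multi-line) JSON file; this path is hypothetical
multiLineDF <- read.json("examples/src/main/resources/people_multiline.json", multiLine = TRUE)
printSchema(multiLineDF)
{% endhighlight %}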
</div>
<div data-lang="sql" markdown="1">
{% highlight sql %}
CREATE TEMPORARY VIEW jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people.json"
)

SELECT * FROM jsonTable
{% endhighlight %}
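
Data source options such as `multiLine` can also be passed through the `OPTIONS` clause; a sketch with a hypothetical path:

{% highlight sql %}
-- Options in the OPTIONS clause are passed to the JSON data source
CREATE TEMPORARY VIEW multiLineJsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people_multiline.json",
  multiLine "true"
)
{% endhighlight %}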
</div>
</div>