[SPARK-34437][SQL][DOCS] Update Spark SQL guide about the rebasing DS options and SQL configs
### What changes were proposed in this pull request?
In the PR, I propose to update the Spark SQL guide about the SQL configs that are related to datetime rebasing:
- spark.sql.parquet.int96RebaseModeInWrite
- spark.sql.parquet.datetimeRebaseModeInWrite
- spark.sql.parquet.int96RebaseModeInRead
- spark.sql.parquet.datetimeRebaseModeInRead
- spark.sql.avro.datetimeRebaseModeInWrite
- spark.sql.avro.datetimeRebaseModeInRead

Parquet options added by #31489:
- datetimeRebaseMode
- int96RebaseMode

and Avro options added by #31529:
- datetimeRebaseMode

<img width="998" alt="Screenshot 2021-02-17 at 21 42 09" src="https://user-images.githubusercontent.com/1580697/108252043-3afb8900-7169-11eb-8568-511e21fa7f78.png">

### Why are the changes needed?
To inform users about supported DS options and SQL configs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By generating the doc and manually checking:
```
$ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
```

Closes #31564 from MaxGekk/doc-rebase-options.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent 7b549c3e53
commit b58f0976a9
@@ -283,6 +283,19 @@ Data source options of Avro can be set via:
 </td>
 <td>function <code>from_avro</code></td>
 </tr>
+<tr>
+<td><code>datetimeRebaseMode</code></td>
+<td>The SQL config <code>spark.sql.avro</code> <code>.datetimeRebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
+<td>The <code>datetimeRebaseMode</code> option allows specifying the rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to Proleptic Gregorian calendar.<br>
+Currently supported modes are:
+<ul>
+<li><code>EXCEPTION</code>: fails when reading ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
+<li><code>LEGACY</code>: rebases ancient dates/timestamps from the Julian to Proleptic Gregorian calendar.</li>
+</ul>
+</td>
+<td>read and function <code>from_avro</code></td>
+</tr>
 </table>

 ## Configuration
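As an illustration of the `datetimeRebaseMode` read option documented in this hunk, here is a minimal PySpark-style sketch. The helper name `read_avro_with_rebase`, the `VALID_REBASE_MODES` tuple, and the strict upper-case validation are assumptions made for this example, not part of the patch; `spark` is assumed to be an existing `SparkSession` with the external `spark-avro` module available.

```python
# The three modes listed in the documentation table.
VALID_REBASE_MODES = ("EXCEPTION", "CORRECTED", "LEGACY")


def read_avro_with_rebase(spark, path, mode="EXCEPTION"):
    """Read Avro files with an explicit per-read rebase mode.

    `spark` is an existing SparkSession (assumed); `path` is an Avro location.
    The validation below is a choice of this sketch, not Spark behaviour.
    """
    if mode not in VALID_REBASE_MODES:
        raise ValueError(f"unsupported rebase mode: {mode!r}")
    return (
        spark.read.format("avro")
        .option("datetimeRebaseMode", mode)  # option name from the table above
        .load(path)
    )
```

A call such as `read_avro_with_rebase(spark, "/data/legacy_avro", mode="LEGACY")` (the path is hypothetical) would request rebasing for that read only, leaving the session-level SQL config untouched.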
@@ -318,6 +331,31 @@ Configuration of Avro can be done using the `setConf` method on SparkSession or
 </td>
 <td>2.4.0</td>
 </tr>
+<tr>
+<td>spark.sql.avro.datetimeRebaseModeInRead</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to Proleptic Gregorian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will read the dates/timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Avro files.</li>
+</ul>
+This config is only effective if the writer info (like Spark, Hive) of the Avro files is unknown.
+</td>
+<td>3.0.0</td>
+</tr>
+<tr>
+<td>spark.sql.avro.datetimeRebaseModeInWrite</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Proleptic Gregorian to Julian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will write the dates/timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Avro files.</li>
+</ul>
+</td>
+<td>3.0.0</td>
+</tr>
 </table>

 ## Compatibility with Databricks spark-avro
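To make "ambiguous between the two calendars" concrete, the following sketch shows that the same physical day carries different labels in the Julian and Proleptic Gregorian calendars. It uses the standard integer Julian Day Number formulas and is only an illustration of the calendar arithmetic; it is not Spark's rebasing implementation, and the function names are made up for this example.

```python
from datetime import date

# Julian Day Number of proleptic Gregorian 0001-01-01 is 1721426, and
# Python's date.fromordinal(1) is that same day, so ordinal = JDN - 1721425.
_JDN_TO_ORDINAL_OFFSET = 1721425


def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def gregorian_calendar_to_jdn(year, month, day):
    """Julian Day Number of a date given in the proleptic Gregorian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_label_as_proleptic_gregorian(year, month, day):
    """Proleptic Gregorian date of the day labelled (year, month, day) in the
    Julian calendar. Limited to the range supported by datetime.date."""
    return date.fromordinal(julian_calendar_to_jdn(year, month, day) - _JDN_TO_ORDINAL_OFFSET)


if __name__ == "__main__":
    # The last Julian day before the 1582 switch and the first Gregorian day
    # are consecutive physical days.
    print(julian_calendar_to_jdn(1582, 10, 4), gregorian_calendar_to_jdn(1582, 10, 15))
    # The same label denotes different days in the two calendars.
    print(julian_label_as_proleptic_gregorian(1000, 1, 1))
```

Because a value written under the legacy hybrid calendar and one written under the Proleptic Gregorian calendar can disagree by several days for such old dates, a reader that does not know which writer produced the file cannot tell which interpretation is intended. That is the situation the `EXCEPTION` default guards against.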
@@ -252,6 +252,42 @@ REFRESH TABLE my_table;

 </div>

+## Data Source Option
+
+Data source options of Parquet can be set via:
+* the `.option`/`.options` methods of `DataFrameReader` or `DataFrameWriter`
+* the `.option`/`.options` methods of `DataStreamReader` or `DataStreamWriter`
+
+<table class="table">
+<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
+<tr>
+<td><code>datetimeRebaseMode</code></td>
+<td>The SQL config <code>spark.sql.parquet</code> <code>.datetimeRebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
+<td>The <code>datetimeRebaseMode</code> option allows specifying the rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to Proleptic Gregorian calendar.<br>
+Currently supported modes are:
+<ul>
+<li><code>EXCEPTION</code>: fails when reading ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
+<li><code>LEGACY</code>: rebases ancient dates/timestamps from the Julian to Proleptic Gregorian calendar.</li>
+</ul>
+</td>
+<td>read</td>
+</tr>
+<tr>
+<td><code>int96RebaseMode</code></td>
+<td>The SQL config <code>spark.sql.parquet</code> <code>.int96RebaseModeInRead</code> which is <code>EXCEPTION</code> by default</td>
+<td>The <code>int96RebaseMode</code> option allows specifying the rebasing mode for INT96 timestamps from the Julian to Proleptic Gregorian calendar.<br>
+Currently supported modes are:
+<ul>
+<li><code>EXCEPTION</code>: fails when reading ancient INT96 timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: loads INT96 timestamps without rebasing.</li>
+<li><code>LEGACY</code>: rebases ancient INT96 timestamps from the Julian to Proleptic Gregorian calendar.</li>
+</ul>
+</td>
+<td>read</td>
+</tr>
+</table>
+
 ### Configuration

 Configuration of Parquet can be done using the `setConf` method on `SparkSession` or by running
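The two Parquet read options in this hunk can be passed together through `.options`. The sketch below assumes an existing `SparkSession` named `spark`; the helper names `parquet_rebase_options` and `read_parquet_with_rebase` are hypothetical and exist only for this example.

```python
def parquet_rebase_options(datetime_mode="EXCEPTION", int96_mode="EXCEPTION"):
    """Build the per-read option map using the option names from the table above."""
    return {
        "datetimeRebaseMode": datetime_mode,  # DATE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS
        "int96RebaseMode": int96_mode,        # INT96 timestamps
    }


def read_parquet_with_rebase(spark, path, datetime_mode="EXCEPTION", int96_mode="EXCEPTION"):
    """Read Parquet files with explicit rebase modes for this read only.

    `spark` is an existing SparkSession (assumed); `path` is a Parquet location.
    """
    options = parquet_rebase_options(datetime_mode, int96_mode)
    return spark.read.options(**options).parquet(path)
```

Keeping the two modes as separate parameters mirrors the documentation, where INT96 timestamps are controlled independently of the other date and timestamp logical types.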
@@ -329,4 +365,54 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
 </td>
 <td>1.6.0</td>
 </tr>
+<tr>
+<td>spark.sql.parquet.datetimeRebaseModeInRead</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to Proleptic Gregorian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will read the dates/timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Parquet files.</li>
+</ul>
+This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
+</td>
+<td>3.0.0</td>
+</tr>
+<tr>
+<td>spark.sql.parquet.datetimeRebaseModeInWrite</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Proleptic Gregorian to Julian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will write the dates/timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
+</ul>
+</td>
+<td>3.0.0</td>
+</tr>
+<tr>
+<td>spark.sql.parquet.int96RebaseModeInRead</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Julian to Proleptic Gregorian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient INT96 timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will read the INT96 timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Parquet files.</li>
+</ul>
+This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
+</td>
+<td>3.1.0</td>
+</tr>
+<tr>
+<td>spark.sql.parquet.int96RebaseModeInWrite</td>
+<td><code>EXCEPTION</code></td>
+<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Proleptic Gregorian to Julian calendar:<br>
+<ul>
+<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient INT96 timestamps that are ambiguous between the two calendars.</li>
+<li><code>CORRECTED</code>: Spark will not rebase and will write the INT96 timestamps as they are.</li>
+<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
+</ul>
+</td>
+<td>3.1.0</td>
+</tr>
 </table>
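The four Parquet SQL configs in this hunk can also be set at the session level. The following is a minimal sketch assuming an existing PySpark `SparkSession` named `spark`, where runtime configs are set through `spark.conf.set`; the helper names are hypothetical, and the same effect should be reachable with SQL `SET key=value` commands.

```python
def parquet_rebase_confs(read_mode="EXCEPTION", write_mode="EXCEPTION"):
    """Map the four Parquet rebase config keys from the table above to modes."""
    return {
        "spark.sql.parquet.datetimeRebaseModeInRead": read_mode,
        "spark.sql.parquet.int96RebaseModeInRead": read_mode,
        "spark.sql.parquet.datetimeRebaseModeInWrite": write_mode,
        "spark.sql.parquet.int96RebaseModeInWrite": write_mode,
    }


def apply_confs(spark, confs):
    """Apply a config map to an existing SparkSession (assumed) one key at a time."""
    for key, value in confs.items():
        spark.conf.set(key, value)
```

For example, `apply_confs(spark, parquet_rebase_confs(read_mode="LEGACY", write_mode="CORRECTED"))` would request rebasing on reads of files with unknown writer info while writing new files without rebasing. Using one mode for both the datetime and INT96 keys is a simplification of this sketch; the configs are independent.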