[SPARK-34437][SQL][DOCS] Update Spark SQL guide about the rebasing DS options and SQL configs

### What changes were proposed in this pull request?
In the PR, I propose to update the Spark SQL guide about the SQL configs that are related to datetime rebasing:
- spark.sql.parquet.int96RebaseModeInWrite
- spark.sql.parquet.datetimeRebaseModeInWrite
- spark.sql.parquet.int96RebaseModeInRead
- spark.sql.parquet.datetimeRebaseModeInRead
- spark.sql.avro.datetimeRebaseModeInWrite
- spark.sql.avro.datetimeRebaseModeInRead

Parquet options added by #31489:
- datetimeRebaseMode
- int96RebaseMode

and Avro options added by #31529:
- datetimeRebaseMode

<img width="998" alt="Screenshot 2021-02-17 at 21 42 09" src="https://user-images.githubusercontent.com/1580697/108252043-3afb8900-7169-11eb-8568-511e21fa7f78.png">
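For reference, the per-read/per-write DS options default to the corresponding SQL configs, so they can override a session-wide setting for a single read. A minimal Scala sketch (hypothetical path, `spark` as provided by spark-shell):

```scala
// Session-wide default for all Parquet reads in this session.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "LEGACY")

// Per-read override via the new DS option (hypothetical path).
val df = spark.read
  .option("datetimeRebaseMode", "CORRECTED")
  .parquet("/tmp/ancient_dates_parquet")
```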

### Why are the changes needed?
To inform users about supported DS options and SQL configs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By generating the doc and manually checking:
```
$ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
```

Closes #31564 from MaxGekk/doc-rebase-options.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Max Gekk 2021-02-18 17:48:50 +09:00 committed by HyukjinKwon
parent 7b549c3e53
commit b58f0976a9
2 changed files with 124 additions and 0 deletions

@@ -283,6 +283,19 @@ Data source options of Avro can be set via:
</td>
<td>function <code>from_avro</code></td>
</tr>
<tr>
<td><code>datetimeRebaseMode</code></td>
<td>The SQL config <code>spark.sql.avro</code> <code>.datetimeRebaseModeInRead</code>, which is <code>EXCEPTION</code> by default</td>
<td>The <code>datetimeRebaseMode</code> option allows specifying the rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to the Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails on reads of ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient dates/timestamps from the Julian to the Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read and function <code>from_avro</code></td>
</tr>
</table>
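For illustration, a minimal Scala sketch of passing the option to an Avro read (hypothetical path, `spark` as provided by spark-shell); the same key can be supplied in the options map of the `from_avro` function:

```scala
// Per-read override of spark.sql.avro.datetimeRebaseModeInRead (hypothetical path).
val df = spark.read
  .format("avro")
  .option("datetimeRebaseMode", "CORRECTED")
  .load("/tmp/ancient_events_avro")
```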
## Configuration
@@ -318,6 +331,31 @@ Configuration of Avro can be done using the `setConf` method on SparkSession or
</td>
<td>2.4.0</td>
</tr>
<tr>
<td>spark.sql.avro.datetimeRebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Julian to the Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will read the dates/timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Avro files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Avro files is unknown.
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.avro.datetimeRebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>date</code>, <code>timestamp-micros</code>, <code>timestamp-millis</code> logical types from the Proleptic Gregorian to the Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will write the dates/timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Avro files.</li>
</ul>
</td>
<td>3.0.0</td>
</tr>
</table>
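A minimal Scala sketch (illustrative values, `spark` as provided by spark-shell) of setting these configs for the whole session:

```scala
// Read side: load ancient dates/timestamps without rebasing.
spark.conf.set("spark.sql.avro.datetimeRebaseModeInRead", "CORRECTED")
// Write side: rebase to the hybrid calendar so legacy readers interpret the values correctly.
spark.conf.set("spark.sql.avro.datetimeRebaseModeInWrite", "LEGACY")
```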
## Compatibility with Databricks spark-avro

@@ -252,6 +252,42 @@ REFRESH TABLE my_table;
</div>
## Data Source Option
Data source options of Parquet can be set via:
* the `.option`/`.options` methods of `DataFrameReader` or `DataFrameWriter`
* the `.option`/`.options` methods of `DataStreamReader` or `DataStreamWriter`
<table class="table">
<tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Scope</b></th></tr>
<tr>
<td><code>datetimeRebaseMode</code></td>
<td>The SQL config <code>spark.sql.parquet</code> <code>.datetimeRebaseModeInRead</code>, which is <code>EXCEPTION</code> by default</td>
<td>The <code>datetimeRebaseMode</code> option allows specifying the rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to the Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails on reads of ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads dates/timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient dates/timestamps from the Julian to the Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read</td>
</tr>
<tr>
<td><code>int96RebaseMode</code></td>
<td>The SQL config <code>spark.sql.parquet</code> <code>.int96RebaseModeInRead</code>, which is <code>EXCEPTION</code> by default</td>
<td>The <code>int96RebaseMode</code> option allows specifying the rebasing mode for INT96 timestamps from the Julian to the Proleptic Gregorian calendar.<br>
Currently supported modes are:
<ul>
<li><code>EXCEPTION</code>: fails on reads of ancient INT96 timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: loads INT96 timestamps without rebasing.</li>
<li><code>LEGACY</code>: performs rebasing of ancient timestamps from the Julian to the Proleptic Gregorian calendar.</li>
</ul>
</td>
<td>read</td>
</tr>
</table>
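For illustration, a minimal Scala sketch of passing both options to a Parquet read (hypothetical path, `spark` as provided by spark-shell):

```scala
val df = spark.read
  .option("datetimeRebaseMode", "CORRECTED") // DATE, TIMESTAMP_MILLIS, TIMESTAMP_MICROS values
  .option("int96RebaseMode", "CORRECTED")    // INT96 timestamps
  .parquet("/tmp/legacy_hive_table")         // hypothetical path
```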
### Configuration
Configuration of Parquet can be done using the `setConf` method on `SparkSession` or by running
@@ -329,4 +365,54 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
</td>
<td>1.6.0</td>
</tr>
<tr>
<td>spark.sql.parquet.datetimeRebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Julian to the Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will read the dates/timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Parquet files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.parquet.datetimeRebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>DATE</code>, <code>TIMESTAMP_MILLIS</code>, <code>TIMESTAMP_MICROS</code> logical types from the Proleptic Gregorian to the Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient dates/timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will write the dates/timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase dates/timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
</ul>
</td>
<td>3.0.0</td>
</tr>
<tr>
<td>spark.sql.parquet.int96RebaseModeInRead</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Julian to the Proleptic Gregorian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the read if it sees ancient INT96 timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will read the timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from the legacy hybrid (Julian + Gregorian) calendar to the Proleptic Gregorian calendar when reading Parquet files.</li>
</ul>
This config is only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.
</td>
<td>3.1.0</td>
</tr>
<tr>
<td>spark.sql.parquet.int96RebaseModeInWrite</td>
<td><code>EXCEPTION</code></td>
<td>The rebasing mode for the values of the <code>INT96</code> timestamp type from the Proleptic Gregorian to the Julian calendar:<br>
<ul>
<li><code>EXCEPTION</code>: Spark will fail the write if it sees ancient timestamps that are ambiguous between the two calendars.</li>
<li><code>CORRECTED</code>: Spark will not rebase and will write the timestamps as they are.</li>
<li><code>LEGACY</code>: Spark will rebase INT96 timestamps from the Proleptic Gregorian calendar to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files.</li>
</ul>
</td>
<td>3.1.0</td>
</tr>
</table>
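A minimal Scala sketch (illustrative values, `spark` as provided by spark-shell) of setting the Parquet rebasing configs for the whole session; the equivalent `SET key=value` commands can be run from SQL:

```scala
// Fail fast on ambiguous ancient values when reading files whose writer info is unknown.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInRead", "EXCEPTION")
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "EXCEPTION")
// Write in the legacy hybrid calendar so older Spark/Hive readers interpret the values correctly.
spark.conf.set("spark.sql.parquet.datetimeRebaseModeInWrite", "LEGACY")
spark.conf.set("spark.sql.parquet.int96RebaseModeInWrite", "LEGACY")
```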