[SPARK-35965][DOCS] Add doc for ORC nested column vectorized reader

### What changes were proposed in this pull request?

In https://issues.apache.org/jira/browse/SPARK-34862, we added support for the ORC nested column vectorized reader, which is disabled by default for now. This PR adds the user-facing documentation for it so that users can opt in if they want.

### Why are the changes needed?

To make users aware of the feature and tell them how to enable it.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation itself.

### How was this patch tested?

Manually checked the generated documentation, as shown below.

<img width="1153" alt="Screen Shot 2021-07-01 at 12 19 40 AM" src="https://user-images.githubusercontent.com/4629931/124083422-b0724280-da02-11eb-93aa-a25d118ba56e.png">
<img width="1147" alt="Screen Shot 2021-07-01 at 12 19 52 AM" src="https://user-images.githubusercontent.com/4629931/124083442-b5cf8d00-da02-11eb-899f-827d55b8558d.png">

Closes #33168 from c21/orc-doc.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-07-01 19:01:35 +09:00 · 2021-07-01 19:01:35 +09:00 · 3c3193c0fc
parent 0c34b96541
commit 3c3193c0fc
1 changed file with 12 additions and 0 deletions
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -37,6 +37,8 @@ For example, historically, `native` implementation handles `CHAR/VARCHAR` with S
 `native` implementation supports a vectorized ORC reader and has been the default ORC implementation since Spark 2.3.
 The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`.
 For nested data types (array, map and struct), the vectorized reader is disabled by default. Set `spark.sql.orc.enableNestedColumnVectorizedReader` to `true` to enable the vectorized reader for these types.
 For the Hive ORC serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`, and is turned on by default.
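
The conditions described in the paragraph above can be read as a small decision rule. The following is a minimal sketch of that rule in plain Python, not Spark's actual implementation: the function `uses_vectorized_reader`, the `DEFAULTS` dictionary and the `NESTED_TYPES` set are hypothetical names introduced for illustration, and the sketch ignores any other conditions Spark may check before choosing the vectorized reader.

```python
# Illustrative model only; these names are assumptions, not part of Spark's API.
# The keys are the configuration names from the documentation; the values are
# the documented defaults, stored as strings the way Spark confs are written.
DEFAULTS = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.enableVectorizedReader": "true",
    "spark.sql.orc.enableNestedColumnVectorizedReader": "false",
    "spark.sql.hive.convertMetastoreOrc": "true",
}

# Nested data types named in the documentation.
NESTED_TYPES = {"array", "map", "struct"}


def uses_vectorized_reader(column_types, conf=None, hive_serde_table=False):
    """Return True if, per the documented rules, the vectorized ORC reader applies.

    column_types: iterable of simple type names, e.g. ["int", "struct"].
    conf: optional dict of overrides for the settings in DEFAULTS.
    hive_serde_table: True for a Hive ORC serde table, False for a native ORC table.
    """
    settings = {**DEFAULTS, **(conf or {})}

    def enabled(key):
        return settings[key].lower() == "true"

    # The vectorized reader belongs to the native implementation.
    if settings["spark.sql.orc.impl"] != "native":
        return False
    # The master switch; when it is off, the nested-column flag is ignored.
    if not enabled("spark.sql.orc.enableVectorizedReader"):
        return False
    # Hive ORC serde tables additionally need metastore conversion enabled.
    if hive_serde_table and not enabled("spark.sql.hive.convertMetastoreOrc"):
        return False
    # Nested types need the separate opt-in flag.
    has_nested = any(t in NESTED_TYPES for t in column_types)
    if has_nested and not enabled("spark.sql.orc.enableNestedColumnVectorizedReader"):
        return False
    return True


nested_on = {"spark.sql.orc.enableNestedColumnVectorizedReader": "true"}
print(uses_vectorized_reader(["int", "string"]))             # flat schema, defaults
print(uses_vectorized_reader(["int", "struct"]))             # nested schema, defaults
print(uses_vectorized_reader(["int", "struct"], nested_on))  # nested schema, opted in
```

In a real session the opt-in is just a configuration change, for example setting `spark.sql.orc.enableNestedColumnVectorizedReader` to `true` alongside the settings already described.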
@@ -151,6 +153,16 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
    </td>
    <td>2.3.0</td>
  </tr>
  <tr>
    <td><code>spark.sql.orc.enableNestedColumnVectorizedReader</code></td>
    <td><code>false</code></td>
    <td>
      Enables vectorized ORC decoding in the <code>native</code> implementation for nested data types
      (array, map and struct). If <code>spark.sql.orc.enableVectorizedReader</code> is set to
      <code>false</code>, this setting is ignored.
    </td>
    <td>3.2.0</td>
  </tr>
  <tr>
  <td><code>spark.sql.orc.mergeSchema</code></td>
  <td>false</td>