From 3c3193c0fcee532ca13e33e84abf2bb9abe4f7a2 Mon Sep 17 00:00:00 2001
From: Cheng Su <chengsu@fb.com>
Date: Thu, 1 Jul 2021 19:01:35 +0900
Subject: [PATCH] [SPARK-35965][DOCS] Add doc for ORC nested column vectorized
 reader

### What changes were proposed in this pull request?

In https://issues.apache.org/jira/browse/SPARK-34862, we added support for the ORC nested column vectorized reader, which is disabled by default for now. This PR adds user-facing documentation for it so that users can opt in if they want.

### Why are the changes needed?

To make users aware of the feature and to tell them how to enable it.
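
As a rough illustration of the rule the new documentation describes, the sketch below models when nested columns would go through the vectorized reader. It is a simplified model of the documented behavior, not Spark's actual code path: the helper `nested_vectorized_read_enabled` and the dict-based configuration are hypothetical, and the defaults are the ones stated in the ORC documentation page.

```python
# Sketch only: models the documented precedence between the ORC reader flags.
# The helper and the plain-dict "conf" are illustrative, not Spark APIs.

ORC_IMPL = "spark.sql.orc.impl"
VECTORIZED = "spark.sql.orc.enableVectorizedReader"
NESTED_VECTORIZED = "spark.sql.orc.enableNestedColumnVectorizedReader"

# Defaults as stated in the ORC data source documentation (assumed here).
DEFAULTS = {ORC_IMPL: "native", VECTORIZED: "true", NESTED_VECTORIZED: "false"}


def nested_vectorized_read_enabled(conf):
    """Return True if array/map/struct columns would be read vectorized."""
    merged = {**DEFAULTS, **conf}  # user settings override the defaults
    return (
        merged[ORC_IMPL] == "native"
        # The nested flag is ignored when the main flag is off.
        and merged[VECTORIZED] == "true"
        and merged[NESTED_VECTORIZED] == "true"
    )
```

In a live session, the opt-in itself would typically be `spark.conf.set("spark.sql.orc.enableNestedColumnVectorizedReader", "true")`.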

### Does this PR introduce _any_ user-facing change?

Yes, the documentation itself.

### How was this patch tested?

Manually checked the generated documentation, as shown below.

<img width="1153" alt="Screen Shot 2021-07-01 at 12 19 40 AM" src="https://user-images.githubusercontent.com/4629931/124083422-b0724280-da02-11eb-93aa-a25d118ba56e.png">

<img width="1147" alt="Screen Shot 2021-07-01 at 12 19 52 AM" src="https://user-images.githubusercontent.com/4629931/124083442-b5cf8d00-da02-11eb-899f-827d55b8558d.png">

Closes #33168 from c21/orc-doc.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
---
 docs/sql-data-sources-orc.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index 4471527a76..28e237a382 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -37,6 +37,8 @@ For example, historically, `native` implementation handles `CHAR/VARCHAR` with S
 
 `native` implementation supports a vectorized ORC reader and has been the default ORC implementaion since Spark 2.3.
 The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`.
+For nested data types (array, map and struct), the vectorized reader is disabled by default. Set `spark.sql.orc.enableNestedColumnVectorizedReader` to `true` to enable the vectorized reader for these types.
+
 For the Hive ORC serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`, and is turned on by default.
 
@@ -151,6 +153,16 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>2.3.0</td>
   </tr>
+  <tr>
+    <td><code>spark.sql.orc.enableNestedColumnVectorizedReader</code></td>
+    <td><code>false</code></td>
+    <td>
+      Enables vectorized orc decoding in <code>native</code> implementation for nested data types
+      (array, map and struct). If <code>spark.sql.orc.enableVectorizedReader</code> is set to
+      <code>false</code>, this is ignored.
+    </td>
+    <td>3.2.0</td>
+  </tr>
   <tr>
   <td><code>spark.sql.orc.mergeSchema</code></td>
   <td>false</td>