From 3c3193c0fcee532ca13e33e84abf2bb9abe4f7a2 Mon Sep 17 00:00:00 2001
From: Cheng Su
Date: Thu, 1 Jul 2021 19:01:35 +0900
Subject: [PATCH] [SPARK-35965][DOCS] Add doc for ORC nested column vectorized reader

### What changes were proposed in this pull request?

In https://issues.apache.org/jira/browse/SPARK-34862, we added support for the ORC nested column vectorized reader, which is disabled by default for now. This PR adds the user-facing documentation for it, so that users can opt in if they want.

### Why are the changes needed?

To make users aware of the feature and tell them how to enable it.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation itself.

### How was this patch tested?

Manually checked the generated documentation (screenshots attached to the pull request).

Closes #33168 from c21/orc-doc.

Authored-by: Cheng Su
Signed-off-by: Hyukjin Kwon
---
 docs/sql-data-sources-orc.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index 4471527a76..28e237a382 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -37,6 +37,8 @@ For example, historically, `native` implementation handles `CHAR/VARCHAR` with S
 `native` implementation supports a vectorized ORC reader and has been the default ORC implementaion since Spark 2.3.
 The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`.
+For nested data types (array, map and struct), vectorized reader is disabled by default. Set `spark.sql.orc.enableNestedColumnVectorizedReader` to `true` to enable vectorized reader for these types.
+
 For the Hive ORC serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`, and is turned on by default.
@@ -151,6 +153,16 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC 
 2.3.0
+
+ spark.sql.orc.enableNestedColumnVectorizedReader
+ false
+
+ Enables vectorized orc decoding in native implementation for nested data types
+ (array, map and struct). If spark.sql.orc.enableVectorizedReader is set to
+ false, this is ignored.
+
+ 3.2.0
+
 spark.sql.orc.mergeSchema
 false
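
The opt-in documented by this patch could be sketched in PySpark roughly as follows. This is a minimal illustration and not part of the patch: the three configuration keys are the ones named in the documentation above, while `ORC_NESTED_VECTORIZED_CONF`, `apply_conf`, `read_orc_with_nested_vectorization`, and the application name are hypothetical names chosen for the example. It assumes PySpark 3.2.0 or later is installed.

```python
# Settings named in the patch. Values are strings because Spark
# configuration entries are passed as key/value strings.
ORC_NESTED_VECTORIZED_CONF = {
    "spark.sql.orc.impl": "native",
    # The nested-column flag is ignored unless this one is true.
    "spark.sql.orc.enableVectorizedReader": "true",
    # Disabled by default; this is the opt-in added in Spark 3.2.0.
    "spark.sql.orc.enableNestedColumnVectorizedReader": "true",
}


def apply_conf(builder, conf=ORC_NESTED_VECTORIZED_CONF):
    """Apply each setting through builder.config(key, value).

    Works with SparkSession.builder, whose config() returns the builder.
    """
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


def read_orc_with_nested_vectorization(path):
    """Hypothetical helper: start a session with the settings and read ORC."""
    # Imported here so the settings above can be inspected without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("orc-nested-demo")
    spark = apply_conf(builder).getOrCreate()
    return spark.read.orc(path)
```

Calling `read_orc_with_nested_vectorization("/path/to/data.orc")` (a placeholder path) would return a DataFrame read with these settings. Whether the vectorized path is actually taken still depends on the conditions described in the documentation, for example the table being a native ORC table.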