diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index 4471527a76..28e237a382 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -37,6 +37,8 @@ For example, historically, `native` implementation handles `CHAR/VARCHAR` with S
 
 `native` implementation supports a vectorized ORC reader and has been the default ORC implementaion since Spark 2.3.
 The vectorized reader is used for the native ORC tables (e.g., the ones created using the clause `USING ORC`) when `spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`.
+For nested data types (array, map and struct), vectorized reader is disabled by default. Set `spark.sql.orc.enableNestedColumnVectorizedReader` to `true` to enable vectorized reader for these types.
+
 For the Hive ORC serde tables (e.g., the ones created using the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
 the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`, and is turned on by default.
 
@@ -151,6 +153,16 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
     </td>
     <td>2.3.0</td>
   </tr>
+  <tr>
+    <td>spark.sql.orc.enableNestedColumnVectorizedReader</td>
+    <td>false</td>
+    <td>
+      Enables vectorized orc decoding in native implementation for nested data types
+      (array, map and struct). If spark.sql.orc.enableVectorizedReader is set to
+      false, this is ignored.
+    </td>
+    <td>3.2.0</td>
+  </tr>
   <tr>
     <td>spark.sql.orc.mergeSchema</td>
     <td>false</td>
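
Review note: the interaction between the three settings touched by this change can be sketched as a small decision function. This is only a model of the rules as the added documentation words them, not Spark's actual code path. `OrcConf` and `uses_vectorized_reader` are hypothetical names introduced for illustration. The default of `true` for `spark.sql.orc.enableVectorizedReader` and the choice to evaluate the rule over the whole set of column types (rather than per column) are assumptions that the diff does not state.

```python
from dataclasses import dataclass
from typing import Iterable

# Nested types named in the added documentation.
NESTED_TYPES = {"array", "map", "struct"}


@dataclass(frozen=True)
class OrcConf:
    """Hypothetical stand-in for the settings named in the diff."""
    impl: str = "native"                           # spark.sql.orc.impl
    enable_vectorized_reader: bool = True          # spark.sql.orc.enableVectorizedReader (assumed default)
    enable_nested_vectorized_reader: bool = False  # spark.sql.orc.enableNestedColumnVectorizedReader


def uses_vectorized_reader(conf: OrcConf, column_types: Iterable[str]) -> bool:
    """Model of the documented rules for native ORC tables."""
    # The nested flag is ignored unless the base vectorized reader is on.
    if conf.impl != "native" or not conf.enable_vectorized_reader:
        return False
    has_nested = any(t in NESTED_TYPES for t in column_types)
    # Assumption: one nested column is enough to require the nested flag.
    return conf.enable_nested_vectorized_reader or not has_nested


if __name__ == "__main__":
    nested_on = OrcConf(enable_nested_vectorized_reader=True)
    print(uses_vectorized_reader(OrcConf(), ["int", "array"]))
    print(uses_vectorized_reader(nested_on, ["int", "array"]))
```

Under this model, turning on the nested flag alone has no effect when `spark.sql.orc.enableVectorizedReader` is `false`, which matches the "this is ignored" wording in the new table row.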