987f386588
## What changes were proposed in this pull request?

1. Split the main page of sql-programming-guide into 7 parts:
   - Getting Started
   - Data Sources
   - Performance Tuning
   - Distributed SQL Engine
   - PySpark Usage Guide for Pandas with Apache Arrow
   - Migration Guide
   - Reference
2. Add a left menu for sql-programming-guide, keeping the first-level index for each part in the menu.

![image](https://user-images.githubusercontent.com/4833765/47016859-6332e180-d183-11e8-92e8-ce62518a83c4.png)

## How was this patch tested?

Local test with jekyll build/serve.

Closes #22746 from xuanyuanking/SPARK-24499.

Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
27 lines
1.4 KiB
Markdown
---
layout: global
title: ORC Files
displayTitle: ORC Files
---
Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
The configurations listed below were added to support it. The vectorized reader is used for
native ORC tables (e.g., those created with the clause `USING ORC`) when `spark.sql.orc.impl`
is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to `true`. For Hive ORC
serde tables (e.g., those created with the clause `USING HIVE OPTIONS (fileFormat 'ORC')`),
the vectorized reader is used when `spark.sql.hive.convertMetastoreOrc` is also set to `true`.

<table class="table">
  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th></tr>
  <tr>
    <td><code>spark.sql.orc.impl</code></td>
    <td><code>native</code></td>
    <td>The name of the ORC implementation. It can be either <code>native</code> or <code>hive</code>. <code>native</code> means the native ORC support that is built on Apache ORC 1.4. <code>hive</code> means the ORC library in Hive 1.2.1.</td>
  </tr>
  <tr>
    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    <td><code>true</code></td>
    <td>Enables vectorized ORC decoding in the <code>native</code> implementation. If <code>false</code>, a new non-vectorized ORC reader is used in the <code>native</code> implementation. For the <code>hive</code> implementation, this is ignored.</td>
  </tr>
</table>
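
The selection rule described above can be modelled as a small decision function. The sketch below is plain Python and does not call Spark; it only restates the documented conditions so they can be checked at a glance. The function name `uses_vectorized_reader` and the `table_kind` labels are hypothetical, and treating an unset `spark.sql.hive.convertMetastoreOrc` as `false` is an assumption, since this page does not state its default.

```python
# Illustrative model of the documented rule; this is not Spark's implementation.

ORC_IMPL = "spark.sql.orc.impl"
ORC_VECTORIZED = "spark.sql.orc.enableVectorizedReader"
CONVERT_METASTORE_ORC = "spark.sql.hive.convertMetastoreOrc"

# Defaults copied from the property table on this page.
DEFAULTS = {ORC_IMPL: "native", ORC_VECTORIZED: "true"}


def uses_vectorized_reader(table_kind, conf=None):
    """Return True if, per the rule on this page, the vectorized ORC reader applies.

    table_kind is a hypothetical label:
      "native"     - a table created with USING ORC
      "hive_serde" - a table created with USING HIVE OPTIONS (fileFormat 'ORC')
    conf maps property names to string values and overrides DEFAULTS.
    """
    settings = {**DEFAULTS, **(conf or {})}
    native_vectorized = (
        settings[ORC_IMPL] == "native" and settings[ORC_VECTORIZED] == "true"
    )
    if table_kind == "native":
        return native_vectorized
    if table_kind == "hive_serde":
        # Assumption: the default of convertMetastoreOrc is not given on this page,
        # so an unset value is treated as "false" here.
        converted = settings.get(CONVERT_METASTORE_ORC, "false") == "true"
        return native_vectorized and converted
    raise ValueError(f"unknown table kind: {table_kind!r}")


if __name__ == "__main__":
    print(uses_vectorized_reader("native"))
    print(uses_vectorized_reader("native", {ORC_IMPL: "hive"}))
    print(uses_vectorized_reader("hive_serde", {CONVERT_METASTORE_ORC: "true"}))
```

In a real session these properties would be set through Spark's configuration mechanism; the dictionary here merely stands in for that configuration.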