---
layout: global
title: ORC Files
displayTitle: ORC Files
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

Since Spark 2.3, Spark supports a vectorized ORC reader with a new ORC file format for ORC files.
The configurations listed in the table below were added to control it. The vectorized reader is
used for native ORC tables (e.g., the ones created using the clause `USING ORC`) when
`spark.sql.orc.impl` is set to `native` and `spark.sql.orc.enableVectorizedReader` is set to
`true`. For Hive ORC serde tables (e.g., the ones created using the clause
`USING HIVE OPTIONS (fileFormat 'ORC')`), the vectorized reader is used when
`spark.sql.hive.convertMetastoreOrc` is also set to `true`.
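
The rule in the paragraph above can be restated as a small decision function. This is only a sketch of the documented conditions in plain Python, not Spark's internal logic: the function name `uses_vectorized_reader` and its parameter names are hypothetical, and each parameter simply stands in for the property noted in its comment.

```python
def uses_vectorized_reader(
    hive_serde_table: bool,
    convert_metastore_orc: bool,            # spark.sql.hive.convertMetastoreOrc
    orc_impl: str = "native",               # spark.sql.orc.impl
    enable_vectorized_reader: bool = True,  # spark.sql.orc.enableVectorizedReader
) -> bool:
    """Hypothetical helper that mirrors the documented conditions."""
    # Both table kinds require the native implementation with vectorization on.
    native_vectorized = orc_impl == "native" and enable_vectorized_reader
    if hive_serde_table:
        # Hive ORC serde tables additionally need metastore conversion enabled.
        return native_vectorized and convert_metastore_orc
    return native_vectorized


# Native ORC table: the Hive conversion flag does not matter.
print(uses_vectorized_reader(hive_serde_table=False, convert_metastore_orc=False))
# Hive ORC serde table without conversion: the vectorized reader is not used.
print(uses_vectorized_reader(hive_serde_table=True, convert_metastore_orc=False))
```

In a running session, the properties themselves would typically be set with `spark.conf.set(key, value)` on the `SparkSession`.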

<table class="table">
  <tr><th><b>Property Name</b></th><th><b>Default</b></th><th><b>Meaning</b></th><th><b>Since Version</b></th></tr>
  <tr>
    <td><code>spark.sql.orc.impl</code></td>
    <td><code>native</code></td>
    <td>
      The name of the ORC implementation. It can be either <code>native</code> or <code>hive</code>.
      <code>native</code> means the native ORC support. <code>hive</code> means the ORC library
      in Hive.
    </td>
    <td>2.3.0</td>
  </tr>
  <tr>
    <td><code>spark.sql.orc.enableVectorizedReader</code></td>
    <td><code>true</code></td>
    <td>
      Enables vectorized ORC decoding in the <code>native</code> implementation. If <code>false</code>,
      a new non-vectorized ORC reader is used in the <code>native</code> implementation.
      For the <code>hive</code> implementation, this setting is ignored.
    </td>
    <td>2.3.0</td>
  </tr>
</table>
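
As a rough illustration of passing these two properties at launch time, the sketch below checks values against the options listed in the table and renders them as `--conf key=value` arguments of the kind accepted by `spark-submit`. The helper names `validate` and `to_submit_flags` are hypothetical and exist only for this example.

```python
# Allowed values, taken from the table above.
ALLOWED = {
    "spark.sql.orc.impl": {"native", "hive"},
    "spark.sql.orc.enableVectorizedReader": {"true", "false"},
}


def validate(settings: dict) -> None:
    """Hypothetical check: reject keys or values the table does not list."""
    for key, value in settings.items():
        if key not in ALLOWED:
            raise KeyError(f"unknown property: {key}")
        if value not in ALLOWED[key]:
            raise ValueError(f"{key} must be one of {sorted(ALLOWED[key])}")


def to_submit_flags(settings: dict) -> list:
    """Render settings as a flat list of --conf arguments."""
    validate(settings)
    flags = []
    for key, value in settings.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


orc_settings = {
    "spark.sql.orc.impl": "native",
    "spark.sql.orc.enableVectorizedReader": "true",
}
print(" ".join(to_submit_flags(orc_settings)))
```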