spark-instrumented-optimizer/dev
Dongjoon Hyun 1c9486c1ac [SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write
## What changes were proposed in this pull request?

Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` are applied for all columns. This has been a big huddle to enable dictionary encoding. From ORC 1.5.3, `orc.column.encoding.direct` is added to enforce direct encoding selectively in a column-wise manner. This PR aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.

The followings are the patches in ORC 1.5.3 and this feature is the only one related to Spark directly.
```
ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
ORC-405: Remove calcite as a dependency from the benchmarks.
ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
ORC-383: Parallel builds fails with ConcurrentModificationException
ORC-382: Apache rat exclusions + add rat check to travis
ORC-401: Fix incorrect quoting in specification.
ORC-385: Change RecordReader to extend Closeable.
ORC-384: [C++] fix memory leak when loading non-ORC files
ORC-391: [c++] parseType does not accept underscore in the field name
ORC-397: Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
ORC-389: Add ability to not decode Acid metadata columns
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.

Closes #22622 from dongjoon-hyun/SPARK-25635.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
2018-10-05 16:42:06 -07:00
..
create-release [SPARK-24530][FOLLOWUP] run Sphinx with python 3 in docker 2018-10-02 10:10:22 -07:00
deps [SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write 2018-10-05 16:42:06 -07:00
sparktestsupport [PYSPARK] Updates to pyspark broadcast 2018-09-17 14:06:09 -05:00
tests [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-23429][CORE] Add executor memory metrics to heartbeat and expose in executors REST API 2018-09-07 10:42:46 -07:00
appveyor-guide.md [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
appveyor-install-dependencies.ps1 [SPARK-24956][BUILD][FOLLOWUP] Upgrade Maven version to 3.5.4 for AppVeyor as well 2018-07-31 09:14:29 +08:00
change-scala-version.sh [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 2017-07-13 17:06:24 +08:00
check-license [SPARK-22511][BUILD] Update maven central repo address 2017-11-14 17:58:07 -06:00
checkstyle-suppressions.xml [HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle 2017-09-27 13:40:21 -07:00
checkstyle.xml [HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle 2017-09-27 13:40:21 -07:00
github_jira_sync.py [MINOR] Fix a bunch of typos 2018-01-02 07:10:19 +09:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-python [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0 2018-09-14 20:13:07 -05:00
lint-r [SPARK-10328] [SPARKR] Fix generic for na.omit 2015-08-28 00:37:50 -07:00
lint-r.R [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r 2017-10-01 18:42:45 +09:00
lint-scala [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically 2014-08-06 12:58:24 -07:00
make-distribution.sh [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and specialize for source vs binary 2018-09-17 08:54:44 -05:00
merge_spark_pr.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
mima [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
pip-sanity-check.py [SPARK-19064][PYSPARK] Fix pip installing of sub components 2017-01-25 14:43:39 -08:00
README.md Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names 2018-09-07 09:35:25 -07:00
run-pip-tests Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
run-tests [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 2017-10-22 02:22:35 +09:00
run-tests-jenkins [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
run-tests-jenkins.py [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0 2018-09-14 20:13:07 -05:00
run-tests.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
sbt-checkstyle [SPARK-22269][BUILD] Run Java linter via SBT for Jenkins 2018-05-24 14:19:32 +08:00
scalastyle [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
test-dependencies.sh [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevant POM fix ups 2018-04-24 09:57:09 -07:00
tox.ini [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0 2018-09-14 20:13:07 -05:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.