
Dongjoon Hyun 1c9486c1ac [SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write

## What changes were proposed in this pull request?

Before ORC 1.5.3, `orc.dictionary.key.threshold` and `hive.exec.orc.dictionary.key.size.threshold` were applied to all columns. This has been a big hurdle to enabling dictionary encoding. From ORC 1.5.3, `orc.column.encoding.direct` is added to enforce direct encoding selectively, column by column. This PR aims to add that feature by upgrading ORC from 1.5.2 to 1.5.3.

The following are the patches in ORC 1.5.3; this feature is the only one directly related to Spark.

```
ORC-406: ORC: Char(n) and Varchar(n) writers truncate to n bytes & corrupts multi-byte data (gopalv)
ORC-403: [C++] Add checks to avoid invalid offsets in InputStream
ORC-405: Remove calcite as a dependency from the benchmarks.
ORC-375: Fix libhdfs on gcc7 by adding #include <functional> two places.
ORC-383: Parallel builds fails with ConcurrentModificationException
ORC-382: Apache rat exclusions + add rat check to travis
ORC-401: Fix incorrect quoting in specification.
ORC-385: Change RecordReader to extend Closeable.
ORC-384: [C++] fix memory leak when loading non-ORC files
ORC-391: [c++] parseType does not accept underscore in the field name
ORC-397: Allow selective disabling of dictionary encoding. Original patch was by Mithun Radhakrishnan.
ORC-389: Add ability to not decode Acid metadata columns
```

## How was this patch tested?

Passes Jenkins with the newly added test cases.

Closes #22622 from dongjoon-hyun/SPARK-25635.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

2018-10-05 16:42:06 -07:00
create-release	[SPARK-24530][FOLLOWUP] run Sphinx with python 3 in docker	2018-10-02 10:10:22 -07:00
deps	[SPARK-25635][SQL][BUILD] Support selective direct encoding in native ORC write	2018-10-05 16:42:06 -07:00
sparktestsupport	[PYSPARK] Updates to pyspark broadcast	2018-09-17 14:06:09 -05:00
tests	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
.gitignore	[SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file.	2018-01-31 00:51:00 +09:00
.rat-excludes	[SPARK-23429][CORE] Add executor memory metrics to heartbeat and expose in executors REST API	2018-09-07 10:42:46 -07:00
appveyor-guide.md	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
appveyor-install-dependencies.ps1	[SPARK-24956][BUILD][FOLLOWUP] Upgrade Maven version to 3.5.4 for AppVeyor as well	2018-07-31 09:14:29 +08:00
change-scala-version.sh	[SPARK-19810][BUILD][CORE] Remove support for Scala 2.10	2017-07-13 17:06:24 +08:00
check-license	[SPARK-22511][BUILD] Update maven central repo address	2017-11-14 17:58:07 -06:00
checkstyle-suppressions.xml	[HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle	2017-09-27 13:40:21 -07:00
checkstyle.xml	[HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle	2017-09-27 13:40:21 -07:00
github_jira_sync.py	[MINOR] Fix a bunch of typos	2018-01-02 07:10:19 +09:00
lint-java	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
lint-python	[SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0	2018-09-14 20:13:07 -05:00
lint-r	[SPARK-10328] [SPARKR] Fix generic for na.omit	2015-08-28 00:37:50 -07:00
lint-r.R	[SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r	2017-10-01 18:42:45 +09:00
lint-scala	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	2014-08-06 12:58:24 -07:00
make-distribution.sh	[SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and specialize for source vs binary	2018-09-17 08:54:44 -05:00
merge_spark_pr.py	[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4	2018-09-13 11:19:43 +08:00
mima	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
pip-sanity-check.py	[SPARK-19064][PYSPARK] Fix pip installing of sub components	2017-01-25 14:43:39 -08:00
README.md	Merge pull request #565 from pwendell/dev-scripts. Closes #565 .	2014-02-08 23:13:34 -08:00
requirements.txt	[SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names	2018-09-07 09:35:25 -07:00
run-pip-tests	Fix typos detected by github.com/client9/misspell	2018-08-11 21:23:36 -05:00
run-tests	[SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7	2017-10-22 02:22:35 +09:00
run-tests-jenkins	[MINOR] Fix typos in dev/* scripts.	2018-01-31 07:37:25 +09:00
run-tests-jenkins.py	[SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0	2018-09-14 20:13:07 -05:00
run-tests.py	[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4	2018-09-13 11:19:43 +08:00
sbt-checkstyle	[SPARK-22269][BUILD] Run Java linter via SBT for Jenkins	2018-05-24 14:19:32 +08:00
scalastyle	[SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses)	2018-01-13 21:34:28 -08:00
test-dependencies.sh	[SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevant POM fix ups	2018-04-24 09:57:09 -07:00
tox.ini	[SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0	2018-09-14 20:13:07 -05:00

README.md

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.