spark-instrumented-optimizer/dev
Dongjoon Hyun 486ecc680e [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4
## What changes were proposed in this pull request?

ORC 1.4.4 includes [nine fixes](https://issues.apache.org/jira/issues/?filter=12342568&jql=project%20%3D%20ORC%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%201.4.4). One of them is a `Timestamp` bug (ORC-306) that occurs when the `native` ORC vectorized reader reads the ORC column vector's sub-vectors `times` and `nanos`. ORC-306 fixes this according to the [original definition](https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/TimestampColumnVector.java#L45-L46), and this PR includes the updated interpretation of ORC column vectors. Note that the `hive` ORC reader and the ORC MR reader are not affected.

```scala
scala> spark.version
res0: String = 2.3.0
scala> spark.sql("set spark.sql.orc.impl=native")
scala> Seq(java.sql.Timestamp.valueOf("1900-05-05 12:34:56.000789")).toDF().write.orc("/tmp/orc")
scala> spark.read.orc("/tmp/orc").show(false)
+--------------------------+
|value                     |
+--------------------------+
|1900-05-05 12:34:55.000789|
+--------------------------+
```

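In the Spark 2.3.0 session above, the timestamp written as `12:34:56.000789` is read back as `12:34:55.000789`, one second off. This PR aims to update Apache Spark to use ORC 1.4.4.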
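
The interpretation of the two sub-vectors can be sketched with plain `java.sql.Timestamp`, with no Spark or ORC dependency. This is only an illustration, assuming that the millisecond sub-vector holds `Timestamp.getTime()` and that `nanos` holds `Timestamp.getNanos()`, as the linked definition describes. The object and method names are hypothetical and are not taken from the Spark or ORC sources.

```scala
import java.sql.Timestamp

// Hypothetical helper for illustration only; not the Spark or ORC implementation.
object TimestampVectorSketch {
  // Split a Timestamp into the two per-row values of a timestamp column vector
  // (assumed meaning): epoch milliseconds from getTime, and nanoseconds within
  // the second (0 to 999,999,999) from getNanos.
  def toVector(ts: Timestamp): (Long, Int) = (ts.getTime, ts.getNanos)

  // Rebuild epoch microseconds. `time` already carries the millisecond part,
  // so only the microsecond digits below one millisecond are taken from `nanos`.
  def toMicros(time: Long, nanos: Int): Long =
    time * 1000L + (nanos / 1000) % 1000

  // Convert epoch microseconds back to a Timestamp. Floor division keeps
  // pre-1970 (negative) values on the correct second.
  def fromMicros(micros: Long): Timestamp = {
    val seconds = Math.floorDiv(micros, 1000000L)
    val microsOfSecond = Math.floorMod(micros, 1000000L).toInt
    val ts = new Timestamp(seconds * 1000L)
    ts.setNanos(microsOfSecond * 1000)
    ts
  }
}
```

Round-tripping the pre-1970 value from the session above through `toVector`, `toMicros`, and `fromMicros` should give back an equal `Timestamp`, which is the property the bug violated.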

**FULL LIST**

ID | TITLE
-- | --
ORC-281 | Fix compiler warnings from clang 5.0
ORC-301 | `extractFileTail` should open a file in `try` statement
ORC-304 | Fix TestRecordReaderImpl to not fail with new storage-api
ORC-306 | Fix incorrect workaround for bug in java.sql.Timestamp
ORC-324 | Add support for ARM and PPC arch
ORC-330 | Remove unnecessary Hive artifacts from root pom
ORC-332 | Add syntax version to orc_proto.proto
ORC-336 | Remove avro and parquet dependency management entries
ORC-360 | Implement error checking on subtype fields in Java

## How was this patch tested?

Pass the Jenkins.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #21372 from dongjoon-hyun/SPARK_ORC144.
2018-05-24 11:34:13 +08:00

NAME | LAST COMMIT | DATE
-- | -- | --
create-release | [SPARK-23601][BUILD][FOLLOW-UP] Keep md5 checksums for nexus artifacts. | 2018-05-16 13:34:54 -07:00
deps | [SPARK-24322][BUILD] Upgrade Apache ORC to 1.4.4 | 2018-05-24 11:34:13 +08:00
sparktestsupport | [MINOR] Fix typos in dev/* scripts. | 2018-01-31 07:37:25 +09:00
tests | [MINOR] Fix typos in dev/* scripts. | 2018-01-31 07:37:25 +09:00
.gitignore | [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. | 2018-01-31 00:51:00 +09:00
.rat-excludes | [SPARK-23362][SS] Migrate Kafka Microbatch source to v2 | 2018-02-16 14:30:19 -08:00
appveyor-guide.md | [MINOR] Fix typos in dev/* scripts. | 2018-01-31 07:37:25 +09:00
appveyor-install-dependencies.ps1 | [MINOR][BUILD] Download RAT and R version info over HTTPS; use RAT 0.12 | 2017-08-12 14:31:05 +09:00
change-scala-version.sh | [SPARK-19810][BUILD][CORE] Remove support for Scala 2.10 | 2017-07-13 17:06:24 +08:00
check-license | [SPARK-22511][BUILD] Update maven central repo address | 2017-11-14 17:58:07 -06:00
checkstyle-suppressions.xml | [HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle | 2017-09-27 13:40:21 -07:00
checkstyle.xml | [HOTFIX][BUILD] Fix finalizer checkstyle error and re-disable checkstyle | 2017-09-27 13:40:21 -07:00
github_jira_sync.py | [MINOR] Fix a bunch of typos | 2018-01-02 07:10:19 +09:00
lint-java | [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) | 2018-01-13 21:34:28 -08:00
lint-python | [MINOR] Fix typos in dev/* scripts. | 2018-01-31 07:37:25 +09:00
lint-r | [SPARK-10328] [SPARKR] Fix generic for na.omit | 2015-08-28 00:37:50 -07:00
lint-r.R | [SPARK-22063][R] Fixes lint check failures in R by latest commit sha1 ID of lint-r | 2017-10-01 18:42:45 +09:00
lint-scala | [SPARK-2627] [PySpark] have the build enforce PEP 8 automatically | 2014-08-06 12:58:24 -07:00
make-distribution.sh | [SPARK-23383][BUILD][MINOR] Make a distribution should exit with usage while detecting wrong options | 2018-02-20 07:51:30 -06:00
merge_spark_pr.py | [MINOR][PROJECT-INFRA] Check if 'original_head' variable is defined in clean_up at merge script | 2018-05-21 09:47:52 +08:00
mima | [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) | 2018-01-13 21:34:28 -08:00
pip-sanity-check.py | [SPARK-19064][PYSPARK] Fix pip installing of sub components | 2017-01-25 14:43:39 -08:00
README.md | Merge pull request #565 from pwendell/dev-scripts. Closes #565. | 2014-02-08 23:13:34 -08:00
requirements.txt | [SPARK-19064][PYSPARK] Fix pip installing of sub components | 2017-01-25 14:43:39 -08:00
run-pip-tests | [PYSPARK] Update py4j to version 0.10.7. | 2018-05-09 10:47:35 -07:00
run-tests | [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 | 2017-10-22 02:22:35 +09:00
run-tests-jenkins | [MINOR] Fix typos in dev/* scripts. | 2018-01-31 07:37:25 +09:00
run-tests-jenkins.py | [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT | 2018-01-13 00:37:59 +08:00
run-tests.py | [SPARK-23522][PYTHON] always use sys.exit over builtin exit | 2018-03-08 20:38:34 +09:00
scalastyle | [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) | 2018-01-13 21:34:28 -08:00
test-dependencies.sh | [SPARK-23807][BUILD] Add Hadoop 3.1 profile with relevant POM fix ups | 2018-04-24 09:57:09 -07:00
tox.ini | [SPARK-23174][BUILD][PYTHON] python code style checker update | 2018-01-24 21:13:47 +09:00

# Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.