spark-instrumented-optimizer/sql/core/src/test/resources
Damian Guy 071bbad5db [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists
This PR is inspired by #8063, authored by dguy. In particular, all of the Parquet test files added here are taken from that PR.

**Committer who merges this PR should attribute it to "Damian Guy <damian.guy@gmail.com>".**

----

SPARK-6776 and SPARK-6777 followed `parquet-avro` to implement the backward-compatibility rules defined in the `parquet-format` spec. However, both Spark SQL and `parquet-avro` neglected the following statement in `parquet-format`:

> This does not affect repeated fields that are not annotated: A repeated field that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field.
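
The quoted rule can be illustrated with a small Python model of a Parquet schema. `ParquetField`, `to_catalyst` and the `NOT NULL` notation are hypothetical names chosen for illustration, not parquet-mr or Spark APIs, and the LIST/MAP backward-compatibility rules are deliberately left out; the sketch only shows how an unannotated repeated field maps to a required array of required elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LIST_OR_MAP = ("LIST", "MAP")


@dataclass
class ParquetField:
    """Hypothetical, simplified Parquet schema node (not the parquet-mr API)."""
    name: str
    repetition: str                   # "required", "optional" or "repeated"
    primitive: Optional[str] = None   # e.g. "int32", "binary"; None for groups
    annotation: Optional[str] = None  # e.g. "LIST" or "MAP"; None if unannotated
    children: List["ParquetField"] = field(default_factory=list)


def to_catalyst(f: ParquetField, parent_annotation: Optional[str] = None) -> str:
    """Describe a Catalyst-style type for `f`; NOT NULL marks non-nullable types."""
    if f.annotation in LIST_OR_MAP or parent_annotation in LIST_OR_MAP:
        # The LIST/MAP backward-compatibility rules are out of scope for this model.
        raise NotImplementedError("LIST/MAP handling is not modeled")
    if f.primitive is not None:
        base = f.primitive
    else:
        members = ", ".join(
            f"{c.name}: {to_catalyst(c, f.annotation)}" for c in f.children)
        base = f"struct<{members}>"
    if f.repetition == "repeated":
        # Neither annotated nor inside a LIST/MAP group, so per the spec this is
        # a required list of required elements whose type is the field's type.
        return f"array<{base} NOT NULL> NOT NULL"
    return base if f.repetition == "optional" else f"{base} NOT NULL"


# Protobuf-style fields: a repeated int32 and a repeated group.
print(to_catalyst(ParquetField("ids", "repeated", primitive="int32")))
# array<int32 NOT NULL> NOT NULL
print(to_catalyst(ParquetField(
    "points", "repeated",
    children=[ParquetField("x", "optional", primitive="double")])))
# array<struct<x: double> NOT NULL> NOT NULL
```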

One consequence is that Parquet files generated by `parquet-protobuf` that contain unannotated repeated fields are not correctly converted to Catalyst arrays.

This PR fixes this issue by

1. Handling unannotated repeated fields in `CatalystSchemaConverter`.
2. Converting these special repeated fields to Catalyst arrays in `CatalystRowConverter`.

   Two special converters, `RepeatedPrimitiveConverter` and `RepeatedGroupConverter`, are added. They delegate the actual conversion work to a child `elementConverter` and accumulate the converted elements in an `ArrayBuffer`.

   Two extra methods, `start()` and `end()`, are added to `ParentContainerUpdater` so that they can be used to initialize a new `ArrayBuffer` for each unannotated repeated field and to propagate the converted array values upstream.
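
The converter wiring in item 2 can be modeled in a few lines. This is a hypothetical Python sketch, not Spark's Scala code: only `ParentContainerUpdater`, `RepeatedPrimitiveConverter`, `start()` and `end()` come from the description above, while `RowFieldUpdater`, `PrimitiveConverter`, `RowConverter`, `add_value` and the `convert` parameter are illustrative assumptions, and a Python list stands in for `ArrayBuffer`.

```python
from typing import Any, Callable, List


class ParentContainerUpdater:
    """Receives values produced by a child converter (simplified model)."""

    def start(self) -> None:
        """Called when the enclosing container (here, a record) starts."""

    def end(self) -> None:
        """Called when the enclosing container ends."""

    def set(self, value: Any) -> None:
        """Called with each converted value."""


class RowFieldUpdater(ParentContainerUpdater):
    """Hypothetical: writes a value into one slot of the row being assembled."""

    def __init__(self, owner: "RowConverter", ordinal: int) -> None:
        self.owner, self.ordinal = owner, ordinal

    def set(self, value: Any) -> None:
        self.owner.current_row[self.ordinal] = value


class PrimitiveConverter:
    """Hypothetical element converter: converts a raw value, forwards it upstream."""

    def __init__(self, updater: ParentContainerUpdater,
                 convert: Callable[[Any], Any] = lambda v: v) -> None:
        self.updater, self.convert = updater, convert

    def add_value(self, raw: Any) -> None:
        self.updater.set(self.convert(raw))


class RepeatedPrimitiveConverter(ParentContainerUpdater):
    """Delegates each element to a child converter and buffers the results."""

    def __init__(self, parent_updater: ParentContainerUpdater,
                 convert: Callable[[Any], Any] = lambda v: v) -> None:
        self.parent_updater = parent_updater
        self.buffer: List[Any] = []
        # The element converter reports each converted element back via set().
        self.element_converter = PrimitiveConverter(self, convert)

    def start(self) -> None:
        self.buffer = []  # a fresh buffer for every enclosing record

    def set(self, value: Any) -> None:
        self.buffer.append(value)

    def end(self) -> None:
        self.parent_updater.set(self.buffer)  # propagate the finished array

    def add_value(self, raw: Any) -> None:
        self.element_converter.add_value(raw)


class RowConverter:
    """Hypothetical record assembler; calls start()/end() on every field updater."""

    def __init__(self, num_fields: int) -> None:
        self.num_fields = num_fields
        self.current_row: List[Any] = []
        self.field_updaters: List[ParentContainerUpdater] = []

    def start(self) -> None:
        self.current_row = [None] * self.num_fields
        for updater in self.field_updaters:
            updater.start()

    def end(self) -> List[Any]:
        for updater in self.field_updaters:
            updater.end()
        return self.current_row


# One unannotated `repeated int32` field, read from two records.
row_converter = RowConverter(num_fields=1)
ids = RepeatedPrimitiveConverter(RowFieldUpdater(row_converter, 0), convert=int)
row_converter.field_updaters.append(ids)

rows = []
for raw_values in [["1", "2", "3"], []]:
    row_converter.start()
    for raw in raw_values:
        ids.add_value(raw)
    rows.append(row_converter.end())
print(rows)  # [[[1, 2, 3]], [[]]]
```

Because `start()` installs a fresh buffer for every record, a record with no values for the field yields an empty array rather than reusing the previous record's elements, which is consistent with the "required list" interpretation. A repeated group would presumably follow the same pattern with a group converter as the element converter.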

Author: Cheng Lian <lian@databricks.com>

Closes #8070 from liancheng/spark-9340/unannotated-parquet-list and squashes the following commits:

ace6df7 [Cheng Lian] Moves ParquetProtobufCompatibilitySuite
f1c7bfd [Cheng Lian] Updates .rat-excludes
420ad2b [Cheng Lian] Fixes converting unannotated Parquet lists
2015-08-11 12:46:33 +08:00
META-INF/services [SPARK-9486][SQL] Add data source aliasing for external packages 2015-08-08 11:03:01 -07:00
log4j.properties [SPARK-7743] [SQL] Parquet 1.7 2015-06-04 11:32:03 -07:00
nested-array-struct.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
old-repeated-int.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
old-repeated-message.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
old-repeated.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
parquet-thrift-compat.snappy.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
proto-repeated-string.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
proto-repeated-struct.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
proto-struct-with-array-many.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00
proto-struct-with-array.parquet [SPARK-9340] [SQL] Fixes converting unannotated Parquet lists 2015-08-11 12:46:33 +08:00