spark-instrumented-optimizer/external
Liang-Chi Hsieh 527d936049 [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
## What changes were proposed in this pull request?

When using `from_avro` to deserialize avro data to catalyst StructType format, if `ConvertToLocalRelation` is applied at the time, `from_avro` produces only the last value (overriding previous values).

The cause is `AvroDeserializer` reuses output row for StructType. Normally, it should be fine in Spark SQL. But `ConvertToLocalRelation` just uses `InterpretedProjection` to project local rows. `InterpretedProjection` creates new row for each output thro, it includes the same nested row object from `AvroDeserializer`. By the end, converted local relation has only last value.

I think there're two possible options:

1. Make `AvroDeserializer` output new row for StructType.
2. Use `InterpretedMutableProjection` in `ConvertToLocalRelation` and call `copy()` on output rows.

Option 2 is chose because previously `ConvertToLocalRelation` also creates new rows, this `InterpretedMutableProjection` + `copy()` shoudn't bring too much performance penalty. `ConvertToLocalRelation` should be arguably less critical, compared with `AvroDeserializer`.

## How was this patch tested?

Added test.

Closes #24805 from viirya/SPARK-27798.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-06-07 13:47:36 -07:00
..
avro [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation 2019-06-07 13:47:36 -07:00
docker [SPARK-27794][R][DOCS] Use https URL for CRAN repo 2019-05-22 14:28:21 -07:00
docker-integration-tests [SPARK-27596][SQL] The JDBC 'query' option doesn't work for Oracle database 2019-05-05 21:52:23 -07:00
kafka-0-10 [SPARK-27294][SS] Add multi-cluster Kafka delegation token 2019-05-07 11:40:43 -07:00
kafka-0-10-assembly [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00
kafka-0-10-sql [SPARK-27748][SS] Kafka consumer/producer password/token redaction. 2019-06-03 15:43:08 -07:00
kafka-0-10-token-provider [SPARK-27748][SS] Kafka consumer/producer password/token redaction. 2019-06-03 15:43:08 -07:00
kinesis-asl [SPARK-27610][YARN] Shade netty native libraries 2019-05-07 10:47:36 -07:00
kinesis-asl-assembly [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00
spark-ganglia-lgpl [SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 2018-11-14 16:22:23 -08:00