spark-instrumented-optimizer

..
__init__.py	[SPARK-26856][PYSPARK] Python support for from_avro and to_avro APIs	2019-03-11 10:15:07 +09:00
functions.py	[SPARK-28698][SQL] Support user-specified output schema in `to_avro`	2019-08-13 20:52:16 +08:00

Gengliang Wang 48adc91057 [SPARK-28698][SQL] Support user-specified output schema in `to_avro`

## What changes were proposed in this pull request?

The mapping of Spark schema to Avro schema is many-to-many. (See https://spark.apache.org/docs/latest/sql-data-sources-avro.html#supported-types-for-spark-sql---avro-conversion)
The default schema mapping might not be exactly what users want. For example, by default a "string" column is always written as the Avro "string" type, but users might want to write the column as the Avro "enum" type.
With PR https://github.com/apache/spark/pull/21847, Spark supports user-specified schema in the batch writer.
For the function `to_avro`, we should support user-specified output schema as well.
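
A minimal sketch of how the user-specified schema might be passed from Python, assuming the PySpark wrapper accepts the Avro schema as a JSON string in its second argument (`jsonFormatSchema`) and that the external `spark-avro` module is on the classpath. The enum name `Suit`, its symbols, the column name `suit`, and the helpers `enum_schema_json` and `encode_suit` are hypothetical and exist only for illustration. Depending on the Spark version, a nullable column may need a `["null", ...]` union rather than a bare enum.

```python
import json


def enum_schema_json(name, symbols):
    """Return an Avro enum schema serialized as a JSON string."""
    return json.dumps({"type": "enum", "name": name, "symbols": list(symbols)})


# Hypothetical enum schema; the name and symbols are assumptions, not from the patch.
SUIT_SCHEMA = enum_schema_json("Suit", ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"])


def encode_suit(df, column="suit"):
    """Encode a string column as Avro binary using the enum schema above.

    pyspark is imported lazily so the schema helper can be used without Spark.
    Calling this function requires PySpark plus the spark-avro module.
    """
    from pyspark.sql.avro.functions import to_avro

    # The second argument overrides the default string -> Avro "string" mapping.
    return df.select(to_avro(df[column], SUIT_SCHEMA).alias("avro_suit"))
```

Without the second argument, `to_avro` falls back to the default mapping described above, so the same column would be written as an Avro "string".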

## How was this patch tested?

Unit test.

Closes #25419 from gengliangwang/to_avro.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

2019-08-13 20:52:16 +08:00
