spark-instrumented-optimizer/dev/sparktestsupport
Gengliang Wang 395860a986 [SPARK-24768][SQL] Have a built-in AVRO data source implementation
## What changes were proposed in this pull request?

Apache Avro (https://avro.apache.org) is a popular data serialization format. It is widely used in the Spark and Hadoop ecosystem, especially for Kafka-based data pipelines. Using the external package https://github.com/databricks/spark-avro, Spark SQL can read and write Avro data. Making spark-avro built-in can provide a better experience for first-time users of Spark SQL and Structured Streaming. We expect the built-in Avro data source to further improve the adoption of Structured Streaming.
The proposal is to inline code from the spark-avro package (https://github.com/databricks/spark-avro). The target release is Spark 2.4.
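A minimal sketch of the intended usage, assuming the built-in source registers under the `avro` format name (the input/output paths are hypothetical examples):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: read and write Avro data with the built-in data source.
val spark = SparkSession.builder()
  .appName("AvroExample")
  .getOrCreate()

// Read Avro files into a DataFrame (path is a placeholder).
val df = spark.read.format("avro").load("/tmp/events.avro")

// Write the DataFrame back out in Avro format (path is a placeholder).
df.write.format("avro").save("/tmp/events_copy")
```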

[Built-in AVRO Data Source In Spark 2.4.pdf](https://github.com/apache/spark/files/2181511/Built-in.AVRO.Data.Source.In.Spark.2.4.pdf)

## How was this patch tested?

Unit test

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #21742 from gengliangwang/export_avro.
2018-07-12 13:55:25 -07:00
__init__.py [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed 2016-11-16 14:22:15 -08:00
modules.py [SPARK-24768][SQL] Have a built-in AVRO data source implementation 2018-07-12 13:55:25 -07:00
shellutils.py [SPARK-22302][INFRA] Remove manual backports for subprocess and print explicit message for < Python 2.7 2017-10-22 02:22:35 +09:00
toposort.py [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00