spark-instrumented-optimizer

Gengliang Wang 07593d362f [SPARK-27506][SQL][FOLLOWUP] Use option `avroSchema` to specify an evolved schema in `from_avro`

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/26780

In https://github.com/apache/spark/pull/26780, a new Avro data source option `actualSchema` was introduced for setting the original Avro schema in the function `from_avro`, while the expected schema was supposed to be set in the `jsonFormatSchema` parameter of `from_avro`.

However, there is another Avro data source option, `avroSchema`, which is used for setting the expected schema when reading and writing.

This PR uses the `avroSchema` option for reading Avro data with an evolved schema and removes the new `actualSchema` option.

### Why are the changes needed?

Unify and simplify the Avro data source options.

### Does this PR introduce any user-facing change?

Yes. To deserialize Avro data with an evolved schema, before the changes:
```
from_avro('col, expectedSchema, ("actualSchema" -> actualSchema))
```

After the changes:
```
from_avro('col, actualSchema, ("avroSchema" -> expectedSchema))
```

After the changes, the second parameter is always the actual Avro schema.

### How was this patch tested?

Update the existing tests in https://github.com/apache/spark/pull/26780

Closes #27045 from gengliangwang/renameAvroOption.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

2019-12-30 18:14:21 +09:00
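To make the distinction between the actual schema (the one the data was written with) and the expected, evolved schema more concrete, here is a minimal pure-Python sketch of what reading with an evolved schema amounts to. It is not Spark's or Avro's implementation: the `User` schemas and the `resolve_record` helper are hypothetical, invented for illustration, and the sketch covers only added fields with defaults and dropped fields, ignoring type promotion, aliases, and nested records. In PySpark the corresponding call would presumably be `from_avro(col, actual_schema_json, {"avroSchema": expected_schema_json})`, following the argument order described in the commit message.

```python
import json

# Hypothetical schemas, invented for illustration only.
# ACTUAL_SCHEMA: the schema the data was originally written with.
ACTUAL_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "name", "type": "string"}
]}
""")

# EXPECTED_SCHEMA: the evolved schema the reader wants (adds "email").
EXPECTED_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "id", "type": "long"},
  {"name": "name", "type": "string"},
  {"name": "email", "type": "string", "default": ""}
]}
""")


def resolve_record(record, actual_schema, expected_schema):
    """Project a record decoded with actual_schema onto expected_schema.

    Simplified rules (hypothetical helper, not a library API):
    - a field present in both schemas keeps its decoded value;
    - a field only in the expected schema takes its declared default;
    - a field only in the actual schema is dropped.
    """
    actual_fields = {f["name"] for f in actual_schema["fields"]}
    resolved = {}
    for field in expected_schema["fields"]:
        name = field["name"]
        if name in actual_fields:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(
                f"field {name!r} is missing from the actual schema "
                "and has no default in the expected schema"
            )
    return resolved


if __name__ == "__main__":
    old_record = {"id": 1, "name": "ann"}
    print(resolve_record(old_record, ACTUAL_SCHEMA, EXPECTED_SCHEMA))
```

The sketch takes the actual schema as the primary argument and the expected schema as the secondary one, mirroring the post-change convention in which the second parameter of `from_avro` is always the actual schema and `avroSchema` carries the expected one.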
avro	[SPARK-27506][SQL][FOLLOWUP] Use option `avroSchema` to specify an evolved schema in `from_avro`	2019-12-30 18:14:21 +09:00
tests	[SPARK-29188][PYTHON] toPandas (without Arrow) gets wrong dtypes when applied on empty DF	2019-12-12 20:49:10 +09:00
__init__.py	[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF	2019-09-30 22:25:35 +09:00
catalog.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
cogroup.py	[SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF	2019-09-30 22:25:35 +09:00
column.py	[SPARK-29664][PYTHON][SQL] Column.getItem behavior is not consistent with Scala	2019-11-01 12:25:48 +09:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[MINOR][PYSPARK][DOCS] Fix typo in example documentation	2019-11-01 11:55:29 -07:00
dataframe.py	[SPARK-30200][SQL][FOLLOW-UP] Expose only explain(mode: String) in Scala side, and clean up related codes	2019-12-16 14:42:35 +09:00
functions.py	[MINOR][DOCS] Fix documentation for slide function	2019-12-16 16:29:09 +09:00
group.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
readwriter.py	[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC	2019-12-23 09:57:42 +09:00
session.py	[MINOR][PYSPARK][DOCS] Fix typo in example documentation	2019-11-01 11:55:29 -07:00
streaming.py	[SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC	2019-12-23 09:57:42 +09:00
types.py	[SPARK-29798][PYTHON][SQL] Infers bytes as binary type in createDataFrame in Python 3 at PySpark	2019-11-08 12:10:39 -08:00
udf.py	[SPARK-27463][PYTHON] Support Dataframe Cogroup via Pandas UDFs	2019-09-17 17:13:50 -07:00
utils.py	[SPARK-29376][SQL][PYTHON] Upgrade Apache Arrow to version 0.15.1	2019-11-15 13:27:30 +09:00
window.py	[SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations	2019-09-01 10:15:00 -05:00