spark-instrumented-optimizer/sql/core
Jungtaek Lim (HeartSaVioR) 233a8494c8 [SPARK-27237][SS] Introduce State schema validation among query restart
## What changes were proposed in this pull request?

Please refer the description of [SPARK-27237](https://issues.apache.org/jira/browse/SPARK-27237) to see rationalization of this patch.

This patch proposes to introduce state schema validation, via storing key schema and value schema to `schema` file (for the first time) and verify new key schema and value schema for state are compatible with existing one. To be clear for definition of "compatible", state schema is "compatible" when number of fields are same and data type for each field is same - Spark has been allowing rename of field.

This patch will prevent query run which has incompatible state schema, which would reduce the chance to get indeterministic behavior (actually renaming of field is also the smell of semantically incompatible, but end users could just modify its name so we can't say) as well as providing more informative error message.

## How was this patch tested?

Added UTs.

Closes #24173 from HeartSaVioR/SPARK-27237.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-12-04 19:33:11 +09:00
..
benchmarks [SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in SubExprEliminationBenchmark 2020-11-25 15:22:47 -08:00
src [SPARK-27237][SS] Introduce State schema validation among query restart 2020-12-04 19:33:11 +09:00
pom.xml [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier 2020-10-11 21:54:56 -07:00