a03d77d326
### What changes were proposed in this pull request? 1. Replace the metadata key `org.apache.spark.int96NoRebase` by `org.apache.spark.legacyINT96`. 2. Change the condition when new key should be saved to parquet metadata: it should be saved when the SQL config `spark.sql.legacy.parquet.int96RebaseModeInWrite` is set to `LEGACY`. 3. Change handling the metadata key in read: - If there is no the key in parquet metadata, take the rebase mode from the SQL config: `spark.sql.legacy.parquet.int96RebaseModeInRead` - If parquet files were saved by Spark < 3.1.0, use the `LEGACY` rebasing mode for INT96 type. - For files written by Spark >= 3.1.0, if the `org.apache.spark.legacyINT96` presents in metadata, perform rebasing otherwise don't. ### Why are the changes needed? - To not increase parquet size by default when `spark.sql.legacy.parquet.int96RebaseModeInWrite` is `EXCEPTION` after https://github.com/apache/spark/pull/30121. - To have the implementation similar to `org.apache.spark.legacyDateTime` - To minimise impact on other subsystems that are based on file sizes like gathering statistics. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Modified test in `ParquetIOSuite` Closes #30132 from MaxGekk/int96-flip-metadata-rebase-key. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
pom.xml |