spark-instrumented-optimizer/sql/core
Chao Sun 2479778934
[SPARK-33492][SQL] DSv2: Append/Overwrite/ReplaceTable should invalidate cache
### What changes were proposed in this pull request?

This adds changes in the following places:
- logic to also refresh caches referencing the target table in v2 `AppendDataExec`, `OverwriteByExpressionExec`, `OverwritePartitionsDynamicExec`, as well as their v1 fallbacks `AppendDataExecV1` and `OverwriteByExpressionExecV1`.
- logic to invalidate caches referencing the target table in v2 `ReplaceTableAsSelectExec` and its atomic version `AtomicReplaceTableAsSelectExec`. These are only supported in v2 at the moment though.

In addition to the above, in order to test the v1 write fallback behavior, I extended `InMemoryTableWithV1Fallback` to also support batch reads.

### Why are the changes needed?

Currently in DataSource v2 we don't refresh or invalidate caches referencing the target table when the table content is changed by operations such as append, overwrite, or replace table. This is different from DataSource v1, and could potentially cause data correctness issue if the staled caches are queried later.

### Does this PR introduce _any_ user-facing change?

Yes. Now When a data source v2 is cached (either directly or indirectly), all the relevant caches will be refreshed or invalidated if the table is replaced.

### How was this patch tested?

Added unit tests for the new code path.

Closes #30429 from sunchao/SPARK-33492.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-11-20 14:59:56 -08:00
..
benchmarks [SPARK-33427][SQL] Add subexpression elimination for interpreted expression evaluation 2020-11-17 14:29:37 +00:00
src [SPARK-33492][SQL] DSv2: Append/Overwrite/ReplaceTable should invalidate cache 2020-11-20 14:59:56 -08:00
pom.xml [SPARK-33107][BUILD][FOLLOW-UP] Remove com.twitter:parquet-hadoop-bundle:1.6.0 and orc.classifier 2020-10-11 21:54:56 -07:00