spark-instrumented-optimizer

History

Shixiong Zhu 9bf4e2baad [SPARK-19497][SS] Implement streaming deduplication ## What changes were proposed in this pull request? This PR adds a special streaming deduplication operator to support `dropDuplicates` with `aggregation` and watermark. It reuses the `dropDuplicates` API but creates new logical plan `Deduplication` and new physical plan `DeduplicationExec`. The following cases are supported: - one or multiple `dropDuplicates()` without aggregation (with or without watermark) - `dropDuplicates` before aggregation Not supported cases: - `dropDuplicates` after aggregation Breaking changes: - `dropDuplicates` without aggregation doesn't work with `complete` or `update` mode. ## How was this patch tested? The new unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #16970 from zsxwing/dedup.		2017-02-23 11:25:39 -08:00
..
benchmarks	[SPARK-17335][SQL] Fix ArrayType and MapType CatalogString.	2016-09-03 19:02:20 +02:00
src	[SPARK-19497][SS] Implement streaming deduplication	2017-02-23 11:25:39 -08:00
pom.xml	[SPARK-19409][BUILD][TEST-MAVEN] Fix ParquetAvroCompatibilitySuite failure due to test dependency on avro	2017-02-08 12:21:49 +00:00