diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 3f89111687..18dfbec794 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1820,6 +1820,8 @@ Some of them are as follows.
 
 - Distinct operations on streaming Datasets are not supported.
 
+- Deduplication operation is not supported after aggregation on a streaming Datasets.
+
 - Sorting operations are supported on streaming Datasets only after an aggregation and in Complete Output Mode.
 
 - Few types of outer joins on streaming Datasets are not supported. See the
@@ -3464,7 +3466,7 @@ the effect of the change is not well-defined. For all of them:
 
   - *Streaming aggregation*: For example, `sdf.groupBy("a").agg(...)`. Any change in number or type of grouping keys or aggregates is not allowed.
 
-  - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of grouping keys or aggregates is not allowed.
+  - *Streaming deduplication*: For example, `sdf.dropDuplicates("a")`. Any change in number or type of deduplicating columns is not allowed.
 
   - *Stream-stream join*: For example, `sdf1.join(sdf2, ...)` (i.e. both inputs are generated with `sparkSession.readStream`). Changes in the schema or equi-joining columns are not allowed. Changes in join type (outer or inner) are not allowed. Other changes in the join condition are ill-defined.
 