spark-instrumented-optimizer

main	[SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch.	2017-03-17 16:16:22 -07:00
test	[SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch.	2017-03-17 16:16:22 -07:00

Kunal Khamar 3783539d7a [SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch.

## What changes were proposed in this pull request?

If the user changes the shuffle partition number between batches, Streaming aggregation will fail.

Here are some possible cases:

- Change "spark.sql.shuffle.partitions"
- Use "repartition" and change the partition number in code
- RangePartitioner doesn't generate deterministic partitions. Right now this is safe because sort before aggregation is disallowed, but operators that use RangePartitioner may be added in the future.
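
To illustrate why a changed partition count is a problem, here is a minimal sketch in plain Python. It is a toy model, not Spark code: it assumes aggregation state is kept per partition and that a key is routed to `hash(key) % num_partitions`. The names `partition_for`, `run_batch`, and `lookup` are hypothetical and exist only for this example.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (assumption: not Spark's actual hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run_batch(state: dict, events: list, num_partitions: int) -> dict:
    # state maps partition id -> {key: running count}; each event increments its key's count
    # in the state store of the partition the key is routed to.
    for key in events:
        store = state.setdefault(partition_for(key, num_partitions), {})
        store[key] = store.get(key, 0) + 1
    return state


def lookup(state: dict, key: str, num_partitions: int) -> int:
    # Read the running count from the partition the key is currently routed to.
    return state.get(partition_for(key, num_partitions), {}).get(key, 0)


# Pick a key that lands in a different partition when the count changes from 4 to 5.
key = next(k for k in (f"user-{i}" for i in range(100))
           if partition_for(k, 4) != partition_for(k, 5))

# Same partition count in both batches: the running count is continuous.
stable = run_batch(run_batch({}, [key], 4), [key], 4)
stable_count = lookup(stable, key, 4)

# Partition count changed between batches: batch 2 cannot see batch 1's state.
changed = run_batch(run_batch({}, [key], 4), [key], 5)
changed_count = lookup(changed, key, 5)

print(stable_count, changed_count)
```

In this model the key's earlier count is stranded in the old partition's store once the partition count changes, so the aggregate silently restarts. The real failure mode in Spark may differ in detail; the sketch only shows why the partition count has to stay fixed across batches.
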

## How was this patch tested?

- Unit tests
- Manual tests
  - forward compatibility tested by using the new `OffsetSeqMetadata` json with Spark v2.1.0
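
The record-and-enforce idea, and the forward-compatibility check above, can be sketched as follows. This is plain Python under stated assumptions, not the patch itself: the JSON layout (`timestamp_ms`, `conf`) and the function names are hypothetical, and only the config key `spark.sql.shuffle.partitions` comes from this pull request.

```python
import json

SHUFFLE_KEY = "spark.sql.shuffle.partitions"


def write_metadata(timestamp_ms: int, session_conf: dict) -> str:
    # Record the partition count used by this batch next to its other metadata
    # (hypothetical layout, for illustration only).
    return json.dumps({
        "timestamp_ms": timestamp_ms,
        "conf": {SHUFFLE_KEY: session_conf[SHUFFLE_KEY]},
    })


def conf_for_next_batch(metadata_json: str, session_conf: dict) -> dict:
    # Enforce the recorded value: it overrides whatever the session is set to now.
    recorded = json.loads(metadata_json).get("conf", {})
    effective = dict(session_conf)
    if SHUFFLE_KEY in recorded:
        effective[SHUFFLE_KEY] = recorded[SHUFFLE_KEY]
    return effective


def old_reader(metadata_json: str) -> dict:
    # Models an older reader that knows only `timestamp_ms` and ignores unknown fields,
    # which is the property a forward-compatibility test would exercise.
    meta = json.loads(metadata_json)
    return {"timestamp_ms": meta["timestamp_ms"]}


logged = write_metadata(1000, {SHUFFLE_KEY: "200"})

# The user changes the setting before the query restarts; the logged value still wins.
effective = conf_for_next_batch(logged, {SHUFFLE_KEY: "10"})

# Metadata written without the new field leaves the session value untouched.
legacy = conf_for_next_batch(json.dumps({"timestamp_ms": 500}), {SHUFFLE_KEY: "10"})

print(effective[SHUFFLE_KEY], legacy[SHUFFLE_KEY], old_reader(logged))
```
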

Author: Kunal Khamar <kkhamar@outlook.com>

Closes #17216 from kunalkhamar/num-partitions.

2017-03-17 16:16:22 -07:00
