diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md
index 9a6e302656..339792b413 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -393,10 +393,12 @@ The following configurations are optional:
   int
   none
   streaming and batch
-  Minimum number of partitions to read from Kafka.
+  Desired minimum number of partitions to read from Kafka.
   By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka.
   If you set this option to a value greater than your topicPartitions, Spark will divvy up large
-  Kafka partitions to smaller pieces.
+  Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the
+  number of Spark tasks will be **approximately** `minPartitions`. It can be fewer or more depending on
+  rounding errors or Kafka partitions that did not receive any new data.
   groupIdPrefix
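
For reference, a minimal sketch of how the documented option is used when building a Kafka source. The bootstrap servers, topic name, and the value `64` are placeholders chosen for illustration, not part of the patch:

```scala
// Read from Kafka with a desired minimum number of Spark partitions.
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")   // placeholder broker address
  .option("subscribe", "events")                      // placeholder topic
  // Hint: split large Kafka partitions so roughly 64 tasks read the data.
  // As the updated doc text notes, the actual task count may be fewer or more.
  .option("minPartitions", "64")
  .load()
```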