[MINOR][DOC][SS] Correct description of minPartitions in Kafka option

## What changes were proposed in this pull request?

`minPartitions` has been used as a hint, and the relevant method (`KafkaOffsetRangeCalculator.getRanges`) doesn't guarantee that the number of partitions will be equal to or greater than the given value.

d67b98ea01/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala (L32-L46)

This patch makes clear that the configuration is a hint and that the actual number of partitions could be less or more.

## How was this patch tested?

Just a documentation change.

Closes #25332 from HeartSaVioR/MINOR-correct-kafka-structured-streaming-doc-minpartition.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
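
To illustrate why the resulting partition count is only approximate, here is a minimal Python sketch of proportional splitting with rounding. It is not the Spark implementation: the function name `planned_partition_count`, its parameters, and the choice to keep at least one split per non-empty range are assumptions made for this example.

```python
import math


def planned_partition_count(range_sizes, min_partitions=None):
    """Hypothetical sketch: how many Spark partitions a batch might get.

    range_sizes: number of new offsets per Kafka topic-partition.
    min_partitions: the desired minimum (a hint), or None if unset.
    """
    # Topic-partitions with no new data contribute no work at all.
    non_empty = [size for size in range_sizes if size > 0]

    # Hint unset, or already more ranges than requested: keep the 1-1 mapping.
    if min_partitions is None or len(non_empty) > min_partitions:
        return len(non_empty)

    total = sum(non_empty)
    if total == 0:
        return 0

    # Split each range in proportion to its share of the data. Rounding each
    # range independently is what makes the final total approximate.
    ideal_size = total / min_partitions
    return sum(
        # floor(x + 0.5) rounds half up; max(1, ...) is this sketch's choice
        # so that no non-empty range is dropped.
        max(1, math.floor(size / ideal_size + 0.5))
        for size in non_empty
    )


print(planned_partition_count([7, 7, 6], min_partitions=4))  # 3: fewer than requested
print(planned_partition_count([9, 9], min_partitions=3))     # 4: more than requested
print(planned_partition_count([10, 0, 0]))                   # 1: empty partitions skipped
```

In the first call each range rounds down to one split, so the total falls short of the hint. In the second call both ranges round up, so the total overshoots it.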
2019-08-02 09:12:54 -07:00 · 7ffc00ccc3
parent b148bd5ccb
commit 7ffc00ccc3
1 changed file with 4 additions and 2 deletions
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -393,10 +393,12 @@ The following configurations are optional:
  <td>int</td>
  <td>none</td>
  <td>streaming and batch</td>
-  <td>Minimum number of partitions to read from Kafka.
+  <td>Desired minimum number of partitions to read from Kafka.
  By default, Spark has a 1-1 mapping of topicPartitions to Spark partitions consuming from Kafka.
  If you set this option to a value greater than your topicPartitions, Spark will divvy up large
-  Kafka partitions to smaller pieces.</td>
+  Kafka partitions to smaller pieces. Please note that this configuration is like a `hint`: the
+  number of Spark tasks will be **approximately** `minPartitions`. It can be less or more depending on
+  rounding errors or Kafka partitions that didn't receive any new data.</td>
 </tr>
 <tr>
  <td>groupIdPrefix</td>