[SPARK-19564][SPARK-19559][SS][KAFKA] KafkaOffsetReader's consumers should not be in the same group

## What changes were proposed in this pull request?

In `KafkaOffsetReader`, when an error occurs, we abort the existing consumer and create a new one. In the current implementation, the first and second consumers end up in the same group (which leads to SPARK-19559), **_violating our intention that the two consumers not be in the same group._**

The cause is that, in the current implementation, the first consumer is created before `groupId` and `nextId` are initialized in the constructor. Even though `groupId` and `nextId` are updated while that first consumer is being created, their field initializers run afterwards in the constructor and reset them to their default values, so the second consumer is assigned the same group ID as the first.

We should make sure that `groupId` and `nextId` are initialized before any consumer is created.
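
The initialization-order problem can be reproduced outside Spark. The following is a minimal sketch, not the actual `KafkaOffsetReader` code: `BuggyReader`, `FixedReader`, `currentGroup`, and `resetConsumer` are hypothetical names, and a plain `String` stands in for the Kafka consumer. It relies only on the fact that Scala runs class-body initializers in declaration order.

```scala
// Hypothetical stand-in for the pre-fix layout: the "consumer" is created
// before the counter fields are declared.
class BuggyReader(prefix: String) {
  // Runs first: nextGroupId() writes groupId and nextId here...
  private var consumer: String = createConsumer()

  // ...and these initializers run afterwards, resetting both fields.
  private var groupId: String = null
  private var nextId = 0

  private def nextGroupId(): String = {
    groupId = prefix + "-" + nextId
    nextId += 1
    groupId
  }

  // The "consumer" is represented only by the group ID it would use.
  private def createConsumer(): String = nextGroupId()

  def currentGroup: String = consumer
  def resetConsumer(): Unit = { consumer = createConsumer() }
}

// Hypothetical stand-in for the post-fix layout: the fields come first.
class FixedReader(prefix: String) {
  private var groupId: String = null
  private var nextId = 0

  private var consumer: String = createConsumer()

  private def nextGroupId(): String = {
    groupId = prefix + "-" + nextId
    nextId += 1
    groupId
  }

  private def createConsumer(): String = nextGroupId()

  def currentGroup: String = consumer
  def resetConsumer(): Unit = { consumer = createConsumer() }
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    val buggy = new BuggyReader("driver")
    val buggyFirst = buggy.currentGroup
    buggy.resetConsumer()
    // The counter was reset after the first consumer was created,
    // so the replacement consumer reuses the same group ID.
    println(s"buggy: $buggyFirst then ${buggy.currentGroup}")

    val fixed = new FixedReader("driver")
    val fixedFirst = fixed.currentGroup
    fixed.resetConsumer()
    // The counter survives construction, so the group IDs differ.
    println(s"fixed: $fixedFirst then ${fixed.currentGroup}")
  }
}
```

In the buggy layout both "consumers" report the same group ID. Moving the two field declarations above the consumer is enough to make them differ.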

## How was this patch tested?

Ran `KafkaSourceSuite` 100 times; all runs passed.

Author: Liwei Lin <lwlin7@gmail.com>

Closes #16902 from lw-lin/SPARK-19564-.
Liwei Lin 2017-02-12 23:00:22 -08:00 committed by Shixiong Zhu
parent bc0a0e6392
commit 2bdbc87052


@@ -64,6 +64,13 @@ private[kafka010] class KafkaOffsetReader(
   })
   val execContext = ExecutionContext.fromExecutorService(kafkaReaderThread)
+  /**
+   * Place [[groupId]] and [[nextId]] here so that they are initialized before any consumer is
+   * created -- see SPARK-19564.
+   */
+  private var groupId: String = null
+  private var nextId = 0
   /**
    * A KafkaConsumer used in the driver to query the latest Kafka offsets. This only queries the
    * offsets and never commits them.
@@ -76,10 +83,6 @@ private[kafka010] class KafkaOffsetReader(
   private val offsetFetchAttemptIntervalMs =
     readerOptions.getOrElse("fetchOffset.retryIntervalMs", "1000").toLong
-  private var groupId: String = null
-  private var nextId = 0
   private def nextGroupId(): String = {
     groupId = driverGroupIdPrefix + "-" + nextId
     nextId += 1