spark-instrumented-optimizer/external
Gabor Somogyi 3641c3dd69 [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer
### What changes were proposed in this pull request?

Kafka producers are currently closed once `spark.kafka.producer.cache.timeout` is reached, which can be a significant problem when processing large SQL queries. The workaround was to increase `spark.kafka.producer.cache.timeout` to a value large enough for the biggest SQL query to finish.

In this PR I've adapted a similar solution to the one that already exists on the consumer side, namely applying Apache Commons Pool on the producer side as well. The main advantages of choosing this solution:
* Producers are not closed while they're in use
* No manual reference counting needed (which may be error prone)
* Thread-safe by design
* Provides a JMX connection to the pool from which metrics can be fetched
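
To illustrate the first advantage, here is a minimal, self-contained sketch of the borrow/return idea. It does not use Apache Commons Pool or the Kafka client: `FakeProducer` and `SketchProducerPool` are hypothetical names invented for this example, and the eviction logic is a simplification rather than what this PR implements. The point it shows is that only idle objects are candidates for eviction, so a borrowed producer can never be closed by the timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for a Kafka producer; it only records whether it was closed.
final class FakeProducer {
    final String key;
    boolean closed = false;

    FakeProducer(String key) { this.key = key; }

    void close() { closed = true; }
}

// Minimal keyed pool sketch: an object is either borrowed (in use) or idle.
// The pool only tracks idle objects, so eviction cannot touch borrowed ones.
final class SketchProducerPool {
    private static final class Idle {
        final FakeProducer producer;
        final long idleSinceMs;

        Idle(FakeProducer producer, long idleSinceMs) {
            this.producer = producer;
            this.idleSinceMs = idleSinceMs;
        }
    }

    private final Map<String, Deque<Idle>> idle = new HashMap<>();
    private final long idleTimeoutMs;

    SketchProducerPool(long idleTimeoutMs) { this.idleTimeoutMs = idleTimeoutMs; }

    // Reuse an idle producer for the key if one exists, otherwise create a new one.
    synchronized FakeProducer borrow(String key) {
        Deque<Idle> queue = idle.get(key);
        if (queue != null && !queue.isEmpty()) {
            return queue.pollFirst().producer;
        }
        return new FakeProducer(key);
    }

    // Return a producer to the pool; its idle clock starts at nowMs.
    synchronized void giveBack(FakeProducer producer, long nowMs) {
        idle.computeIfAbsent(producer.key, k -> new ArrayDeque<>())
            .addLast(new Idle(producer, nowMs));
    }

    // Close and drop producers that have been idle for at least the timeout.
    // Returns the number of producers closed.
    synchronized int evictIdle(long nowMs) {
        int closedCount = 0;
        for (Deque<Idle> queue : idle.values()) {
            Iterator<Idle> it = queue.iterator();
            while (it.hasNext()) {
                Idle entry = it.next();
                if (nowMs - entry.idleSinceMs >= idleTimeoutMs) {
                    entry.producer.close();
                    it.remove();
                    closedCount++;
                }
            }
        }
        return closedCount;
    }
}
```

In this sketch, a producer borrowed for a long-running query survives any number of `evictIdle` calls, and its idle timer starts only when it is given back.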

What this PR contains:
* Introduced producer side parameters to configure pool
* Renamed `InternalKafkaConsumerPool` to `InternalKafkaConnectorPool` and made it abstract
* Created 2 implementations from it: `InternalKafkaConsumerPool` and `InternalKafkaProducerPool`
* Adapted `CachedKafkaProducer` to use `InternalKafkaProducerPool`
* Changed `KafkaDataWriter` and `KafkaDataWriteTask` to release producer even in failure scenario
* Added several new tests
* Extended `KafkaTest` to clear not only producers but consumers as well
* Renamed `InternalKafkaConsumerPoolSuite` to `InternalKafkaConnectorPoolSuite` where only consumer tests are checking the behavior (please see comment for reasoning)
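
The "release producer even in failure scenario" item can be pictured with a plain `try`/`finally`. This is only a sketch under assumptions: `ReleaseSketch`, `acquire`, `release`, and `writeAll` are hypothetical names, the counter stands in for a real pool, and this is not the actual code of `KafkaDataWriter` or `KafkaDataWriteTask`.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ReleaseSketch {
    // Hypothetical pool stand-in that only counts outstanding borrows.
    static final AtomicInteger borrowed = new AtomicInteger();

    static void acquire() { borrowed.incrementAndGet(); }

    static void release() { borrowed.decrementAndGet(); }

    // Simulates a write task: a null row stands in for a failed send.
    static void writeAll(Iterable<String> rows) {
        acquire();
        try {
            for (String row : rows) {
                if (row == null) {
                    throw new IllegalStateException("send failed");
                }
            }
        } finally {
            // Runs on both success and failure, so the borrow never leaks.
            release();
        }
    }
}
```

If `writeAll` throws partway through, the `finally` block still returns the borrowed object, so the outstanding-borrow count goes back to zero.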

What this PR does not yet contain (but is intended once the main concept is stable):
* User facing documentation

### Why are the changes needed?
The Kafka producer is closed after 10 minutes (with default settings), even if it is still in use.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Existing + additional unit tests.
Cluster tests are being started.

Closes #25853 from gaborgsomogyi/SPARK-21869.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
2019-11-07 17:06:32 -08:00
..
avro [SPARK-29757][SQL] Move calendar interval constants together 2019-11-07 19:48:19 +08:00
docker [SPARK-28683][BUILD] Upgrade Scala to 2.12.10 2019-09-18 13:30:36 -07:00
docker-integration-tests Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00
kafka-0-10 [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer 2019-11-07 17:06:32 -08:00
kafka-0-10-assembly Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00
kafka-0-10-sql [SPARK-21869][SS] Apply Apache Commons Pool to Kafka producer 2019-11-07 17:06:32 -08:00
kafka-0-10-token-provider Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00
kinesis-asl Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00
kinesis-asl-assembly Revert "Prepare Spark release v3.0.0-preview-rc2" 2019-10-30 17:45:44 -07:00
spark-ganglia-lgpl [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ 2019-11-03 15:13:06 -08:00