spark-instrumented-optimizer/external
Tyson Condie b0a5cd8909 [SPARK-19719][SS] Kafka writer for both structured streaming and batch queires
## What changes were proposed in this pull request?

Add a new Kafka Sink and Kafka Relation for writing streaming and batch queries, respectively, to Apache Kafka.
### Streaming Kafka Sink
- When addBatch is called
-- If batchId is great than the last written batch
--- Write batch to Kafka
---- Topic will be taken from the record, if present, or from a topic option, which overrides topic in record.
-- Else ignore

### Batch Kafka Sink
- KafkaSourceProvider will implement CreatableRelationProvider
- CreatableRelationProvider#createRelation will write the passed in Dataframe to a Kafka
- Topic will be taken from the record, if present, or from topic option, which overrides topic in record.
- Save modes Append and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException

tdas zsxwing

## How was this patch tested?

### The following unit tests will be included
- write to stream with topic field: valid stream write with data that includes an existing topic in the schema
- write structured streaming aggregation w/o topic field, with default topic: valid stream write with data that does not include a topic field, but the configuration includes a default topic
- write data with bad schema: various cases of writing data that does not conform to a proper schema e.g., 1. no topic field or default topic, and 2. no value field
- write data with valid schema but wrong types: data with a complete schema but wrong types e.g., key and value types are integers.
- write to non-existing topic: write a stream to a topic that does not exist in Kafka, which has been configured to not auto-create topics.
- write batch to kafka: simple write batch to Kafka, which goes through the same code path as streaming scenario, so validity checks will not be redone here.

### Examples
```scala
// Structured Streaming
val writer = inputStringStream.map(s => s.get(0).toString.getBytes()).toDF("value")
 .selectExpr("value as key", "value as value")
 .writeStream
 .format("kafka")
 .option("checkpointLocation", checkpointDir)
 .outputMode(OutputMode.Append)
 .option("kafka.bootstrap.servers", brokerAddress)
 .option("topic", topic)
 .queryName("kafkaStream")
 .start()

// Batch
val df = spark
 .sparkContext
 .parallelize(Seq("1", "2", "3", "4", "5"))
 .map(v => (topic, v))
 .toDF("topic", "value")

df.write
 .format("kafka")
 .option("kafka.bootstrap.servers",brokerAddress)
 .option("topic", topic)
 .save()
```
Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Tyson Condie <tcondie@gmail.com>

Closes #17043 from tcondie/kafka-writer.
2017-03-06 16:39:05 -08:00
..
docker [SPARK-13595][BUILD] Move docker, extras modules into external 2016-03-09 18:27:44 +00:00
docker-integration-tests [SPARK-19318][SQL] Fix to treat JDBC connection properties specified by the user in case-sensitive manner. 2017-02-14 15:34:12 -08:00
flume [SPARK-17807][CORE] split test-tags into test-JAR 2016-12-21 16:37:20 -08:00
flume-assembly [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00
flume-sink [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo 2017-01-04 15:07:29 +00:00
kafka-0-8 [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due to path and resource-not-closed problems on Windows 2017-01-10 13:19:21 +00:00
kafka-0-8-assembly [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00
kafka-0-10 [SPARK-19534][TESTS] Convert Java tests to use lambdas, Java 8 features 2017-02-19 09:42:50 -08:00
kafka-0-10-assembly [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00
kafka-0-10-sql [SPARK-19719][SS] Kafka writer for both structured streaming and batch queires 2017-03-06 16:39:05 -08:00
kinesis-asl [SPARK-19304][STREAMING][KINESIS] fix kinesis slow checkpoint recovery 2017-03-06 10:41:49 -08:00
kinesis-asl-assembly [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00
spark-ganglia-lgpl [SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT 2016-12-02 21:09:37 -08:00