spark-instrumented-optimizer

History

Tyson Condie b0a5cd8909 [SPARK-19719][SS] Kafka writer for both structured streaming and batch queires ## What changes were proposed in this pull request? Add a new Kafka Sink and Kafka Relation for writing streaming and batch queries, respectively, to Apache Kafka. ### Streaming Kafka Sink - When addBatch is called -- If batchId is great than the last written batch --- Write batch to Kafka ---- Topic will be taken from the record, if present, or from a topic option, which overrides topic in record. -- Else ignore ### Batch Kafka Sink - KafkaSourceProvider will implement CreatableRelationProvider - CreatableRelationProvider#createRelation will write the passed in Dataframe to a Kafka - Topic will be taken from the record, if present, or from topic option, which overrides topic in record. - Save modes Append and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException tdas zsxwing ## How was this patch tested? ### The following unit tests will be included - write to stream with topic field: valid stream write with data that includes an existing topic in the schema - write structured streaming aggregation w/o topic field, with default topic: valid stream write with data that does not include a topic field, but the configuration includes a default topic - write data with bad schema: various cases of writing data that does not conform to a proper schema e.g., 1. no topic field or default topic, and 2. no value field - write data with valid schema but wrong types: data with a complete schema but wrong types e.g., key and value types are integers. - write to non-existing topic: write a stream to a topic that does not exist in Kafka, which has been configured to not auto-create topics. - write batch to kafka: simple write batch to Kafka, which goes through the same code path as streaming scenario, so validity checks will not be redone here. ### Examples ```scala // Structured Streaming val writer = inputStringStream.map(s => s.get(0).toString.getBytes()).toDF("value") .selectExpr("value as key", "value as value") .writeStream .format("kafka") .option("checkpointLocation", checkpointDir) .outputMode(OutputMode.Append) .option("kafka.bootstrap.servers", brokerAddress) .option("topic", topic) .queryName("kafkaStream") .start() // Batch val df = spark .sparkContext .parallelize(Seq("1", "2", "3", "4", "5")) .map(v => (topic, v)) .toDF("topic", "value") df.write .format("kafka") .option("kafka.bootstrap.servers",brokerAddress) .option("topic", topic) .save() ``` Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcondie@gmail.com> Closes #17043 from tcondie/kafka-writer.		2017-03-06 16:39:05 -08:00
..
docker	[SPARK-13595][BUILD] Move docker, extras modules into external	2016-03-09 18:27:44 +00:00
docker-integration-tests	[SPARK-19318][SQL] Fix to treat JDBC connection properties specified by the user in case-sensitive manner.	2017-02-14 15:34:12 -08:00
flume	[SPARK-17807][CORE] split test-tags into test-JAR	2016-12-21 16:37:20 -08:00
flume-assembly	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	2016-12-02 21:09:37 -08:00
flume-sink	[MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo	2017-01-04 15:07:29 +00:00
kafka-0-8	[SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due to path and resource-not-closed problems on Windows	2017-01-10 13:19:21 +00:00
kafka-0-8-assembly	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	2016-12-02 21:09:37 -08:00
kafka-0-10	[SPARK-19534][TESTS] Convert Java tests to use lambdas, Java 8 features	2017-02-19 09:42:50 -08:00
kafka-0-10-assembly	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	2016-12-02 21:09:37 -08:00
kafka-0-10-sql	[SPARK-19719][SS] Kafka writer for both structured streaming and batch queires	2017-03-06 16:39:05 -08:00
kinesis-asl	[SPARK-19304][STREAMING][KINESIS] fix kinesis slow checkpoint recovery	2017-03-06 10:41:49 -08:00
kinesis-asl-assembly	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	2016-12-02 21:09:37 -08:00
spark-ganglia-lgpl	[SPARK-18695] Bump master branch version to 2.2.0-SNAPSHOT	2016-12-02 21:09:37 -08:00