spark-instrumented-optimizer

History

Michael Armbrust caea152145 [SPARK-13985][SQL] Deterministic batches with ids

This PR relaxes the requirements of a `Sink` for structured streaming to only require idempotent appending of data. Previously, the `Sink` needed to be able to transactionally append data while recording an opaque offset indicating how far in a stream we have processed.

In order to do this, a new write-ahead log has been added to stream execution, which records the offsets that are present in each batch. The log is created in the newly added `checkpointLocation`, which defaults to `${spark.sql.streaming.checkpointLocation}/${queryName}` but can be overridden by setting `checkpointLocation` in `DataFrameWriter`.

In addition to making sinks easier to write, the addition of batchIds and a checkpoint location is done in anticipation of integration with the `StateStore` (#11645).

Author: Michael Armbrust <michael@databricks.com>

Closes #11804 from marmbrus/batchIds.

2016-03-22 10:18:42 -07:00
src	[SPARK-13985][SQL] Deterministic batches with ids	2016-03-22 10:18:42 -07:00
pom.xml	[SPARK-13780][SQL] Add missing dependency to build.	2016-03-11 10:27:38 -08:00
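
The commit message above describes three cooperating pieces: a checkpoint location with a default, a write-ahead log of per-batch offsets, and a sink that only needs to append idempotently. The following is a minimal Python sketch of that idea, not Spark's implementation. Every class and function name here (`OffsetLog`, `MemorySink`, `run_batch`, `resolve_checkpoint_location`) is hypothetical, the on-disk layout (one JSON file per batch id) is an assumption, and batches are assumed to run in increasing order of batch id.

```python
import json
import os
import tempfile


def resolve_checkpoint_location(conf, query_name, explicit=None):
    """Return the explicit location if given, else <configured base>/<query name>."""
    if explicit is not None:
        return explicit
    return conf["spark.sql.streaming.checkpointLocation"] + "/" + query_name


class OffsetLog:
    """Hypothetical write-ahead log: one JSON file per batch id (assumed layout)."""

    def __init__(self, checkpoint_location):
        self.path = os.path.join(checkpoint_location, "offsets")
        os.makedirs(self.path, exist_ok=True)

    def add(self, batch_id, offsets):
        """Record offsets for batch_id; return False if that batch is already logged."""
        target = os.path.join(self.path, str(batch_id))
        if os.path.exists(target):
            return False
        tmp = target + ".tmp"
        with open(tmp, "w") as f:
            json.dump(offsets, f)
        os.rename(tmp, target)  # publish the entry only once it is fully written
        return True

    def get(self, batch_id):
        """Return the logged offsets for batch_id, or None if absent."""
        target = os.path.join(self.path, str(batch_id))
        if not os.path.exists(target):
            return None
        with open(target) as f:
            return json.load(f)

    def latest(self):
        """Return (batch_id, offsets) for the highest logged batch, or None."""
        ids = [int(name) for name in os.listdir(self.path) if name.isdigit()]
        if not ids:
            return None
        newest = max(ids)
        return newest, self.get(newest)


class MemorySink:
    """Hypothetical idempotent sink: adding an already-seen batch id is a no-op."""

    def __init__(self):
        self.batches = {}

    def add_batch(self, batch_id, rows):
        if batch_id in self.batches:
            return
        self.batches[batch_id] = list(rows)

    def all_rows(self):
        return [row for b in sorted(self.batches) for row in self.batches[b]]


def run_batch(log, sink, source, batch_id, end_offset):
    """Log the batch's offsets first, then append; a replay reuses the logged offsets."""
    logged = log.get(batch_id)
    if logged is None:
        start = 0 if batch_id == 0 else log.get(batch_id - 1)["end"]
        logged = {"start": start, "end": end_offset}
        log.add(batch_id, logged)
    sink.add_batch(batch_id, source[logged["start"]:logged["end"]])


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    conf = {"spark.sql.streaming.checkpointLocation": base}
    log = OffsetLog(resolve_checkpoint_location(conf, "demo_query"))
    sink = MemorySink()
    data = list(range(10))
    run_batch(log, sink, data, 0, 4)
    run_batch(log, sink, data, 0, 9)  # replay of batch 0: logged offsets win
    print(sink.all_rows())
```

Because the offsets for a batch are fixed in the log before any data is appended, replaying a batch id after a failure selects the same rows, and the sink can safely ignore a batch id it has already stored. That is the sense in which the sink needs only idempotent appends rather than a transactional append that also records the offset.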
