spark-instrumented-optimizer/python/pyspark/sql
Tathagata Das 90b11439b3 [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structure Streaming
## What changes were proposed in this pull request?
Currently, Structured Streaming supports only the Append output mode. This PR adds the following:

- Added support for Complete output mode in the internal state store, analyzer and planner.
- Added a public API in Scala and Python for users to specify the output mode.
- Added checks for unsupported combinations of output modes and DataFrame operations:
  - Plans with no aggregation should support only Append mode
  - Plans with aggregation should support only Update and Complete modes
  - The default output mode is Append (**Question: should we change this to automatically set to Complete mode when there is aggregation?**)
- Added support for Complete output mode in Memory Sink. Internally, Memory Sink supports Append, Complete and Update modes, but the public API exposes only Append and Complete.
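The mode checks and Memory Sink behaviour listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Spark's implementation: `OutputMode`, `check_output_mode` and `ToyMemorySink` are hypothetical names, and the toy sink models only the Append and Complete modes exposed through the public API.

```python
from enum import Enum


class OutputMode(Enum):
    # Hypothetical enum mirroring the three modes named in the PR description.
    APPEND = "append"
    COMPLETE = "complete"
    UPDATE = "update"


def check_output_mode(has_aggregation, mode=OutputMode.APPEND):
    """Return `mode` if the plan allows it, else raise ValueError.

    Rules from the PR description: plans without aggregation support
    only Append; plans with aggregation support only Update and Complete.
    The default mode is Append, so an aggregating plan must opt in.
    """
    if has_aggregation:
        allowed = {OutputMode.UPDATE, OutputMode.COMPLETE}
    else:
        allowed = {OutputMode.APPEND}
    if mode not in allowed:
        qualifier = "with" if has_aggregation else "without"
        raise ValueError(f"{mode.value} output mode not supported {qualifier} aggregation")
    return mode


class ToyMemorySink:
    """Hypothetical in-memory sink contrasting Append and Complete."""

    def __init__(self, mode):
        if mode not in (OutputMode.APPEND, OutputMode.COMPLETE):
            # Update is left out of this sketch on purpose.
            raise NotImplementedError(f"{mode.value} is not modelled here")
        self.mode = mode
        self.rows = []

    def add_batch(self, batch):
        if self.mode is OutputMode.APPEND:
            # Append: each batch contributes only its new rows.
            self.rows.extend(batch)
        else:
            # Complete: each batch carries the whole result table,
            # so it replaces everything written before.
            self.rows = list(batch)


if __name__ == "__main__":
    check_output_mode(has_aggregation=True, mode=OutputMode.COMPLETE)
    sink = ToyMemorySink(OutputMode.COMPLETE)
    sink.add_batch([("a", 1)])
    sink.add_batch([("a", 2), ("b", 1)])
    print(sink.rows)
```

Keeping the default at Append means an aggregating query fails the check until a mode is chosen explicitly, which is the trade-off behind the open question above. In PySpark itself the mode is selected by calling `outputMode` on the DataFrame writer, as the doc-test change in this commit (`write.outputMode`) indicates.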

## How was this patch tested?
Unit tests in various test suites:
- StreamingAggregationSuite: tests for Complete mode
- MemorySinkSuite: tests checking behavior in Append and Complete modes
- UnsupportedOperationSuite: tests checking unsupported combinations of DataFrame operations and output modes
- DataFrameReaderWriterSuite: tests checking that the output mode cannot be set on static DataFrames
- Python doc tests and existing unit tests were modified to call `write.outputMode`.

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #13286 from tdas/complete-mode.
2016-05-31 15:57:01 -07:00
__init__.py [SPARK-14945][PYTHON] SparkSession Python API 2016-04-28 10:55:48 -07:00
catalog.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
column.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
conf.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
context.py [SPARK-15075][SPARK-15345][SQL] Clean up SparkSession builder and propagate config options to existing sessions if specified 2016-05-19 21:53:26 -07:00
dataframe.py [SPARK-15392][SQL] fix default value of size estimation of logical plan 2016-05-19 12:12:42 -07:00
functions.py [MINOR] Fix Typos 'a -> an' 2016-05-26 22:39:14 -07:00
group.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
readwriter.py [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structure Streaming 2016-05-31 15:57:01 -07:00
session.py [SPARK-15520][SQL] Also set sparkContext confs when using SparkSession builder in pyspark 2016-05-26 12:05:47 -07:00
streaming.py [SPARK-14896][SQL] Deprecate HiveContext in python 2016-05-04 17:39:30 -07:00
tests.py [SPARK-15517][SQL][STREAMING] Add support for complete output mode in Structure Streaming 2016-05-31 15:57:01 -07:00
types.py [SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not actually test with unicode column name 2016-05-18 11:18:33 -07:00
utils.py [SPARK-14603][SQL][FOLLOWUP] Verification of Metadata Operations by Session Catalog 2016-05-19 11:46:11 -07:00
window.py [SPARK-14058][PYTHON] Incorrect docstring in Window.order 2016-03-21 23:52:33 -07:00