## What changes were proposed in this pull request?
In structured streaming, Spark does not report an error when the specified directory does not exist. This differs from batch mode. This patch changes the behavior to fail if the directory does not exist (when the path is not a glob pattern).
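For illustration only, the intended check can be sketched in plain Python. The helper names (`is_glob_pattern`, `check_source_path`), the set of glob characters, and the exception type are assumptions made for this sketch, not Spark's actual code:

```python
import os

# Assumption: characters that mark a path as a glob pattern in this sketch.
GLOB_CHARS = set("*?[]{}")


def is_glob_pattern(path):
    """Return True if the path contains any glob metacharacter."""
    return any(ch in GLOB_CHARS for ch in path)


def check_source_path(path):
    """Fail fast on a missing plain path; let glob patterns through unchecked."""
    if not is_glob_pattern(path) and not os.path.exists(path):
        raise FileNotFoundError("Path does not exist: " + path)
    return path
```

A glob pattern is allowed to match nothing at definition time, which is why only plain paths are checked.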
## How was this patch tested?
Updated unit tests to reflect the new behavior.
Author: Reynold Xin <rxin@databricks.com>
Closes #14002 from rxin/SPARK-16335.
## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is dangerous because it can mask legitimate errors, and the resulting plan will silently have 0 rows. This patch stops dropping those errors so that they propagate to the caller.
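The difference between swallowing and propagating listing errors can be shown with a small plain-Python sketch. Both functions and the injectable `lister` parameter are hypothetical and exist only to illustrate the failure mode, not Spark's implementation:

```python
import os


def list_files_swallowing(paths, lister=os.listdir):
    """Problematic shape: listing errors are dropped, so a bad path looks empty."""
    found = []
    for p in paths:
        try:
            found.extend(os.path.join(p, name) for name in lister(p))
        except OSError:
            pass  # the error is lost; the caller just sees fewer (or zero) files
    return found


def list_files_strict(paths, lister=os.listdir):
    """Preferred shape: any listing error propagates to the caller."""
    found = []
    for p in paths:
        found.extend(os.path.join(p, name) for name in lister(p))
    return found
```

With the first version, a permission error and an empty directory are indistinguishable to the caller.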
## How was this patch tested?
Manually verified.
Author: Reynold Xin <rxin@databricks.com>
Closes #13987 from rxin/SPARK-16313.
## What changes were proposed in this pull request?
- Moved DataStreamReader/Writer from pyspark.sql to pyspark.sql.streaming to make them consistent with the Scala packaging
- Exposed the necessary classes in sql.streaming package so that they appear in the docs
- Added pyspark.sql.streaming module to the docs
## How was this patch tested?
- updated unit tests.
- generated docs for testing visibility of pyspark.sql.streaming classes.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #13955 from tdas/SPARK-16266.
Renamed for simplicity, so that it's obvious that it's related to streaming.
Existing unit tests.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #13673 from tdas/SPARK-15953.
## What changes were proposed in this pull request?
A follow-up PR for #13655 to fix a wrong format tag.
## How was this patch tested?
Jenkins unit tests.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #13665 from zsxwing/fix.
## What changes were proposed in this pull request?
Currently, DataFrameReader/Writer has methods that are needed for both streaming and non-streaming DataFrames. This is quite awkward because each method throws a runtime exception for one case or the other. Rather than having half the methods throw runtime exceptions, it is better to have a separate reader/writer API for streams.
- [x] Python API!!
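The design motivation can be sketched in plain Python. Every class and method name below (`CombinedReader`, `BatchReader`, `StreamReader`, `Session`, `read_stream`) is hypothetical and chosen for illustration; this is not the Spark API itself:

```python
class CombinedReader:
    """Old shape (sketch): one class serves both modes, so each method
    has to reject one of them at runtime."""

    def __init__(self, streaming):
        self.streaming = streaming

    def load(self, path):
        if self.streaming:
            raise RuntimeError("load() is only supported for batch input")
        return ("batch", path)

    def stream(self, path):
        if not self.streaming:
            raise RuntimeError("stream() is only supported for streaming input")
        return ("stream", path)


class BatchReader:
    """New shape (sketch): every method is valid for batch input."""

    def load(self, path):
        return ("batch", path)


class StreamReader:
    """New shape (sketch): every method is valid for streaming input."""

    def load(self, path):
        return ("stream", path)


class Session:
    """Hypothetical entry point exposing one reader per mode."""

    @property
    def read(self):
        return BatchReader()

    @property
    def read_stream(self):
        return StreamReader()
```

With separate classes, calling an unsupported operation is no longer a runtime branch inside a shared method; the method simply does not exist on the wrong reader.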
## How was this patch tested?
Existing unit tests + two sets of unit tests for DataFrameReader/Writer and DataStreamReader/Writer.
Author: Tathagata Das <tathagata.das1565@gmail.com>
Closes #13653 from tdas/SPARK-15933.
## What changes were proposed in this pull request?
This PR just enables tests for sql/streaming.py and also fixes the failures.
## How was this patch tested?
Existing unit tests.
Author: Shixiong Zhu <shixiong@databricks.com>
Closes #13655 from zsxwing/python-streaming-test.
## What changes were proposed in this pull request?
`an -> a`
Use commands like `find . -name '*.R' | xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one.
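The same candidate search can be sketched in Python for a single block of text. The function name `find_candidates` is an assumption for this sketch; the pattern mirrors the `grep -i` expression above:

```python
import re

# Same pattern as the grep command, matched case-insensitively like `grep -i`.
CANDIDATE = re.compile(r" an [^aeiou]", re.IGNORECASE)


def find_candidates(text):
    """Return (line_number, line) pairs where ' an ' precedes a non-vowel."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if CANDIDATE.search(line)
    ]
```

The pattern is only a heuristic: a phrase such as "an hour" is correct English but still matches, which is why each candidate needs manual review.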
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #13515 from zhengruifeng/an_a.
## What changes were proposed in this pull request?
This patch moves all user-facing structured streaming classes into sql.streaming. As part of this, I also added `since` version annotations to methods and classes that didn't have them.
## How was this patch tested?
Updated tests to reflect the moves.
Author: Reynold Xin <rxin@databricks.com>
Closes #13429 from rxin/SPARK-15686.
## What changes were proposed in this pull request?
See title.
## How was this patch tested?
PySpark tests.
Author: Andrew Or <andrew@databricks.com>
Closes #12917 from andrewor14/deprecate-hive-context-python.
## What changes were proposed in this pull request?
This PR adds Python APIs for:
- `ContinuousQueryManager`
- `ContinuousQueryException`
The `ContinuousQueryException` is a very basic wrapper: it doesn't provide the functionality that the Scala side provides, but it follows the same pattern as `AnalysisException`.
For `ContinuousQueryManager`, all APIs are provided except for registering listeners.
This PR also attempts to fix test flakiness by stopping all active streams just before tests.
## How was this patch tested?
Python Doc tests and unit tests
Author: Burak Yavuz <brkyvz@gmail.com>
Closes #12673 from brkyvz/pyspark-cqm.
## What changes were proposed in this pull request?
This patch provides a first cut of Python APIs for structured streaming. This PR provides the new classes:
- ContinuousQuery
- Trigger
- ProcessingTime
in pyspark under `pyspark.sql.streaming`.
In addition, it contains the new methods added under:
- `DataFrameWriter`
  - `startStream`
  - `trigger`
  - `queryName`
- `DataFrameReader`
  - `stream`
- `DataFrame`
  - `isStreaming`
This PR doesn't contain all methods exposed for `ContinuousQuery`, for example:
- `exception`
- `sourceStatuses`
- `sinkStatus`
They may be added in a follow-up.
This PR also contains some very minor doc fixes on the Scala side.
## How was this patch tested?
Python doc tests
TODO:
- [ ] verify Python docs look good
Author: Burak Yavuz <brkyvz@gmail.com>
Author: Burak Yavuz <burak@databricks.com>
Closes #12320 from brkyvz/stream-python.