spark-instrumented-optimizer/python/pyspark/sql
Dongjoon Hyun 9962390af7 [SPARK-22781][SS] Support creating streaming dataset with ORC files
## What changes were proposed in this pull request?

Like `Parquet`, `ORC` can be used as a file source in Apache Spark Structured Streaming. This PR adds `orc()` to `DataStreamReader` (Scala/Python) so that users can create a streaming Dataset from ORC files as easily as from the other file formats. It also adds test coverage for the ORC data source and updates the documentation.

**BEFORE**

```scala
scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
<console>:24: error: value orc is not a member of org.apache.spark.sql.streaming.DataStreamReader
       spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
```

**AFTER**
```scala
scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
res0: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@678b3746

scala>
-------------------------------------------
Batch: 0
-------------------------------------------
+---+
|  a|
+---+
|  1|
+---+
```
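`orc()` is a convenience method in the same style as the existing `parquet()`, `json()`, `csv()` and `text()` readers. The sketch below makes some assumptions:

* It targets Spark 2.3+, where ORC can be read without Hive support.
* It treats `orc(path)` as equivalent to the generic `format("orc").load(path)` form.
* The object name, the temp directory and the `orc_rows` table are hypothetical.

It seeds a directory with one ORC row and builds the stream both ways. It then runs the shorthand query once into an in-memory table.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object OrcStreamingSketch {
  // Reads a one-row ORC directory as a stream and returns the values the query saw.
  def run(): Seq[Int] = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("orc-streaming-sketch")
      .getOrCreate()
    import spark.implicits._

    try {
      // Hypothetical input location: a fresh temp directory instead of /tmp/orc_ss.
      val inputDir = Files.createTempDirectory("orc_ss").resolve("data").toString

      // Seed the directory with one static ORC file so the stream has input to read.
      Seq(1).toDF("a").write.orc(inputDir)

      // Shorthand added by SPARK-22781.
      val viaShorthand = spark.readStream.schema("a int").orc(inputDir)
      // Generic form, assumed here to be equivalent to the shorthand.
      val viaFormat = spark.readStream.schema("a int").format("orc").load(inputDir)

      // Both are streaming DataFrames with the user-supplied schema.
      assert(viaShorthand.isStreaming && viaFormat.isStreaming)
      assert(viaShorthand.schema == viaFormat.schema)

      // Process the files present right now, once, into an in-memory table.
      val query = viaShorthand.writeStream
        .format("memory")
        .queryName("orc_rows")
        .trigger(Trigger.Once())
        .start()
      query.awaitTermination()

      spark.table("orc_rows").as[Int].collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

The sketch uses the memory sink and `Trigger.Once()` so that the query terminates on its own, which keeps it easy to check. The console sink in the example above prints the same single row but keeps running until stopped.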

## How was this patch tested?

Pass the newly added test cases.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #19975 from dongjoon-hyun/SPARK-22781.
2017-12-19 23:50:06 -08:00
| File | Last commit | Date |
|---|---|---|
| `__init__.py` | [SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark | 2017-11-02 15:22:52 +01:00 |
| `catalog.py` | [SPARK-22409] Introduce function type argument in pandas_udf | 2017-11-17 16:43:08 +01:00 |
| `column.py` | [SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column | 2017-08-24 20:29:03 +09:00 |
| `conf.py` | [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code | 2016-05-23 18:14:48 -07:00 |
| `context.py` | [SPARK-20586][SQL] Add deterministic to ScalaUDF | 2017-07-25 17:19:44 -07:00 |
| `dataframe.py` | [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dataset API | 2017-12-19 20:47:12 -08:00 |
| `functions.py` | [SPARK-22829] Add new built-in function date_trunc() | 2017-12-19 20:22:33 -08:00 |
| `group.py` | [SPARK-22409] Introduce function type argument in pandas_udf | 2017-11-17 16:43:08 +01:00 |
| `readwriter.py` | [SPARK-16496][SQL] Add wholetext as option for reading text in SQL. | 2017-12-14 11:19:34 -08:00 |
| `session.py` | [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone | 2017-11-28 16:45:22 +08:00 |
| `streaming.py` | [SPARK-22781][SS] Support creating streaming dataset with ORC files | 2017-12-19 23:50:06 -08:00 |
| `tests.py` | [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone | 2017-11-28 16:45:22 +08:00 |
| `types.py` | [SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone | 2017-11-28 16:45:22 +08:00 |
| `udf.py` | [SPARK-22409] Introduce function type argument in pandas_udf | 2017-11-17 16:43:08 +01:00 |
| `utils.py` | [MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo | 2017-01-04 15:07:29 +00:00 |
| `window.py` | [SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames | 2016-12-02 17:39:28 -08:00 |