spark-instrumented-optimizer

Dongjoon Hyun 9962390af7 [SPARK-22781][SS] Support creating streaming dataset with ORC files

## What changes were proposed in this pull request?

As with `Parquet`, users can use `ORC` with Apache Spark Structured Streaming. This PR adds `orc()` to `DataStreamReader` (Scala/Python) so that a streaming Dataset can be created from ORC files as easily as from the other file formats. It also adds test coverage for the ORC data source and updates the documentation.

BEFORE
```scala
scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
<console>:24: error: value orc is not a member of org.apache.spark.sql.streaming.DataStreamReader
       spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
```

AFTER
```scala
scala> spark.readStream.schema("a int").orc("/tmp/orc_ss").writeStream.format("console").start()
res0: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@678b3746

scala>
-------------------------------------------
Batch: 0
-------------------------------------------
+---+
|  a|
+---+
|  1|
+---+
```

## How was this patch tested?

Pass the newly added test cases.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #19975 from dongjoon-hyun/SPARK-22781.

2017-12-19 23:50:06 -08:00
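The commit message demonstrates the change only from the Scala shell, but the same PR also adds `orc()` to the Python `DataStreamReader` in `streaming.py`. The following minimal PySpark sketch mirrors the Scala AFTER example. It assumes Spark 2.3 or later and an existing `SparkSession`. The helper name `start_orc_console_query` and its default path are hypothetical and used only for illustration.

```python
# Minimal PySpark sketch mirroring the Scala AFTER example.
# Assumes Spark 2.3+ (where DataStreamReader.orc() exists in Python) and that
# `spark` is an existing pyspark.sql.SparkSession. The helper name and the
# default path are hypothetical, not part of the PySpark API.

def start_orc_console_query(spark, path="/tmp/orc_ss"):
    """Stream ORC files under `path` to the console sink and return the query."""
    # File-based streaming sources need the schema up front; a DDL string
    # such as "a int" is accepted here, as in the Scala example.
    stream_df = spark.readStream.schema("a int").orc(path)
    # Print every micro-batch to stdout and start the streaming query.
    return stream_df.writeStream.format("console").start()
```

Calling `start_orc_console_query(spark)` returns a `StreamingQuery`. Its lifetime can then be managed with `query.awaitTermination()` or `query.stop()`. Because the helper takes the session as a parameter rather than building one itself, it can be called from an already running `pyspark` shell.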
__init__.py	[SPARK-22369][PYTHON][DOCS] Exposes catalog API documentation in PySpark	2017-11-02 15:22:52 +01:00
catalog.py	[SPARK-22409] Introduce function type argument in pandas_udf	2017-11-17 16:43:08 +01:00
column.py	[SPARK-19165][PYTHON][SQL] PySpark APIs using columns as arguments should validate input types for column	2017-08-24 20:29:03 +09:00
conf.py	[SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code	2016-05-23 18:14:48 -07:00
context.py	[SPARK-20586][SQL] Add deterministic to ScalaUDF	2017-07-25 17:19:44 -07:00
dataframe.py	[SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dataset API	2017-12-19 20:47:12 -08:00
functions.py	[SPARK-22829] Add new built-in function date_trunc()	2017-12-19 20:22:33 -08:00
group.py	[SPARK-22409] Introduce function type argument in pandas_udf	2017-11-17 16:43:08 +01:00
readwriter.py	[SPARK-16496][SQL] Add wholetext as option for reading text in SQL.	2017-12-14 11:19:34 -08:00
session.py	[SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone	2017-11-28 16:45:22 +08:00
streaming.py	[SPARK-22781][SS] Support creating streaming dataset with ORC files	2017-12-19 23:50:06 -08:00
tests.py	[SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone	2017-11-28 16:45:22 +08:00
types.py	[SPARK-22395][SQL][PYTHON] Fix the behavior of timestamp values for Pandas to respect session timezone	2017-11-28 16:45:22 +08:00
udf.py	[SPARK-22409] Introduce function type argument in pandas_udf	2017-11-17 16:43:08 +01:00
utils.py	[MINOR][DOCS] Remove consecutive duplicated words/typo in Spark Repo	2017-01-04 15:07:29 +00:00
window.py	[SPARK-18690][PYTHON][SQL] Backward compatibility of unbounded frames	2016-12-02 17:39:28 -08:00