ODIn/spark-instrumented-optimizer

Author	SHA1	Message	Date
Huaxin Gao	46be1e01e9	[SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF ### What changes were proposed in this pull request? Add a SQL example for UDAF ### Why are the changes needed? To make SQL Reference complete ### Does this PR introduce any user-facing change? Yes. Add the following page, also change ```Sql``` to ```SQL``` in the example tab for all the sql examples. [screenshot: Screen Shot 2020-04-13 at 6 09 24 PM] ### How was this patch tested? Manually build and check Closes #28209 from huaxingao/udf_followup. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2020-04-14 13:29:44 +09:00
yi.wu	5983ad9cc4	[SPARK-30506][SQL][DOC] Document for generic file source options/configs ### What changes were proposed in this pull request? Add a new document page named Generic File Source Options for Data Sources menu and added following sub items: * spark.sql.files.ignoreCorruptFiles * spark.sql.files.ignoreMissingFiles * pathGlobFilter * recursiveFileLookup And here're snapshots of the generated document: [screenshots: doc-1, doc-2, doc-3, doc-4] ### Why are the changes needed? Better guidance for end-user. ### Does this PR introduce any user-facing change? No, added in Spark 3.0. ### How was this patch tested? Pass Jenkins. Closes #27302 from Ngone51/doc-generic-file-source-option. Lead-authored-by: yi.wu <yi.wu@databricks.com> Co-authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	2020-02-05 17:16:38 +08:00
Gengliang Wang	78a403fab9	[SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources ## What changes were proposed in this pull request? ### Background: The data source option `pathGlobFilter` is introduced for Binary file format: https://github.com/apache/spark/pull/24354 , which can be used for filtering file names, e.g. reading `.png` files only while there is `.json` files in the same directory. ### Proposal: Make the option `pathGlobFilter` as a general option for all file sources. The path filtering should happen in the path globbing on Driver. ### Motivation: Filtering the file path names in file scan tasks on executors is kind of ugly. ### Impact: 1. The splitting of file partitions will be more balanced. 2. The metrics of file scan will be more accurate. 3. Users can use the option for reading other file sources. ## How was this patch tested? Unit tests Closes #24518 from gengliangwang/globFilter. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	2019-05-09 08:41:43 +09:00
Sean Owen	754f820035	[SPARK-26918][DOCS] All .md should have ASF license header ## What changes were proposed in this pull request? Add AL2 license to metadata of all .md files. This seemed to be the tidiest way as it will get ignored by .md renderers and other tools. Attempts to write them as markdown comments revealed that there is no such standard thing. ## How was this patch tested? Doc build Closes #24243 from srowen/SPARK-26918. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-03-30 19:49:45 -05:00
Peter G. Horvath	653d1bc232	[SPARK-26835][DOCS] Notes API documentation for available options of Data sources in SparkSQL guide ## What changes were proposed in this pull request? This PR proposes to add some pointers of available options of Data source in Spark SQL guide. ## How was this patch tested? N/A: documentation change Closes #23742 from peter-gergely-horvath/SPARK-26835. Authored-by: Peter G. Horvath <peter.gergely.horvath@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2019-02-13 08:02:51 -06:00
Keiji Yoshida	de42281527	[MINOR][DOCS][WIP] Fix Typos ## What changes were proposed in this pull request? Fix Typos. ## How was this patch tested? NA Closes #23145 from kjmrknsn/docUpdate. Authored-by: Keiji Yoshida <kjmrknsn@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	2018-11-29 10:39:00 -06:00
Dongjoon Hyun	4506dad8a9	[SPARK-25656][SQL][DOC][EXAMPLE] Add a doc and examples about extra data source options ## What changes were proposed in this pull request? Our current doc does not explain how we are passing the data source specific options to the underlying data source. According to [the review comment](https://github.com/apache/spark/pull/22622#discussion_r222911529), this PR aims to add more detailed information and examples ## How was this patch tested? Manual. Closes #22801 from dongjoon-hyun/SPARK-25656. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>	2018-10-23 12:41:20 -07:00
Yuanjian Li	987f386588	[SPARK-24499][SQL][DOC] Split the page of sql-programming-guide.html to multiple separate pages ## What changes were proposed in this pull request? 1. Split the main page of sql-programming-guide into 7 parts: - Getting Started - Data Sources - Performance Turing - Distributed SQL Engine - PySpark Usage Guide for Pandas with Apache Arrow - Migration Guide - Reference 2. Add left menu for sql-programming-guide, keep first level index for each part in the menu. [image] ## How was this patch tested? Local test with jekyll build/serve. Closes #22746 from xuanyuanking/SPARK-24499. Authored-by: Yuanjian Li <xyliyuanjian@gmail.com> Signed-off-by: gatorsmile <gatorsmile@gmail.com>	2018-10-18 11:59:06 -07:00

8 commits