Commit graph

7 commits

Author SHA1 Message Date
William Hyun 99f6f7f8f8 [SPARK-36657][SQL] Update comment in 'gen-sql-config-docs.py'
### What changes were proposed in this pull request?
This PR aims to update comments in `gen-sql-config-docs.py`.

### Why are the changes needed?
To bring it up to date for the Spark 3.2.0 release.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A.

Closes #33902 from williamhyun/fixtool.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit b72fa5ef1c)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-09-02 18:51:10 -07:00
Hyukjin Kwon 20750a3f9e [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception
### What changes were proposed in this pull request?

This PR proposes to use proper built-in exceptions instead of the plain `Exception` in Python.

While I was at it, I fixed another minor issue in `DataFrame.schema`:

```diff
- except AttributeError as e:
-     raise Exception(
-         "Unable to parse datatype from schema. %s" % e)
+ except Exception as e:
+     raise ValueError(
+         "Unable to parse datatype from schema. %s" % e) from e
```

Now it catches all exceptions during schema parsing and chains them with a `ValueError`. Previously it caught only `AttributeError`, which did not cover all failure cases.
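
For illustration, a minimal sketch of the pattern this PR applies; the `parse_schema` and `_parse_datatype_string` helpers below are hypothetical stand-ins, not PySpark's actual API:

```python
def _parse_datatype_string(s):
    # Stand-in for a real parser; bad input can raise AttributeError,
    # KeyError, or other exception types depending on how parsing fails.
    raise AttributeError("unexpected token in %r" % s)

def parse_schema(schema_str):
    try:
        return _parse_datatype_string(schema_str)
    except Exception as e:
        # Catch every parse failure, not just AttributeError, and chain the
        # original error so it stays visible in the traceback (__cause__).
        raise ValueError("Unable to parse datatype from schema. %s" % e) from e

try:
    parse_schema("struct<bad")
except ValueError as e:
    print(e, "| caused by:", repr(e.__cause__))
```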

### Why are the changes needed?

So that users get the proper, more specific exception classes they would expect.

### Does this PR introduce _any_ user-facing change?

Yes, the exception classes changed, but this should be backward compatible because the previous exception was the plain `Exception`, from which the new built-in exceptions inherit.
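
A short sanity check of that compatibility claim (plain Python, no Spark needed): an existing handler that caught the old plain `Exception` still catches the new, more specific classes:

```python
# Built-in exceptions such as ValueError subclass Exception, so an
# existing `except Exception` handler keeps working after this change.
assert issubclass(ValueError, Exception)

try:
    raise ValueError("Unable to parse datatype from schema.")
except Exception as e:  # pre-existing user code: still catches it
    print("caught:", e)
```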

### How was this patch tested?

Existing unit tests should cover this.

Closes #31238

Closes #32650 from HyukjinKwon/SPARK-32194.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-05-26 11:54:40 +09:00
Kent Yao 5ba467ca1d [SPARK-31550][SQL][DOCS] Set nondeterministic configurations with general meanings in sql configuration doc
### What changes were proposed in this pull request?

```
spark.sql.session.timeZone

spark.sql.warehouse.dir
```
These two configurations are nondeterministic and vary across environments.

In addition, this reflects the review comments on `gen-sql-config-docs.py` (https://github.com/apache/spark/pull/28274#discussion_r412893096) and on `configuration.md` (https://github.com/apache/spark/pull/28274#discussion_r412894905).
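
A hedged sketch of the kind of substitution such a doc generator can apply; the placeholder strings and helper below are illustrative, not the script's actual code:

```python
# Map environment-dependent defaults to generic, deterministic descriptions,
# so the generated docs don't change with the machine that built them.
GENERIC_DEFAULTS = {
    "spark.sql.session.timeZone": "(value of local timezone)",
    "spark.sql.warehouse.dir": "(value of $PWD/spark-warehouse)",
}

def display_default(name, default):
    return GENERIC_DEFAULTS.get(name, default)

print(display_default("spark.sql.session.timeZone", "America/Los_Angeles"))
print(display_default("spark.sql.shuffle.partitions", "200"))
```
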
### Why are the changes needed?

doc fix

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

Verified locally:
![image](https://user-images.githubusercontent.com/8326978/80179099-5e7da200-8632-11ea-803f-d47a93151869.png)

Closes #28322 from yaooqinn/SPARK-31550.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-27 17:08:52 +09:00
Kent Yao 2c2062ea7c [SPARK-31498][SQL][DOCS] Dump public static sql configurations through doc generation
### What changes were proposed in this pull request?

Currently, only the non-static public SQL configurations are dumped to the public docs. We should also include the static public ones, as the `SET -v` command does.

This PR forces a call into `StaticSQLConf` so that the static configurations are built and registered via `buildStaticConf`.
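
The underlying issue: config entries are registered as a side effect of initializing the object that defines them, so a doc generator that never touches `StaticSQLConf` silently omits its entries. A rough Python analogue of that pitfall, with illustrative names (the real code is Scala):

```python
SQL_CONFS, STATIC_SQL_CONFS = {}, {}

def build_conf(key, default, doc):
    SQL_CONFS[key] = (default, doc)

def build_static_conf(key, default, doc):
    # Static confs live in a separate registry; until something executes the
    # code that defines them, the registry stays empty.
    STATIC_SQL_CONFS[key] = (default, doc)

build_conf("spark.sql.shuffle.partitions", "200",
           "The default number of partitions to use when shuffling data.")
build_static_conf("spark.sql.warehouse.dir", "$PWD/spark-warehouse",
                  "The default location for managed databases and tables.")

# The fix, conceptually: dump the public entries from *both* registries.
for registry in (SQL_CONFS, STATIC_SQL_CONFS):
    for key, (default, doc) in sorted(registry.items()):
        print(key, default, doc, sep="\t")
```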

### Why are the changes needed?

To fix the SQL configurations missing from the docs.

### Does this PR introduce any user-facing change?

NO

### How was this patch tested?

Added a unit test and verified locally that the public static SQL configurations appear in `docs/sql-config.html`.

Closes #28274 from yaooqinn/SPARK-31498.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-04-22 10:16:39 +00:00
Takeshi Yamamuro e42dbe7cd4 [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
### What changes were proposed in this pull request?

This PR adds a Python script that generates a SQL document for built-in functions, plus the corresponding page in the SQL references.
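
As a hedged sketch of what such a generator does (the function data below is hand-written for illustration; a real script would pull it from Spark's function registry instead):

```python
# (name, group, description) triples standing in for registry metadata.
FUNCTIONS = [
    ("abs", "math_funcs", "Returns the absolute value of the numeric value."),
    ("concat", "string_funcs", "Returns the concatenation of its arguments."),
]

def generate_sql_doc(functions):
    lines = []
    for group in sorted({g for _, g, _ in functions}):
        lines.append("### %s\n" % group)  # one markdown section per group
        for name, g, desc in functions:
            if g == group:
                lines.append("- `%s`: %s" % (name, desc))
        lines.append("")
    return "\n".join(lines)

print(generate_sql_doc(FUNCTIONS))
```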

### Why are the changes needed?

To make SQL references complete.

### Does this PR introduce any user-facing change?

Yes;

![a](https://user-images.githubusercontent.com/692303/79406712-c39e1b80-7fd2-11ea-8b85-9f9cbb6efed3.png)
![b](https://user-images.githubusercontent.com/692303/79320526-eb46a280-7f44-11ea-8639-90b1fb2b8848.png)
![c](https://user-images.githubusercontent.com/692303/79320707-3365c500-7f45-11ea-9984-69ffe800fb87.png)

### How was this patch tested?

Manually checked and added tests.

Closes #28224 from maropu/SPARK-31429.

Lead-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-21 10:55:13 +09:00
beliefer 59d6d5cbb0 [SPARK-30840][CORE][SQL] Add version property for ConfigEntry and ConfigBuilder
### What changes were proposed in this pull request?
Spark's `ConfigEntry` and `ConfigBuilder` are missing the Spark version in which each configuration was introduced. This is inconvenient for Spark users when they visit the configuration page:
http://spark.apache.org/docs/latest/configuration.html
The new Spark SQL config docs look like:
![sql配置截屏](https://user-images.githubusercontent.com/8486025/74604522-cb882f00-50f9-11ea-8683-57a90f9e3347.png)

```
> SET -v
spark.sql.adaptive.enabled      false   When true, enable adaptive query execution.
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin       0.2     The relation with a non-empty partition ratio lower than this config will not be considered as the build side of a broadcast-hash join in adaptive execution regardless of its size.This configuration only has an effect when 'spark.sql.adaptive.enabled' is enabled.
spark.sql.adaptive.optimizeSkewedJoin.enabled   true    When true and adaptive execution is enabled, a skewed join is automatically handled at runtime.
spark.sql.adaptive.optimizeSkewedJoin.skewedPartitionFactor     10      A partition is considered as a skewed partition if its size is larger than this factor multiple the median partition size and also larger than  spark.sql.adaptive.optimizeSkewedJoin.skewedPartitionSizeThreshold
spark.sql.adaptive.optimizeSkewedJoin.skewedPartitionMaxSplits  5       Configures the maximum number of task to handle a skewed partition in adaptive skewedjoin.
spark.sql.adaptive.optimizeSkewedJoin.skewedPartitionSizeThreshold      64MB    Configures the minimum size in bytes for a partition that is considered as a skewed partition in adaptive skewed join.
spark.sql.adaptive.shuffle.fetchShuffleBlocksInBatch.enabled    true    Whether to fetch the continuous shuffle blocks in batch. Instead of fetching blocks one by one, fetching continuous shuffle blocks for the same map task in batch can reduce IO and improve performance. Note, multiple continuous blocks exist in single fetch request only happen when 'spark.sql.adaptive.enabled' and 'spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled' is enabled, this feature also depends on a relocatable serializer, the concatenation support codec in use and the new version shuffle fetch protocol.
spark.sql.adaptive.shuffle.localShuffleReader.enabled   true    When true and 'spark.sql.adaptive.enabled' is enabled, this enables the optimization of converting the shuffle reader to local shuffle reader for the shuffle exchange of the broadcast hash join in probe side.
spark.sql.adaptive.shuffle.maxNumPostShufflePartitions  <undefined>     The advisory maximum number of post-shuffle partitions used in adaptive execution. This is used as the initial number of pre-shuffle partitions. By default it equals to spark.sql.shuffle.partitions. This configuration only has an effect when 'spark.sql.adaptive.enabled' and 'spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled' is enabled.
```

**Note**: Because so many configuration items are exposed and require substantial work, I will add the version numbers for these configuration items in a separate PR.
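
To illustrate the shape of the change, a Python analogue of a builder carrying a version property; the real `ConfigBuilder` is Scala, and the names and version string here only loosely mirror it:

```python
class ConfigBuilder:
    def __init__(self, key):
        self.key, self._doc, self._version = key, "", None

    def doc(self, text):
        self._doc = text
        return self

    def version(self, v):
        # The Spark release in which this configuration was introduced.
        self._version = v
        return self

    def create_with_default(self, default):
        return {"key": self.key, "default": default,
                "doc": self._doc, "since": self._version}

entry = (ConfigBuilder("spark.sql.adaptive.enabled")
         .doc("When true, enable adaptive query execution.")
         .version("1.6.0")
         .create_with_default(False))
print(entry)
```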

### Why are the changes needed?
To supplement the configuration documentation with version information.

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Existing unit tests.

Closes #27592 from beliefer/add-version-to-config.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-22 09:46:42 +09:00
Nicholas Chammas 339c0f9a62 [SPARK-30510][SQL][DOCS] Publicly document Spark SQL configuration options
### What changes were proposed in this pull request?

This PR adds a doc builder for Spark SQL's configuration options.
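
A minimal sketch of what such a doc builder produces, assuming entries arrive as simple tuples (illustrative only; the real script reads `ConfigEntry` metadata from the JVM):

```python
from html import escape

def config_table_html(entries):
    # entries: iterable of (name, default, meaning) tuples.
    rows = "".join(
        "<tr><td><code>%s</code></td><td>%s</td><td>%s</td></tr>"
        % (escape(name), escape(default), escape(meaning))
        for name, default, meaning in sorted(entries)
    )
    return ("<table><tr><th>Property Name</th><th>Default</th>"
            "<th>Meaning</th></tr>%s</table>" % rows)

print(config_table_html([
    ("spark.sql.shuffle.partitions", "200",
     "The default number of partitions to use when shuffling data."),
]))
```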

Here's what the new Spark SQL config docs look like ([configuration.html.zip](https://github.com/apache/spark/files/4172109/configuration.html.zip)):

![Screen Shot 2020-02-07 at 12 13 23 PM](https://user-images.githubusercontent.com/1039369/74050007-425b5480-49a3-11ea-818c-42700c54d1fb.png)

Compare this to the [current docs](http://spark.apache.org/docs/3.0.0-preview2/configuration.html#spark-sql):

![Screen Shot 2020-02-04 at 4 55 10 PM](https://user-images.githubusercontent.com/1039369/73790828-24a5a980-476f-11ea-998c-12cd613883e8.png)

### Why are the changes needed?

There is no visibility into the various Spark SQL configs on [the config docs page](http://spark.apache.org/docs/3.0.0-preview2/configuration.html#spark-sql).

### Does this PR introduce any user-facing change?

No, apart from new documentation.

### How was this patch tested?

I tested this manually by building the docs and reviewing them in my browser.

Closes #27459 from nchammas/SPARK-30510-spark-sql-options.

Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-09 19:20:47 +09:00