### What changes were proposed in this pull request?
Replace legacy `ReduceNumShufflePartitions` with `CoalesceShufflePartitions` in comment.
### Why are the changes needed?
The rule `ReduceNumShufflePartitions` has been renamed to `CoalesceShufflePartitions`, so we should update the related comment as well.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A.
Closes #27865 from Ngone51/spark_31037_followup.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Currently, we parse intervals either from multi-unit strings or from date-time/year-month pattern strings. The former handles all whitespace characters, while the latter does not, not even spaces.
### Why are the changes needed?
Behavior consistency between the two interval parsing paths.
### Does this PR introduce any user-facing change?
Yes. Intervals in the date-time/year-month pattern form, such as
```
select interval '\n-\t10\t 12:34:46.789\t' day to second
-- !query 126 schema
struct<INTERVAL '-10 days -12 hours -34 minutes -46.789 seconds':interval>
-- !query 126 output
-10 days -12 hours -34 minutes -46.789 seconds
```
are now valid.
### How was this patch tested?
Added unit tests.
Closes #26815 from yaooqinn/SPARK-30189.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
RuleExecutor already supports metering for analyzer/optimizer rules. By surfacing this information in `PlanChangeLogger`, users can get more information when debugging rule changes.
This PR enhances `PlanChangeLogger` to display RuleExecutor metrics. This could be done simply by calling the existing APIs `resetMetrics` and `dumpTimeSpent`, but that might conflict with a user who is also collecting the total metrics of a SQL job. Thus, this PR introduces `QueryExecutionMetrics`, a snapshot of `QueryExecutionMetering`, to better support this feature.
Information added to `PlanChangeLogger`:
```
=== Metrics of Executed Rules ===
Total number of runs: 554
Total time: 0.107756568 seconds
Total number of effective runs: 11
Total time of effective runs: 0.047615486 seconds
```
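As a hedged usage sketch (the config key below is the Spark 3.0-era `spark.sql.optimizer.planChangeLog.level`; it may be named differently in other versions), the new summary can be surfaced by raising the plan change log level:
```scala
// Hedged sketch: raise the plan change log to a visible level so the
// rule-metrics summary above is printed (config key assumed for Spark 3.0).
spark.conf.set("spark.sql.optimizer.planChangeLog.level", "WARN")
spark.range(10).selectExpr("id + 1 AS v").collect()  // runs analyzer/optimizer rules
```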
### Why are the changes needed?
Provide better plan change debugging user experience
### Does this PR introduce any user-facing change?
It only adds more debugging info to the plan change log; the default log level is TRACE.
### How was this patch tested?
Updated existing tests to verify the new logs.
Closes #27846 from Eric5553/ExplainRuleExecMetrics.
Authored-by: Eric Wu <492960551@qq.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
I found a lot of scattered configs in `Streaming`. I think we should arrange these configs in a unified place.
### Why are the changes needed?
To arrange the scattered streaming configs in one place.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing UTs.
Closes #27744 from beliefer/arrange-scattered-streaming-config.
Authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes two things:
1. Convert `null` to `string` type during schema inference of `schema_of_json`, as the JSON datasource does. This is also a bug fix, because `null` is not a proper DDL-formatted type string and the SQL parser is unable to recognise it. We should match the JSON datasource and return a string type so that `schema_of_json` returns a proper DDL-formatted string.
2. Let `schema_of_json` respect `dropFieldIfAllNull` option during schema inference.
### Why are the changes needed?
To let `schema_of_json` return a proper DDL formatted string, and respect `dropFieldIfAllNull` option.
### Does this PR introduce any user-facing change?
Yes, it does.
```scala
import collection.JavaConverters._
import org.apache.spark.sql.functions._
spark.range(1).select(schema_of_json(lit("""{"id": ""}"""))).show()
spark.range(1).select(schema_of_json(lit("""{"id": "a", "drop": {"drop": null}}"""), Map("dropFieldIfAllNull" -> "true").asJava)).show(false)
```
**Before:**
```
struct<id:null>
struct<drop:struct<drop:null>,id:string>
```
**After:**
```
struct<id:string>
struct<id:string>
```
### How was this patch tested?
Manually tested, and unittests were added.
Closes #27854 from HyukjinKwon/SPARK-31065.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR changes the type of `CustomShuffleReaderExec`'s `partitionSpecs` from `Array` to `Seq`, since `Array` compares references not values for equality, which could lead to potential plan reuse problem.
### Why are the changes needed?
Unlike `Seq`, `Array` compares references not values for equality, which could lead to potential plan reuse problem.
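A quick illustration of the underlying Scala semantics:
```scala
// Array equality is reference equality, so two plans holding equal-but-distinct
// arrays would never compare equal; Seq compares element values.
val a1 = Array(1, 2, 3)
val a2 = Array(1, 2, 3)
println(a1 == a2)              // false: compares references
println(a1.toSeq == a2.toSeq)  // true: compares element values
```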
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Passes existing UTs.
Closes #27857 from maryannxue/aqe-customreader-fix.
Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a follow-up to https://github.com/apache/spark/pull/27650, which allows a `None` provider for CREATE TABLE. Here we do the same thing for REPLACE TABLE.
### Why are the changes needed?
Although the ASTBuilder currently doesn't seem to allow `REPLACE` without a `USING` clause, this would allow `DataFrameWriterV2` to use the statements instead of the commands directly.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests
Closes #27838 from yuchenhuo/SPARK-30902.
Authored-by: Yuchen Huo <yuchen.huo@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
With SPARK-27651 we now support host-local reads for shuffle, but only when the external shuffle service is enabled. This updates the config docs to state that.
### Why are the changes needed?
To clarify the dependency.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
n/a
Closes #27812 from tgravescs/SPARK-27651-follow.
Authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Adding a note to document `Row.asDict` behavior when there are duplicate fields.
### Why are the changes needed?
When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document it to let users know the difference explicitly.
### Does this PR introduce any user-facing change?
No. Only document change.
### How was this patch tested?
Existing test.
Closes #27853 from viirya/SPARK-30941.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
Audit the new ML Scala APIs introduced in 3.0 and fix the issues found.
### Why are the changes needed?
### Does this PR introduce any user-facing change?
Yes. Some doc changes
### How was this patch tested?
Existing tests
Closes #27818 from huaxingao/spark-30929.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
In `getMapLocation`, change the condition from `...endMapIndex < statuses.length` to `...endMapIndex <= statuses.length`.
### Why are the changes needed?
`endMapIndex` is exclusive, so we should include it when comparing to `statuses.length`. Otherwise, we can't get the location for the last map index.
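A simplified sketch of the boundary condition (hypothetical shape, not the actual `MapOutputTracker` code):
```scala
// With an exclusive end index, endMapIndex == statuses.length is still valid;
// checking `endMapIndex < statuses.length` wrongly rejects the last map output.
def getMapLocation(statuses: Array[String], startMapIndex: Int, endMapIndex: Int): Seq[String] =
  if (startMapIndex < endMapIndex && endMapIndex <= statuses.length) {
    statuses.slice(startMapIndex, endMapIndex).toSeq  // slice's end is exclusive
  } else {
    Seq.empty
  }
```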
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Updated existing test.
Closes #27850 from Ngone51/fix_getmaploction.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
I've applied the following changes to StagePage:
1. Added `Shuffle Write Time` to task metrics summary.
2. Added checkbox for `Shuffle Write Time` as an additional metrics.
3. Renamed the `Write Time` column in the task table to `Shuffle Write Time` and made it an additional column.
### Why are the changes needed?
Task metrics summary doesn't show `Shuffle Write Time` even though it shows `Shuffle Read Blocked Time`.
`Shuffle Read Blocked Time` is treated as an additional metric, so I also made `Shuffle Write Time` an additional metric.
### Does this PR introduce any user-facing change?
Yes. After this change, task metrics summary can show `Shuffle Write Time` and its visibility is controlled by a checkbox.
![additional-metrics-after](https://user-images.githubusercontent.com/4736016/76101844-677acb80-6012-11ea-9923-d95d852c775b.png)
![task-summary-after](https://user-images.githubusercontent.com/4736016/76101856-6ea1d980-6012-11ea-9670-3cf0ecd6faff.png)
The `Write Time` column is already shown in the task table, but the title is ambiguous, so I've renamed it to `Shuffle Write Time`.
After this change, this column is also an additional column, like `Shuffle Read Blocked Time`.
![tasks-table-after](https://user-images.githubusercontent.com/4736016/76102216-00a9e200-6013-11ea-9d51-1a6ce2abb0b9.png)
### How was this patch tested?
I've tested manually using the following code and confirmed the UI.
`sc.parallelize(1 to 1000).map(x => (x,x)).reduceByKey(_+_).collect`
Closes #27837 from sarutak/write-time.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
The newly added catalog APIs are marked as Experimental but other DS v2 APIs are marked as Evolving.
This PR makes it consistent and marks all Connector APIs as Evolving.
### Why are the changes needed?
For consistency.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
N/A
Closes #27811 from cloud-fan/tag.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR makes the following refinements to the workflow for building docs:
* Install Python and Ruby consistently using pyenv and rbenv across both the docs README and the release Dockerfile.
* Pin the Python and Ruby versions we use.
* Pin all direct Python and Ruby dependency versions.
* Eliminate any use of `sudo pip`, which the Python community discourages, or `sudo gem`.
### Why are the changes needed?
This PR should increase the consistency and reproducibility of the doc-building process by managing Python and Ruby in a more consistent way, and by eliminating unused or outdated code.
Here's a possible example of an issue building the docs that would be addressed by the changes in this PR: https://github.com/apache/spark/pull/27459#discussion_r376135719
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual tests:
* I was able to build the Docker image successfully, minus the final part about `RUN useradd`.
* I am unable to run `do-release-docker.sh` because I am not a committer and don't have the required GPG key.
* I built the docs locally and viewed them in the browser.
I think I need a committer to more fully test out these changes.
Closes #27534 from nchammas/SPARK-30731-building-docs.
Authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Updating ML docs for 3.0 changes
### Why are the changes needed?
I am auditing the 3.0 ML changes and found that some docs are missing or not updated. This PR updates them.
### Does this PR introduce any user-facing change?
Yes, doc changes
### How was this patch tested?
Manually build and check
Closes #27762 from huaxingao/spark-doc.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
Parquet's `org.apache.parquet.filter2.predicate.FilterApi` uses dots as separators to split a column name into the multiple parts of a nested field. The drawback is that this causes issues when a field name contains dots.
The new APIs to be added will take an array of strings directly for the multiple parts of a nested field, so there is no confusion from using dots as separators.
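To illustrate the ambiguity that motivates the array-based form (illustrative values only):
```scala
// With a dot-separated string, "a.b" is ambiguous; with an array of name
// parts, the two cases stay distinct.
val nestedField = Seq("a", "b")  // field `b` inside struct column `a`
val dottedName  = Seq("a.b")     // a single top-level column named "a.b"
// A dots-as-separator API sees both as the string "a.b" and cannot tell them apart.
```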
### Why are the changes needed?
To support nested predicate pushdown and predicate pushdown for columns containing `dots`.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UTs.
Closes #27824 from dbtsai/SPARK-31064.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
Introduced a new parameter `emptyCollection` for the `CreateMap` and `CreateArray` functions to remove the dependency on `SQLConf.get`.
### Why are the changes needed?
This avoids the issue where the configuration changes between different phases of planning, which can silently break a query plan and lead to crashes or data corruption.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing UTs.
Closes #27657 from iRakson/SPARK-30899.
Authored-by: iRakson <raksonrakesh@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR intends to support 32 or more grouping attributes for `GROUPING_ID`. In the current master, an integer overflow can occur when computing grouping IDs:
e75d9afb2f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (L613)
For example, the query below generates wrong grouping IDs in the master:
```
scala> val numCols = 32 // or, 31
scala> val cols = (0 until numCols).map { i => s"c$i" }
scala> sql(s"create table test_$numCols (${cols.map(c => s"$c int").mkString(",")}, v int) using parquet")
scala> val insertVals = (0 until numCols).map { _ => 1 }.mkString(",")
scala> sql(s"insert into test_$numCols values ($insertVals,3)")
scala> sql(s"select grouping_id(), sum(v) from test_$numCols group by grouping sets ((${cols.mkString(",")}), (${cols.init.mkString(",")}))").show(10, false)
scala> sql(s"drop table test_$numCols")
// numCols = 32
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0 |3 |
|0 |3 | // Wrong Grouping ID
+-------------+------+
// numCols = 31
+-------------+------+
|grouping_id()|sum(v)|
+-------------+------+
|0 |3 |
|1 |3 |
+-------------+------+
```
To fix this issue, this PR changes the code to use long values for `GROUPING_ID` instead of int values.
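The root cause in miniature:
```scala
// Int shift counts are masked modulo 32, so a bitmask-based grouping ID
// silently wraps once there are 32 or more grouping columns; Long does not.
println(1 << 31)   // -2147483648 (overflow into the sign bit)
println(1 << 32)   // 1 (shift count wraps to 0 for Int)
println(1L << 32)  // 4294967296 (correct with Long)
```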
### Why are the changes needed?
To support more cases in `GROUPING_ID`.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added unit tests.
Closes #26918 from maropu/FixGroupingIdIssue.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
### What changes were proposed in this pull request?
This PR adds functionality to HiveExternalCatalog to be able to change the provider of a table.
### Why are the changes needed?
This is useful for catalogs in Spark 3.0 to be able to use alterTable to change the provider of a table as part of an atomic REPLACE TABLE function.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit tests
Closes #27822 from brkyvz/externalCat.
Authored-by: Burak Yavuz <brkyvz@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Add FValueRegressionSelector for continuous features and continuous labels.
### Why are the changes needed?
Currently Spark only supports the selection of categorical features, while there is also much demand for selecting features with continuous distributions.
This PR adds FValueSelector for continuous features and continuous labels.
ANOVASelector for continuous features and categorical labels will be added later in a separate PR.
### Does this PR introduce any user-facing change?
Yes.
Add a new Selector
### How was this patch tested?
Add new tests
Closes #27679 from huaxingao/spark_30776.
Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
### What changes were proposed in this pull request?
Add `OrcV2QuerySuite`, which explicitly sets the configuration `USE_V1_SOURCE_LIST` to `""` to use the ORC V2 implementation.
### Why are the changes needed?
As file source V2 is now disabled by default, the test suite `OrcQuerySuite` is testing the V1 implementation, just like `OrcV1QuerySuite`. We should fix that.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes #27816 from gengliangwang/orcQuerySuite.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
Use the Scala `@deprecated` annotation to deprecate the untyped Scala UDF API.
### Why are the changes needed?
After #27488, it is odd that the untyped Scala UDF fails by default without a deprecation warning.
### Does this PR introduce any user-facing change?
Yes, users will see the warning:
```
<console>:26: warning: method udf in object functions is deprecated (since 3.0.0): Untyped Scala UDF API is deprecated, please use typed Scala UDF API such as 'def udf[RT: TypeTag](f: Function0[RT]): UserDefinedFunction' instead.
val myudf = udf(() => Math.random(), DoubleType)
^
```
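As a migration sketch, the typed API infers the return type from the function instead of taking an explicit `DataType`:
```scala
import org.apache.spark.sql.functions.udf

// Deprecated untyped form: udf(() => Math.random(), DoubleType)
// Typed replacement: DoubleType is inferred via TypeTag.
val myudf = udf(() => Math.random())
```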
### How was this patch tested?
Tested manually.
Closes #27794 from Ngone51/deprecate_untyped_scala_udf.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In the PR, I propose:
1. Replace matching on `Literal` in `ExprUtils.evalSchemaExpr()` with a check of the foldable property of the `schema` expression.
2. Replace matching on `Literal` in `ExprUtils.evalTypeExpr()` with a check of the foldable property of the `schema` expression.
3. Change the input parameter check in the `SchemaOfCsv` expression to allow any foldable `child` expression.
4. Change the input parameter check in the `SchemaOfJson` expression to allow any foldable `child` expression.
### Why are the changes needed?
This should improve Spark SQL UX for `from_csv`/`from_json`. Currently, Spark expects only literals:
```sql
spark-sql> select from_csv('1,Moscow', replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', ''));
Error in query: Schema should be specified in DDL format as a string literal or output of the schema_of_csv function instead of replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', '');; line 1 pos 7
spark-sql> select from_json('{"id":1, "city":"Moscow"}', replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', ''));
Error in query: Schema should be specified in DDL format as a string literal or output of the schema_of_json function instead of replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', '');; line 1 pos 7
```
and only string literals are accepted as CSV/JSON examples by `schema_of_csv`/`schema_of_json`:
```sql
spark-sql> select schema_of_csv(concat_ws(',', 0.1, 1));
Error in query: cannot resolve 'schema_of_csv(concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)))' due to data type mismatch: The input csv should be a string literal and not null; however, got concat_ws(',', CAST(0.1BD AS STRING), CAST(1 AS STRING)).; line 1 pos 7;
'Project [unresolvedalias(schema_of_csv(concat_ws(,, cast(0.1 as string), cast(1 as string))), None)]
+- OneRowRelation
spark-sql> select schema_of_json(regexp_replace('{"item_id": 1, "item_price": 0.1}', 'item_', ''));
Error in query: cannot resolve 'schema_of_json(regexp_replace('{"item_id": 1, "item_price": 0.1}', 'item_', ''))' due to data type mismatch: The input json should be a string literal and not null; however, got regexp_replace('{"item_id": 1, "item_price": 0.1}', 'item_', '').; line 1 pos 7;
'Project [unresolvedalias(schema_of_json(regexp_replace({"item_id": 1, "item_price": 0.1}, item_, )), None)]
+- OneRowRelation
```
### Does this PR introduce any user-facing change?
Yes, after the changes users can pass any foldable string expression as the `schema` parameter to `from_csv()/from_json()`. For the example above:
```sql
spark-sql> select from_csv('1,Moscow', replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', ''));
{"id":1,"city":"Moscow"}
spark-sql> select from_json('{"id":1, "city":"Moscow"}', replace('dpt_org_id INT, dpt_org_city STRING', 'dpt_org_', ''));
{"id":1,"city":"Moscow"}
```
After the change, the `schema_of_csv`/`schema_of_json` functions accept foldable expressions, for example:
```sql
spark-sql> select schema_of_csv(concat_ws(',', 0.1, 1));
struct<_c0:double,_c1:int>
spark-sql> select schema_of_json(regexp_replace('{"item_id": 1, "item_price": 0.1}', 'item_', ''));
struct<id:bigint,price:double>
```
### How was this patch tested?
Added new tests to `CsvFunctionsSuite` and `JsonFunctionsSuite`.
Closes #27804 from MaxGekk/foldable-arg-csv-json-func.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR aims to show a deprecation warning on two-parameter TRIM/LTRIM/RTRIM function usages based on the community decision.
- https://lists.apache.org/thread.html/r48b6c2596ab06206b7b7fd4bbafd4099dccd4e2cf9801aaa9034c418%40%3Cdev.spark.apache.org%3E
### Why are the changes needed?
For backward compatibility, SPARK-28093 is reverted. However, from Apache Spark 3.0.0, we should give a safe guideline to use SQL syntax instead of the esoteric function signatures.
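A hedged sketch of the guideline (the SQL-standard syntax, which avoids the easy-to-confuse argument order of the two-parameter form):
```scala
// Prefer the unambiguous SQL-standard TRIM syntax over the deprecated
// two-parameter function form.
spark.sql("SELECT trim(BOTH 'x' FROM 'xxhixx')").show()  // -> hi
```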
### Does this PR introduce any user-facing change?
Yes. This shows a directional warning.
### How was this patch tested?
Pass the Jenkins with a newly added test case.
Closes #27643 from dongjoon-hyun/SPARK-30886.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This PR adds an internal config for changing the logging level of adaptive execution query plan evolution.
### Why are the changes needed?
To make AQE debugging easier.
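A hedged usage sketch (the config key is assumed from this PR's context and is internal, so it may change):
```scala
// Surface AQE plan change logs at INFO instead of the default trace level
// (key assumed: spark.sql.adaptive.logLevel).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.logLevel", "INFO")
```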
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UT.
Closes #27798 from maryannxue/aqe-log-level.
Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Remove ignored and outdated test `Type conflict in primitive field values (Ignored)` from JsonSuite.
### Why are the changes needed?
The test has not been maintained for a long time. It can be removed to reduce the size of JsonSuite and improve maintainability.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
By running the command `./build/sbt "test:testOnly *JsonV2Suite"`
Closes #27795 from MaxGekk/remove-ignored-test-in-JsonSuite.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR proposes to respect hidden parameters by using `stringArgs` in `Expression.toString`. By doing this, we can show the strings properly in some cases, such as `NonSQLExpression`.
### Why are the changes needed?
To respect "hidden" arguments in the string representation.
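A minimal sketch of the mechanism, using hypothetical classes rather than Spark's actual `Expression` hierarchy:
```scala
// toString renders stringArgs rather than every constructor field, so a
// subclass can hide internal flags from its string representation.
trait PrettyExpr extends Product {
  def stringArgs: Iterator[Any] = productIterator
  override def toString: String =
    s"${productPrefix.toLowerCase}(${stringArgs.mkString(", ")})"
}
case class CreateArray(children: Seq[String], useStringTypeWhenEmpty: Boolean)
    extends PrettyExpr {
  override def stringArgs: Iterator[Any] = children.iterator  // hide the flag
}

// CreateArray(Seq("id"), useStringTypeWhenEmpty = false).toString == "createarray(id)"
```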
### Does this PR introduce any user-facing change?
Yes. For example, on top of https://github.com/apache/spark/pull/27657,
```scala
val identify = udf((input: Seq[Int]) => input)
spark.range(10).select(identify(array("id"))).show()
```
shows the hidden parameter `useStringTypeWhenEmpty`.
```
+---------------------+
|UDF(array(id, false))|
+---------------------+
| [0]|
| [1]|
...
```
whereas:
```scala
spark.range(10).select(array("id")).show()
```
```
+---------+
|array(id)|
+---------+
| [0]|
| [1]|
...
```
### How was this patch tested?
Manually tested as below:
```scala
val identify = udf((input: Boolean) => input)
spark.range(10).select(identify(exists(array(col("id")), _ % 2 === 0))).show()
```
Before:
```
+-------------------------------------------------------------------------------------+
|UDF(exists(array(id), lambdafunction(((lambda 'x % 2) = 0), lambda 'x, false), true))|
+-------------------------------------------------------------------------------------+
| true|
| false|
| true|
...
```
After:
```
+-------------------------------------------------------------------------------+
|UDF(exists(array(id), lambdafunction(((lambda 'x % 2) = 0), lambda 'x, false)))|
+-------------------------------------------------------------------------------+
| true|
| false|
| true|
...
```
Closes #27788 from HyukjinKwon/arguments-str-repr.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR disables using commit coordinator with `NoopDataSource`.
### Why are the changes needed?
No need for a coordinator in benchmarks.
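For context, a benchmark-style write through the noop source looks like this (a sketch; `noop` is the short name `NoopDataSource` registers):
```scala
// Discards all rows; with this PR the write no longer round-trips through
// the output commit coordinator.
spark.range(10000000L).write.format("noop").mode("overwrite").save()
```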
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UTs.
Closes #27791 from peter-toth/SPARK-30563-disalbe-commit-coordinator.
Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
This PR intends to fix typos and phrases in the `/docs` directory. To find them, I ran the IntelliJ typo checker.
### Why are the changes needed?
For better documents.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27819 from maropu/TypoFix-20200306.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request?
This PR proposes two things:
1. Explicitly include xml-apis. xml-apis is already part of xerces 2.12.0 (https://repo1.maven.org/maven2/xerces/xercesImpl/2.12.0/xercesImpl-2.12.0.pom); however, we're excluding it by setting `scope` to `test`, which seems to cause `spark-shell`, built with Maven, to fail.
It seems that previously xml-apis wasn't reached for some reason, but after the upgrade it is required. Therefore, this PR proposes to include it.
2. Pin the `xerces` version in SBT as well. This dependency seems to be resolved differently than in Maven.
Note that Hadoop 3 does not seem to require this, as it replaced xerces as of [HDFS-12221](https://issues.apache.org/jira/browse/HDFS-12221).
### Why are the changes needed?
To make `spark-shell` work from the Maven build, and to use the same xerces version everywhere.
### Does this PR introduce any user-facing change?
No, it's master only.
### How was this patch tested?
**1.**
```bash
./build/mvn -DskipTests -Psparkr -Phive clean package
./bin/spark-shell
```
Before:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.xerces.parsers.AbstractDOMParser.startDocument(Unknown Source)
at org.apache.xerces.xinclude.XIncludeHandler.startDocument(Unknown Source)
at org.apache.xerces.impl.dtd.XMLDTDValidator.startDocument(Unknown Source)
at org.apache.xerces.impl.XMLDocumentScannerImpl.startEntity(Unknown Source)
at org.apache.xerces.impl.XMLVersionDetector.startDocumentParsing(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2482)
at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2470)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2541)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2494)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2407)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.w3c.dom.ElementTraversal
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 42 more
```
After:
```
...
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.0-SNAPSHOT
/_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_202)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
```
**2.**
```
./build/sbt dependencyTree -Phadoop-2.7 -Phive-2.3 -Phive-thriftserver -Phive
./build/sbt dependencyTree -Phadoop-3.2 -Phive-2.3 -Phive-thriftserver -Phive
```
Closes #27808 from HyukjinKwon/SPARK-30994.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
There are two implementations of quoteIfNeeded: one in `org.apache.spark.sql.connector.catalog.CatalogV2Implicits.quote` and the other in `OrcFiltersBase.quoteAttributeNameIfNeeded`. This PR consolidates them into one.
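A hedged sketch of what a consolidated `quoteIfNeeded`-style helper does (simplified; not necessarily the exact rules Spark applies):
```scala
// Wrap an identifier part in backticks unless it is already a safe bare
// identifier, escaping embedded backticks by doubling them.
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z_][a-zA-Z0-9_]*")) part
  else s"`${part.replace("`", "``")}`"

// quoteIfNeeded("col") == "col"
// quoteIfNeeded("a.b") == "`a.b`"
```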
### Why are the changes needed?
Simplify the codebase.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing UTs.
Closes #27814 from dbtsai/SPARK-31058.
Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: DB Tsai <d_tsai@apple.com>
### What changes were proposed in this pull request?
This PR fixes the flaky test in #27050.
### Why are the changes needed?
`SparkListenerStageCompleted` is posted by `listenerBus` asynchronously, so we should make sure the listener has consumed the event before asserting on completed stages.
See [error message](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119308/testReport/org.apache.spark.scheduler/DAGSchedulerSuite/shuffle_fetch_failed_on_speculative_task__but_original_task_succeed__SPARK_30388_/):
```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: List(0, 1, 1) did not equal List(0, 1, 1, 0)
at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$88(DAGSchedulerSuite.scala:1976)
```
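The typical fix in listener-based tests is to drain the bus before asserting, as a sketch (`waitUntilEmpty` is the helper Spark's own suites use; the assertion below is hypothetical):
```scala
// Wait for the async listener bus to deliver SparkListenerStageCompleted
// before asserting on what the listener recorded.
sc.listenerBus.waitUntilEmpty(10000)  // timeout in milliseconds
assert(completedStageIds === List(0, 1, 1, 0))  // hypothetical recorded stages
```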
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Updated the test and verified locally: no failures after running it hundreds of times. Note that the failure is easy to reproduce by running the test in a loop hundreds of times (e.g., 200).
Closes #27809 from Ngone51/fix_flaky_spark_30388.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
### What changes were proposed in this pull request?
When introducing AQE to others, I feel the config names are a bit incoherent and hard to use.
This PR refines the config names:
1. Remove the "shuffle" prefix. AQE is all about shuffle, and we don't need to add the "shuffle" prefix everywhere.
2. `targetPostShuffleInputSize` is obscure; rename it to `advisoryShufflePartitionSizeInBytes`.
3. `reducePostShufflePartitions` doesn't match the actual optimization; rename it to `coalesceShufflePartitions`.
4. `minNumPostShufflePartitions` is obscure; rename it to `minPartitionNum` under the `coalesceShufflePartitions` namespace.
5. `maxNumPostShufflePartitions` is confusing with the word "max"; rename it to `initialPartitionNum`.
6. `skewedJoinOptimization` is too verbose. Skew join is a well-known term in the database area, so we can just say `skewJoin`.
### Why are the changes needed?
Make the config names easy to understand.
### Does this PR introduce any user-facing change?
It deprecates the config `spark.sql.adaptive.shuffle.targetPostShuffleInputSize`.
### How was this patch tested?
N/A
Closes #27793 from cloud-fan/aqe.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This is a bug fix for #27280. This PR fixes the bug where `ShuffleBlockFetcherIterator` may fail to create a request for the last block group.
### Why are the changes needed?
When (all blocks).sum < `targetRemoteRequestSize`, (all blocks).length > `maxBlocksInFlightPerAddress`, and (last block group).size < `maxBlocksInFlightPerAddress`,
`ShuffleBlockFetcherIterator` will not create a request for the last group. Thus, the reduce task will lose data.
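A heavily simplified sketch of the grouping logic and the fix (hypothetical helper, not the actual `ShuffleBlockFetcherIterator` code):
```scala
import scala.collection.mutable.ArrayBuffer

// Accumulate blocks into per-request groups and, crucially, flush the final
// partial group; the bug effectively dropped it.
def groupBlocks(blocks: Seq[String], maxBlocksPerRequest: Int): Seq[Seq[String]] = {
  val requests = ArrayBuffer.empty[Seq[String]]
  var current = ArrayBuffer.empty[String]
  for (b <- blocks) {
    current += b
    if (current.size >= maxBlocksPerRequest) {
      requests += current.toSeq
      current = ArrayBuffer.empty[String]
    }
  }
  if (current.nonEmpty) requests += current.toSeq  // the fix: keep the last group
  requests.toSeq
}
```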
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Updated test.
Closes #27786 from Ngone51/fix_no_request_bug.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
In the PR, I propose to change `DateTimeUtils.stringToTimestamp` to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:
- no zone id. In that case, the function uses the local session time zone from the SQL config `spark.sql.session.timeZone`
- -[h]h:[m]m
- +[h]h:[m]m
- Z
- Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
- Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
- +|-h[h]
- +|-hh[:]mm
- +|-hh:mm:ss
- +|-hhmmss
- Region-based zone IDs in the form `{area}/{city}`, such as `Europe/Paris` or `America/New_York`. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).
### Why are the changes needed?
- To use `stringToTimestamp` as a substitution of removed `stringToTime`, see https://github.com/apache/spark/pull/27710#discussion_r385020173
- Improve UX of Spark SQL by allowing flexible formats of zone ids. Currently, Spark accepts only `Z` and zone offsets that can be inconvenient when a time zone offset is shifted due to daylight saving rules. For instance:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL
```
### Does this PR introduce any user-facing change?
Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456
```
### How was this patch tested?
- Added new test cases to the `string to timestamp` test in `DateTimeUtilsSuite`.
- Run `CastSuite` and `AnsiCastSuite`.
Closes #27753 from MaxGekk/stringToTimestamp-uni-zoneId.
Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Rename the config and make it non-internal.
### Why are the changes needed?
Now we fail the query if duplicated map keys are detected, and provide a legacy config to deduplicate them. However, we must provide a way for users to get out of this situation, instead of just refusing to run the query. This exit strategy should always be there, while a legacy config indicates that it may be removed someday.
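As a hedged sketch (assuming the renamed key is `spark.sql.mapKeyDedupPolicy` with a `LAST_WIN` value; the names are not confirmed by this description):
```scala
// Opt out of the fail-fast behavior for duplicated map keys
// (config key and value assumed).
spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")
spark.sql("SELECT map(1, 'a', 1, 'b')[1]").show()  // 'b': the last value wins
```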
### Does this PR introduce any user-facing change?
No, it just renames a config which was added in 3.0.
### How was this patch tested?
Added more tests for the failure behavior.
Closes #27772 from cloud-fan/map.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request?
The `spark.sql.session.timeZone` config can accept any string value, including invalid time zone IDs, which will then fail other queries that rely on the time zone. We should validate the value in the set phase and fail fast if the zone value is invalid.
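Illustrative behavior after the change:
```scala
// Valid IANA zone ids are accepted; an invalid id now fails at set time
// instead of breaking later queries.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")  // OK
// spark.conf.set("spark.sql.session.timeZone", "Not/A_Zone")        // now throws
```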
### Why are the changes needed?
To improve configuration handling.
### Does this PR introduce any user-facing change?
Yes, setting the config will now fail fast if the value is an invalid time zone ID.
### How was this patch tested?
Added a unit test.
Closes #27792 from yaooqinn/SPARK-31038.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
This PR avoids sending redundant metrics (those that have been included in previous update) as well as useless metrics (those in future stages) to Spark UI in AQE UI metrics update.
### Why are the changes needed?
This change will make UI metrics update more efficient.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Manual test in Spark UI.
Closes #27799 from maryannxue/aqe-ui-cleanup.
Authored-by: maryannxue <maryannxue@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Currently, the user cannot specify the session catalog name (`spark_catalog`) in qualified column names for v1 tables:
```
SELECT spark_catalog.default.t.i FROM spark_catalog.default.t
```
fails with `cannot resolve 'spark_catalog.default.t.i'`.
This is inconsistent with v2 table behavior where catalog name can be used:
```
SELECT testcat.ns1.tbl.id FROM testcat.ns1.tbl
```
This PR proposes to fix the inconsistency and allow the user to specify session catalog name in column names for v1 tables.
### Why are the changes needed?
Fixing an inconsistent behavior.
### Does this PR introduce any user-facing change?
Yes, now the following query works:
```
SELECT spark_catalog.default.t.i FROM spark_catalog.default.t
```
### How was this patch tested?
Added new tests.
Closes #27776 from imback82/spark_catalog_col_name_resolution.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
Disable test `KafkaDelegationTokenSuite`.
### Why are the changes needed?
`KafkaDelegationTokenSuite` is too flaky.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass Jenkins.
Closes #27789 from Ngone51/retry_kafka.
Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
### What changes were proposed in this pull request?
This is a follow-up work for #27441. For the cases where the new TimestampFormatter returns null while the legacy formatter can return a value, we need to throw an exception instead of silently changing the result. The legacy config will be referenced in the error message.
### Why are the changes needed?
Avoid silent result change for new behavior in 3.0.
### Does this PR introduce any user-facing change?
Yes, an exception is thrown when we detect that the legacy formatter can parse the string while the new formatter returns null.
### How was this patch tested?
Extend existing UT.
Closes #27537 from xuanyuanking/SPARK-30668-follow.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?
`-c` is short for `--conf`. It was introduced in v1.1.0 but has been hidden from users until now.
### Why are the changes needed?
To expose a hidden feature.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A.
Closes #27802 from yaooqinn/conf.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>