spark-instrumented-optimizer

HyukjinKwon a30983db57 [SPARK-27512][SQL] Avoid to replace ',' in CSV's decimal type inference for backward compatibility	2019-04-24 16:22:07 +09:00

## What changes were proposed in this pull request?

The code below currently infers as decimal but previously it was inferred as string.

In branch-2.4, type inference path for decimal and parsing data are different.

`2a8343121e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala (L153)`

`c284c4e1f6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala (L125)`

So the code below:

```scala
scala> spark.read.option("delimiter", "|").option("inferSchema", "true").csv(Seq("1,2").toDS).printSchema()
```

produced string as its type.

```
root
 |-- _c0: string (nullable = true)
```

In the current master, it now infers decimal as below:

```
root
 |-- _c0: decimal(2,0) (nullable = true)
```

It happened after https://github.com/apache/spark/pull/22979 because, now after this PR, we only have one way to parse decimal:

`7a83d71403/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala (L92)`

After the fix:

```
root
 |-- _c0: string (nullable = true)
```

This PR proposes to restore the previous behaviour back in `CSVInferSchema`.

## How was this patch tested?

Manually tested and unit tests were added.

Closes #24437 from HyukjinKwon/SPARK-27512.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
benchmarks	[SPARK-25657][SQL][TEST] Refactor HashBenchmark to use main method	2018-10-07 09:49:37 -07:00
src	[SPARK-27512][SQL] Avoid to replace ',' in CSV's decimal type inference for backward compatibility	2019-04-24 16:22:07 +09:00
pom.xml	[SPARK-27016][SQL][BUILD] Treat all antlr warnings as errors while generating parser from the sql grammar file.	2019-03-03 10:02:25 -06:00