spark-instrumented-optimizer/sql/catalyst
Yuming Wang 3bc13e6412 [SPARK-32706][SQL] Improve cast string to decimal type
### What changes were proposed in this pull request?

This PR makes casting a string to decimal type fail fast if the precision is larger than 38.
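A minimal sketch of the kind of precision check described above, not the patch itself. It uses only `java.math.BigDecimal`. The object name `FastFailSketch` and the helpers `effectivePrecision` and `exceedsMaxPrecision` are hypothetical names chosen for illustration. The limit of 38 is taken from the description. The sketch assumes that a negative scale has to be expanded into trailing zeros when the value is stored, which is why the number of digits can become very large.

```scala
import java.math.{BigDecimal => JavaBigDecimal}

// Hypothetical helper, for illustration only; not Spark's actual code.
object FastFailSketch {
  // Maximum decimal precision, as stated in the PR description.
  val MaxPrecision = 38

  // Number of digits needed once a negative scale is expanded into
  // trailing zeros. Computed from precision and scale alone, so the
  // large unscaled integer is never materialized.
  def effectivePrecision(bd: JavaBigDecimal): Long =
    if (bd.scale < 0) bd.precision.toLong - bd.scale.toLong
    else bd.precision.toLong

  // True if the parsed value would need more than MaxPrecision digits.
  // Throws NumberFormatException if the input is not a valid decimal string.
  def exceedsMaxPrecision(s: String): Boolean =
    effectivePrecision(new JavaBigDecimal(s.trim)) > MaxPrecision
}
```

With the two benchmark inputs below, `FastFailSketch.exceedsMaxPrecision("6.0790316E+25569151")` is `true` and `FastFailSketch.exceedsMaxPrecision("6.0790316E+25")` is `false`, so a caller could reject the first string before attempting the expensive conversion.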

### Why are the changes needed?

The cast is very slow when the precision is very large.

Benchmark and benchmark result:
```scala
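import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.types.Decimal

// bd1 has a huge exponent; bd2 has the same significant digits with a small one.
val bd1 = new java.math.BigDecimal("6.0790316E+25569151")
val bd2 = new java.math.BigDecimal("6.0790316E+25")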

val benchmark = new Benchmark("Benchmark string to decimal", 1, minNumIters = 2)
benchmark.addCase(bd1.toString) { _ =>
  println(Decimal(bd1).precision)
}
benchmark.addCase(bd2.toString) { _ =>
  println(Decimal(bd2).precision)
}
benchmark.run()
```
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.6
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Benchmark string to decimal:              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
6.0790316E+25569151                                9340           9381          57          0.0  9340094625.0       1.0X
6.0790316E+25                                         0              0           0          0.5        2150.0 4344230.1X
```
Stacktrace:
![image](https://user-images.githubusercontent.com/5399861/92941705-4c868980-f483-11ea-8a15-b93acde8c0f4.png)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test and benchmark test:
Dataset | Before this PR (seconds) | After this PR (seconds)
-- | -- | --
https://issues.apache.org/jira/secure/attachment/13011406/part-00000.parquet | 2640 | 2

Closes #29731 from wangyum/SPARK-32706.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2020-09-16 14:08:59 +00:00
benchmarks [SPARK-30413][SQL] Avoid WrappedArray roundtrip in GenericArrayData constructor, plus related optimization in ParquetMapConverter 2020-01-19 19:12:19 -08:00
src [SPARK-32706][SQL] Improve cast string to decimal type 2020-09-16 14:08:59 +00:00
pom.xml [SPARK-32312][SQL][PYTHON][TEST-JAVA11] Upgrade Apache Arrow to version 1.0.1 2020-09-10 14:16:19 +09:00