d1177b5230
### What changes were proposed in this pull request?
On the read-side, the char length check and padding bring issues to CBO and predicate pushdown and other issues to the catalyst.
This PR reverts 6da5cdf1db
that added read side length check) so that we only do length check for the write side, and data sources/vendors are responsible to enforce the char/varchar constraints for data import operations like ADD PARTITION. It doesn't make sense for Spark to report errors on the read-side if the data is already dirty.
This PR also moves the char padding to the write-side, so that it 1) avoids read side issues like CBO and filter pushdown. 2) the data source can preserve char type semantic better even if it's read by systems other than Spark.
### Why are the changes needed?
fix perf regression when tables have char/varchar type columns
closes #31278
### Does this PR introduce _any_ user-facing change?
yes, spark will not raise error for oversized char/varchar values in read side
### How was this patch tested?
modified ut
the dropped read side benchmark
```
================================================================================================
Char Varchar Read Side Perf w/o Tailing Spaces
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20 1564 1573 9 63.9 15.6 1.0X
read char with length 20 1532 1551 18 65.3 15.3 1.0X
read varchar with length 20 1520 1531 13 65.8 15.2 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 40: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40 1573 1613 41 63.6 15.7 1.0X
read char with length 40 1575 1577 2 63.5 15.7 1.0X
read varchar with length 40 1568 1576 11 63.8 15.7 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 60: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60 1526 1540 23 65.5 15.3 1.0X
read char with length 60 1514 1539 23 66.0 15.1 1.0X
read varchar with length 60 1486 1497 10 67.3 14.9 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 80: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80 1531 1542 19 65.3 15.3 1.0X
read char with length 80 1514 1529 15 66.0 15.1 1.0X
read varchar with length 80 1524 1565 42 65.6 15.2 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 100: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100 1597 1623 25 62.6 16.0 1.0X
read char with length 100 1499 1512 16 66.7 15.0 1.1X
read varchar with length 100 1517 1524 8 65.9 15.2 1.1X
================================================================================================
Char Varchar Read Side Perf w/ Tailing Spaces
================================================================================================
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 20 1524 1526 1 65.6 15.2 1.0X
read char with length 20 1532 1537 9 65.3 15.3 1.0X
read varchar with length 20 1520 1532 15 65.8 15.2 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 40: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 40 1556 1580 32 64.3 15.6 1.0X
read char with length 40 1600 1611 17 62.5 16.0 1.0X
read varchar with length 40 1648 1716 88 60.7 16.5 0.9X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 60: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 60 1504 1524 20 66.5 15.0 1.0X
read char with length 60 1509 1512 3 66.2 15.1 1.0X
read varchar with length 60 1519 1535 21 65.8 15.2 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 80: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 80 1640 1652 17 61.0 16.4 1.0X
read char with length 80 1625 1666 35 61.5 16.3 1.0X
read varchar with length 80 1590 1605 13 62.9 15.9 1.0X
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
Intel(R) Core(TM) i9-9980HK CPU 2.40GHz
Read with length 100: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
read string with length 100 1622 1628 5 61.6 16.2 1.0X
read char with length 100 1614 1646 30 62.0 16.1 1.0X
read varchar with length 100 1594 1606 11 62.7 15.9 1.0X
```
Closes #31281 from yaooqinn/SPARK-34192.
Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
123 lines
10 KiB
Plaintext
123 lines
10 KiB
Plaintext
================================================================================================
|
|
Char Varchar Write Side Perf w/o Tailing Spaces
|
|
================================================================================================
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 5 9415 9961 487 4.2 235.4 1.0X
|
|
write char with length 5 12593 12972 590 3.2 314.8 0.7X
|
|
write varchar with length 5 9556 9753 317 4.2 238.9 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 10 4901 5072 172 4.1 245.0 1.0X
|
|
write char with length 10 7205 7344 122 2.8 360.2 0.7X
|
|
write varchar with length 10 4871 4967 152 4.1 243.6 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 20 2558 2575 17 3.9 255.8 1.0X
|
|
write char with length 20 4594 4807 185 2.2 459.4 0.6X
|
|
write varchar with length 20 2444 2523 94 4.1 244.4 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 40: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 40 1333 1335 2 3.8 266.6 1.0X
|
|
write char with length 40 3588 3654 67 1.4 717.7 0.4X
|
|
write varchar with length 40 1379 1397 20 3.6 275.8 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 60: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 60 932 948 15 3.6 279.6 1.0X
|
|
write char with length 60 3067 3118 61 1.1 920.2 0.3X
|
|
write varchar with length 60 928 938 9 3.6 278.3 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 80: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 80 731 740 13 3.4 292.6 1.0X
|
|
write char with length 80 2829 2838 8 0.9 1131.7 0.3X
|
|
write varchar with length 80 670 676 5 3.7 268.1 1.1X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 100: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 100 547 576 32 3.7 273.3 1.0X
|
|
write char with length 100 2744 2750 11 0.7 1371.9 0.2X
|
|
write varchar with length 100 601 604 5 3.3 300.5 0.9X
|
|
|
|
|
|
================================================================================================
|
|
Char Varchar Write Side Perf w/ Tailing Spaces
|
|
================================================================================================
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 5 16122 16432 294 2.5 403.0 1.0X
|
|
write char with length 5 16988 17124 196 2.4 424.7 0.9X
|
|
write varchar with length 5 16773 17067 300 2.4 419.3 1.0X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 10 9006 9242 205 2.2 450.3 1.0X
|
|
write char with length 10 10495 10564 62 1.9 524.7 0.9X
|
|
write varchar with length 10 10313 10515 177 1.9 515.7 0.9X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 20 6265 6301 32 1.6 626.5 1.0X
|
|
write char with length 20 7494 7586 89 1.3 749.4 0.8X
|
|
write varchar with length 20 7374 7565 167 1.4 737.4 0.8X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 40: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 40 4619 4638 20 1.1 923.7 1.0X
|
|
write char with length 40 5826 5964 125 0.9 1165.2 0.8X
|
|
write varchar with length 40 5937 6013 67 0.8 1187.5 0.8X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 60: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 60 4028 4052 28 0.8 1208.5 1.0X
|
|
write char with length 60 5172 5299 145 0.6 1551.7 0.8X
|
|
write varchar with length 60 5277 5399 191 0.6 1583.2 0.8X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 80: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 80 3875 3882 8 0.6 1549.9 1.0X
|
|
write char with length 80 5060 5169 96 0.5 2023.9 0.8X
|
|
write varchar with length 80 5322 5334 17 0.5 2128.7 0.7X
|
|
|
|
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.16
|
|
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
|
|
Write with length 100: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
|
|
------------------------------------------------------------------------------------------------------------------------
|
|
write string with length 100 3410 3541 115 0.6 1705.2 1.0X
|
|
write char with length 100 4760 4990 200 0.4 2380.0 0.7X
|
|
write varchar with length 100 5166 5185 17 0.4 2583.1 0.7X
|
|
|
|
|