[SPARK-32802][SQL] Avoid using SpecificInternalRow in RunLengthEncoding#Encoder
### What changes were proposed in this pull request?

Currently `RunLengthEncoding#Encoder` uses `SpecificInternalRow` as a holder for the current value when calculating compression stats and doing the actual compression. It calls `ColumnType.copyField` and `ColumnType.getField` on the internal row, which incurs extra cost compared to operating directly on the internal type.

This PR proposes replacing the `SpecificInternalRow` with `T#InternalType` to avoid that cost.

### Why are the changes needed?

Operating on `SpecificInternalRow` carries a certain cost and negatively impacts performance when using `RunLengthEncoding` for compression. With the change, `CompressionSchemeBenchmark` shows the following improvements:

```diff
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 BOOLEAN Encode:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    1              1           0      51957.0           0.0       1.0X
-RunLengthEncoding(2.502)                            549            555           9        122.2           8.2       0.0X
-BooleanBitSet(0.125)                                296            301           3        226.6           4.4       0.0X
+PassThrough(1.000)                                    2              2           0      42985.4           0.0       1.0X
+RunLengthEncoding(2.517)                            487            500          10        137.7           7.3       0.0X
+BooleanBitSet(0.125)                                348            353           4        192.8           5.2       0.0X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 SHORT Encode (Lower Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    3              3           0      22779.9           0.0       1.0X
-RunLengthEncoding(1.520)                           1186           1192           9         56.6          17.7       0.0X
+PassThrough(1.000)                                    3              4           0      21216.6           0.0       1.0X
+RunLengthEncoding(1.493)                            882            931          50         76.1          13.1       0.0X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 SHORT Encode (Higher Skew):               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    3              4           0      21352.2           0.0       1.0X
-RunLengthEncoding(2.009)                           1173           1175           3         57.2          17.5       0.0X
+PassThrough(1.000)                                    3              3           0      22388.6           0.0       1.0X
+RunLengthEncoding(2.015)                            924            941          23         72.6          13.8       0.0X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 INT Encode (Lower Skew):                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    9             10           1       7410.1           0.1       1.0X
-RunLengthEncoding(1.000)                           1499           1502           4         44.8          22.3       0.0X
-DictionaryEncoding(0.500)                           621            630          11        108.0           9.3       0.0X
-IntDelta(0.250)                                     134            149          10        502.0           2.0       0.1X
+PassThrough(1.000)                                    9             10           1       7575.9           0.1       1.0X
+RunLengthEncoding(1.002)                            952            966          12         70.5          14.2       0.0X
+DictionaryEncoding(0.500)                           561            567           6        119.7           8.4       0.0X
+IntDelta(0.250)                                     129            134           3        521.9           1.9       0.1X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 INT Encode (Higher Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    9             10           1       7668.3           0.1       1.0X
-RunLengthEncoding(1.332)                           1561           1685         175         43.0          23.3       0.0X
-DictionaryEncoding(0.501)                           616            642          21        108.9           9.2       0.0X
-IntDelta(0.250)                                     126            131           2        533.4           1.9       0.1X
+PassThrough(1.000)                                    9             10           1       7494.1           0.1       1.0X
+RunLengthEncoding(1.336)                            974            987          13         68.9          14.5       0.0X
+DictionaryEncoding(0.501)                           709            719          10         94.6          10.6       0.0X
+IntDelta(0.250)                                     127            132           4        528.4           1.9       0.1X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 LONG Encode (Lower Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   18             19           1       3803.0           0.3       1.0X
-RunLengthEncoding(0.754)                           1526           1540          20         44.0          22.7       0.0X
-DictionaryEncoding(0.250)                           735            759          33         91.3          11.0       0.0X
-LongDelta(0.125)                                    126            129           2        530.8           1.9       0.1X
+PassThrough(1.000)                                   19             21           1       3543.5           0.3       1.0X
+RunLengthEncoding(0.747)                           1049           1058          12         63.9          15.6       0.0X
+DictionaryEncoding(0.250)                           620            634          17        108.2           9.2       0.0X
+LongDelta(0.125)                                    129            132           2        520.1           1.9       0.1X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 LONG Encode (Higher Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   18             20           1       3705.4           0.3       1.0X
-RunLengthEncoding(1.002)                           1665           1669           6         40.3          24.8       0.0X
-DictionaryEncoding(0.251)                           890            901          11         75.4          13.3       0.0X
-LongDelta(0.125)                                    125            130           3        537.2           1.9       0.1X
+PassThrough(1.000)                                   18             20           2       3726.8           0.3       1.0X
+RunLengthEncoding(0.999)                           1076           1077           2         62.4          16.0       0.0X
+DictionaryEncoding(0.251)                           904            919          19         74.3          13.5       0.0X
+LongDelta(0.125)                                    125            131           4        536.5           1.9       0.1X

 OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
 Intel(R) Core(TM) i9-9880H CPU 2.30GHz
 STRING Encode:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   27             30           2       2497.1           0.4       1.0X
-RunLengthEncoding(0.892)                           3443           3587         204         19.5          51.3       0.0X
-DictionaryEncoding(0.167)                          2286           2290           6         29.4          34.1       0.0X
+PassThrough(1.000)                                   28             31           2       2430.2           0.4       1.0X
+RunLengthEncoding(0.889)                           1798           1800           3         37.3          26.8       0.0X
+DictionaryEncoding(0.167)                          1956           1959           4         34.3          29.1       0.0X
```

In the diff above, the `+` lines are the new results with the changes in this PR. Encoding performance improves considerably, especially for the string type: the best time of `RunLengthEncoding` on STRING drops from 3443 ms to 1798 ms.
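As a quick arithmetic check of the claim above, the sketch below divides the old `RunLengthEncoding` best time by the new one for each encode benchmark. The object name `RleEncodeSpeedup` is hypothetical, and the numbers are copied from the diff. Note that the `PassThrough` baseline also moves between the old and new results (e.g. 51957.0 vs 42985.4 M/s for BOOLEAN), so small ratios should be read with care.

```scala
// Hypothetical helper: the speedup of RunLengthEncoding encode, computed from the
// "Best Time(ms)" column of the benchmark diff (before this PR, after this PR).
object RleEncodeSpeedup {
  val bestTimesMs: Seq[(String, Int, Int)] = Seq(
    ("BOOLEAN", 549, 487),
    ("SHORT (Lower Skew)", 1186, 882),
    ("SHORT (Higher Skew)", 1173, 924),
    ("INT (Lower Skew)", 1499, 952),
    ("INT (Higher Skew)", 1561, 974),
    ("LONG (Lower Skew)", 1526, 1049),
    ("LONG (Higher Skew)", 1665, 1076),
    ("STRING", 3443, 1798)
  )

  // Speedup = old best time / new best time; values above 1.0 mean the new code is faster.
  def speedups: Seq[(String, Double)] =
    bestTimesMs.map { case (name, before, after) => (name, before.toDouble / after) }

  def main(args: Array[String]): Unit =
    speedups.foreach { case (name, s) => println(f"$name%-20s $s%.2fx") }
}
```

With these numbers STRING shows the largest ratio (about 1.9x) and BOOLEAN the smallest (about 1.1x).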
### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Relies on existing unit tests.

Closes #29654 from sunchao/SPARK-32802.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
parent 0549c20c6f
commit a6d6ea3efe
```diff
@@ -2,136 +2,136 @@
 Compression Scheme Benchmark
 ================================================================================================

-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 BOOLEAN Encode:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    3              3           0      21087.3           0.0       1.0X
-RunLengthEncoding(2.514)                            739            739           1         90.8          11.0       0.0X
-BooleanBitSet(0.125)                                378            379           1        177.4           5.6       0.0X
+PassThrough(1.000)                                    1              1           0      53450.1           0.0       1.0X
+RunLengthEncoding(2.496)                            533            545          10        125.8           7.9       0.0X
+BooleanBitSet(0.125)                                287            293           6        234.2           4.3       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 BOOLEAN Decode:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                         147            147           1        456.1           2.2       1.0X
-RunLengthEncoding                                   731            732           1         91.8          10.9       0.2X
-BooleanBitSet                                      1410           1411           1         47.6          21.0       0.1X
+PassThrough                                         105            108           2        638.6           1.6       1.0X
+RunLengthEncoding                                   490            497           6        136.8           7.3       0.2X
+BooleanBitSet                                       911            914           4         73.7          13.6       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Encode (Lower Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    7              7           0       9729.9           0.1       1.0X
-RunLengthEncoding(1.491)                           1576           1576           1         42.6          23.5       0.0X
+PassThrough(1.000)                                    3              3           0      20673.0           0.0       1.0X
+RunLengthEncoding(1.495)                            750            757           9         89.5          11.2       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Decode (Lower Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1151           1152           1         58.3          17.2       1.0X
-RunLengthEncoding                                  1619           1621           3         41.4          24.1       0.7X
+PassThrough                                         637            647           7        105.3           9.5       1.0X
+RunLengthEncoding                                  1056           1069          17         63.5          15.7       0.6X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Encode (Higher Skew):               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    7              7           0      10135.7           0.1       1.0X
-RunLengthEncoding(2.010)                           1659           1660           0         40.4          24.7       0.0X
+PassThrough(1.000)                                    3              3           0      21332.2           0.0       1.0X
+RunLengthEncoding(2.004)                            768            783          15         87.4          11.4       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Decode (Higher Skew):               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1151           1151           1         58.3          17.2       1.0X
-RunLengthEncoding                                  1655           1655           0         40.5          24.7       0.7X
+PassThrough                                         640            643           4        104.9           9.5       1.0X
+RunLengthEncoding                                  1073           1078           6         62.5          16.0       0.6X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Encode (Lower Skew):                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   23             23           0       2952.8           0.3       1.0X
-RunLengthEncoding(0.997)                           2356           2356           0         28.5          35.1       0.0X
-DictionaryEncoding(0.500)                          1402           1402           0         47.9          20.9       0.0X
-IntDelta(0.250)                                     213            213           0        315.2           3.2       0.1X
+PassThrough(1.000)                                    9              9           1       7640.9           0.1       1.0X
+RunLengthEncoding(1.003)                            882            883           2         76.1          13.1       0.0X
+DictionaryEncoding(0.500)                           587            624          33        114.3           8.7       0.0X
+IntDelta(0.250)                                     122            127           5        549.8           1.8       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Decode (Lower Skew):                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1319           1319           1         50.9          19.7       1.0X
-RunLengthEncoding                                  1803           1806           5         37.2          26.9       0.7X
-DictionaryEncoding                                  931            931           0         72.1          13.9       1.4X
-IntDelta                                            817            821           4         82.2          12.2       1.6X
+PassThrough                                         684            709          27         98.1          10.2       1.0X
+RunLengthEncoding                                  1068           1075          10         62.8          15.9       0.6X
+DictionaryEncoding                                  517            526           6        129.8           7.7       1.3X
+IntDelta                                            541            545           4        124.0           8.1       1.3X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Encode (Higher Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   23             23           0       2976.8           0.3       1.0X
-RunLengthEncoding(1.337)                           2552           2552           1         26.3          38.0       0.0X
-DictionaryEncoding(0.501)                          1377           1377           0         48.7          20.5       0.0X
-IntDelta(0.250)                                     213            214           2        315.3           3.2       0.1X
+PassThrough(1.000)                                    9             10           1       7475.0           0.1       1.0X
+RunLengthEncoding(1.339)                            908            922          12         73.9          13.5       0.0X
+DictionaryEncoding(0.501)                           629            652          16        106.6           9.4       0.0X
+IntDelta(0.250)                                     124            128           3        542.5           1.8       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Decode (Higher Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1438           1439           1         46.7          21.4       1.0X
-RunLengthEncoding                                  1987           1988           1         33.8          29.6       0.7X
-DictionaryEncoding                                 1249           1250           0         53.7          18.6       1.2X
-IntDelta                                           1135           1136           3         59.2          16.9       1.3X
+PassThrough                                         778            783           8         86.3          11.6       1.0X
+RunLengthEncoding                                  1217           1217           1         55.2          18.1       0.6X
+DictionaryEncoding                                  690            704          12         97.2          10.3       1.1X
+IntDelta                                            691            699          13         97.1          10.3       1.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Encode (Lower Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   45             45           0       1487.9           0.7       1.0X
-RunLengthEncoding(0.750)                           2496           2496           1         26.9          37.2       0.0X
-DictionaryEncoding(0.250)                          1433           1433           1         46.8          21.4       0.0X
-LongDelta(0.125)                                    215            215           0        312.6           3.2       0.2X
+PassThrough(1.000)                                   18             19           1       3772.0           0.3       1.0X
+RunLengthEncoding(0.750)                            985            987           2         68.1          14.7       0.0X
+DictionaryEncoding(0.250)                           665            668           4        100.9           9.9       0.0X
+LongDelta(0.125)                                    124            128           2        539.4           1.9       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Decode (Lower Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1468           1469           1         45.7          21.9       1.0X
-RunLengthEncoding                                  1977           1981           6         33.9          29.5       0.7X
-DictionaryEncoding                                 1248           1250           3         53.8          18.6       1.2X
-LongDelta                                           838            840           2         80.1          12.5       1.8X
+PassThrough                                         837            841           7         80.2          12.5       1.0X
+RunLengthEncoding                                  1177           1180           4         57.0          17.5       0.7X
+DictionaryEncoding                                  741            747           7         90.6          11.0       1.1X
+LongDelta                                           509            520          13        131.8           7.6       1.6X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Encode (Higher Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   47             47           0       1437.2           0.7       1.0X
-RunLengthEncoding(1.002)                           2743           2744           0         24.5          40.9       0.0X
-DictionaryEncoding(0.251)                          2016           2016           0         33.3          30.0       0.0X
-LongDelta(0.125)                                    215            217           5        312.1           3.2       0.2X
+PassThrough(1.000)                                   18             20           1       3769.4           0.3       1.0X
+RunLengthEncoding(1.005)                           1016           1054          54         66.1          15.1       0.0X
+DictionaryEncoding(0.251)                           923            928           4         72.7          13.8       0.0X
+LongDelta(0.125)                                    125            127           2        538.8           1.9       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Decode (Higher Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1468           1468           0         45.7          21.9       1.0X
-RunLengthEncoding                                  2020           2021           2         33.2          30.1       0.7X
-DictionaryEncoding                                 1248           1248           0         53.8          18.6       1.2X
-LongDelta                                          1131           1134           4         59.4          16.8       1.3X
+PassThrough                                         842            846           5         79.7          12.5       1.0X
+RunLengthEncoding                                  1222           1264          59         54.9          18.2       0.7X
+DictionaryEncoding                                  757            776          20         88.7          11.3       1.1X
+LongDelta                                           681            686           4         98.5          10.2       1.2X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 STRING Encode:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   71             71           0        939.6           1.1       1.0X
-RunLengthEncoding(0.890)                           6050           6052           2         11.1          90.2       0.0X
-DictionaryEncoding(0.167)                          3723           3725           2         18.0          55.5       0.0X
+PassThrough(1.000)                                   27             29           2       2510.4           0.4       1.0X
+RunLengthEncoding(0.888)                           1651           1663          18         40.7          24.6       0.0X
+DictionaryEncoding(0.167)                          1851           1863          17         36.3          27.6       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 11.0.4+11-LTS on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 11.0.8+10-LTS on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 STRING Decode:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        2804           2815          16         23.9          41.8       1.0X
-RunLengthEncoding                                  3390           3391           1         19.8          50.5       0.8X
-DictionaryEncoding                                 2901           2905           5         23.1          43.2       1.0X
+PassThrough                                        1485           1495          15         45.2          22.1       1.0X
+RunLengthEncoding                                  2010           2066          80         33.4          30.0       0.7X
+DictionaryEncoding                                 1788           1790           4         37.5          26.6       0.8X
```
```diff
@@ -2,136 +2,136 @@
 Compression Scheme Benchmark
 ================================================================================================

-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 BOOLEAN Encode:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    3              3           0      21114.6           0.0       1.0X
-RunLengthEncoding(2.505)                            694            696           4         96.7          10.3       0.0X
-BooleanBitSet(0.125)                                366            366           0        183.4           5.5       0.0X
+PassThrough(1.000)                                    1              2           0      49671.6           0.0       1.0X
+RunLengthEncoding(2.501)                            470            487          25        142.7           7.0       0.0X
+BooleanBitSet(0.125)                                358            362           4        187.6           5.3       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 BOOLEAN Decode:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                         145            145           0        464.2           2.2       1.0X
-RunLengthEncoding                                   735            735           0         91.3          10.9       0.2X
-BooleanBitSet                                      1437           1437           1         46.7          21.4       0.1X
+PassThrough                                          90             95           5        746.2           1.3       1.0X
+RunLengthEncoding                                   550            559           8        122.0           8.2       0.2X
+BooleanBitSet                                      1082           1087           7         62.0          16.1       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Encode (Lower Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    7              7           0       9336.6           0.1       1.0X
-RunLengthEncoding(1.494)                           1912           1917           7         35.1          28.5       0.0X
+PassThrough(1.000)                                    3              4           0      20595.0           0.0       1.0X
+RunLengthEncoding(1.495)                           1074           1087          19         62.5          16.0       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Decode (Lower Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1104           1104           0         60.8          16.4       1.0X
-RunLengthEncoding                                  1627           1628           0         41.2          24.3       0.7X
+PassThrough                                         807            844          33         83.1          12.0       1.0X
+RunLengthEncoding                                  1077           1078           1         62.3          16.0       0.7X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Encode (Higher Skew):               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                    7              7           0       9710.6           0.1       1.0X
-RunLengthEncoding(2.003)                           2021           2027           9         33.2          30.1       0.0X
+PassThrough(1.000)                                    3              3           0      23144.6           0.0       1.0X
+RunLengthEncoding(2.001)                           1067           1073           8         62.9          15.9       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 SHORT Decode (Higher Skew):               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1104           1104           0         60.8          16.5       1.0X
-RunLengthEncoding                                  1621           1621           0         41.4          24.1       0.7X
+PassThrough                                         793            811          16         84.7          11.8       1.0X
+RunLengthEncoding                                  1099           1123          33         61.1          16.4       0.7X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Encode (Lower Skew):                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   24             24           0       2854.3           0.4       1.0X
-RunLengthEncoding(1.005)                           2395           2396           2         28.0          35.7       0.0X
-DictionaryEncoding(0.500)                          1366           1366           0         49.1          20.3       0.0X
-IntDelta(0.250)                                     286            287           0        234.2           4.3       0.1X
+PassThrough(1.000)                                   10             11           1       6979.9           0.1       1.0X
+RunLengthEncoding(1.000)                            985            994           9         68.1          14.7       0.0X
+DictionaryEncoding(0.500)                           896            903          10         74.9          13.4       0.0X
+IntDelta(0.250)                                     237            244           6        283.5           3.5       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Decode (Lower Skew):                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1248           1248           0         53.8          18.6       1.0X
-RunLengthEncoding                                  1738           1739           2         38.6          25.9       0.7X
-DictionaryEncoding                                  969            970           0         69.2          14.4       1.3X
-IntDelta                                            777            779           1         86.3          11.6       1.6X
+PassThrough                                         791            795           3         84.8          11.8       1.0X
+RunLengthEncoding                                  1111           1114           5         60.4          16.6       0.7X
+DictionaryEncoding                                  641            650          17        104.7           9.6       1.2X
+IntDelta                                            560            575          24        119.8           8.4       1.4X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Encode (Higher Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   23             23           0       2874.4           0.3       1.0X
-RunLengthEncoding(1.334)                           2581           2581           0         26.0          38.5       0.0X
-DictionaryEncoding(0.501)                          1490           1490           0         45.0          22.2       0.0X
-IntDelta(0.250)                                     286            286           0        234.5           4.3       0.1X
+PassThrough(1.000)                                    9             10           1       7181.9           0.1       1.0X
+RunLengthEncoding(1.336)                           1006           1006           1         66.7          15.0       0.0X
+DictionaryEncoding(0.501)                          1034           1045          15         64.9          15.4       0.0X
+IntDelta(0.250)                                     235            238           2        285.7           3.5       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 INT Decode (Higher Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1389           1389           0         48.3          20.7       1.0X
-RunLengthEncoding                                  1903           1903           0         35.3          28.4       0.7X
-DictionaryEncoding                                 1231           1232           1         54.5          18.3       1.1X
-IntDelta                                           1103           1108           7         60.8          16.4       1.3X
+PassThrough                                         829            832           3         81.0          12.3       1.0X
+RunLengthEncoding                                  1199           1207          11         56.0          17.9       0.7X
+DictionaryEncoding                                  725            726           1         92.6          10.8       1.1X
+IntDelta                                            680            683           5         98.6          10.1       1.2X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Encode (Lower Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   48             48           0       1405.2           0.7       1.0X
-RunLengthEncoding(0.757)                           2525           2525           1         26.6          37.6       0.0X
-DictionaryEncoding(0.250)                          1380           1381           1         48.6          20.6       0.0X
-LongDelta(0.125)                                    474            474           0        141.7           7.1       0.1X
+PassThrough(1.000)                                   20             22           1       3405.6           0.3       1.0X
+RunLengthEncoding(0.747)                           1097           1102           7         61.2          16.3       0.0X
+DictionaryEncoding(0.250)                           854            933          74         78.6          12.7       0.0X
+LongDelta(0.125)                                    322            328          11        208.5           4.8       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Decode (Lower Skew):                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1348           1349           0         49.8          20.1       1.0X
-RunLengthEncoding                                  1850           1851           2         36.3          27.6       0.7X
-DictionaryEncoding                                 1190           1192           3         56.4          17.7       1.1X
-LongDelta                                           801            801           0         83.8          11.9       1.7X
+PassThrough                                         839            843           4         80.0          12.5       1.0X
+RunLengthEncoding                                  1234           1234           1         54.4          18.4       0.7X
+DictionaryEncoding                                  806            809           3         83.3          12.0       1.0X
+LongDelta                                           550            558           6        122.0           8.2       1.5X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Encode (Higher Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   46             46           0       1451.2           0.7       1.0X
-RunLengthEncoding(1.003)                           2742           2743           1         24.5          40.9       0.0X
-DictionaryEncoding(0.251)                          1714           1715           0         39.1          25.5       0.0X
-LongDelta(0.125)                                    476            476           0        140.9           7.1       0.1X
+PassThrough(1.000)                                   20             22           1       3319.5           0.3       1.0X
+RunLengthEncoding(1.005)                           1153           1169          24         58.2          17.2       0.0X
+DictionaryEncoding(0.251)                           923            930           9         72.7          13.7       0.0X
+LongDelta(0.125)                                    327            332           4        205.0           4.9       0.1X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 LONG Decode (Higher Skew):                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        1362           1363           1         49.3          20.3       1.0X
-RunLengthEncoding                                  1862           1863           1         36.0          27.7       0.7X
-DictionaryEncoding                                 1190           1192           3         56.4          17.7       1.1X
-LongDelta                                          1079           1082           4         62.2          16.1       1.3X
+PassThrough                                         854            864          16         78.6          12.7       1.0X
+RunLengthEncoding                                  1242           1244           3         54.0          18.5       0.7X
+DictionaryEncoding                                  823            823           1         81.6          12.3       1.0X
+LongDelta                                           640            651           8        104.8           9.5       1.3X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 STRING Encode:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough(1.000)                                   67             67           0        994.8           1.0       1.0X
-RunLengthEncoding(0.888)                           6135           6137           2         10.9          91.4       0.0X
-DictionaryEncoding(0.167)                          3747           3748           0         17.9          55.8       0.0X
+PassThrough(1.000)                                   29             32           1       2279.8           0.4       1.0X
+RunLengthEncoding(0.886)                           1723           1734          15         38.9          25.7       0.0X
+DictionaryEncoding(0.167)                          2667           2690          33         25.2          39.7       0.0X
```
```diff
-OpenJDK 64-Bit Server VM 1.8.0_222-b10 on Linux 3.10.0-862.3.2.el7.x86_64
-Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+OpenJDK 64-Bit Server VM 1.8.0_265-b01 on Mac OS X 10.15.5
+Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
 STRING Decode:                            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-PassThrough                                        3180           3185           8         21.1          47.4       1.0X
-RunLengthEncoding                                  3658           3660           3         18.3          54.5       0.9X
-DictionaryEncoding                                 3292           3295           4         20.4          49.1       1.0X
+PassThrough                                        1847           1892          64         36.3          27.5       1.0X
+RunLengthEncoding                                  2305           2332          38         29.1          34.3       0.8X
+DictionaryEncoding                                 2134           2150          22         31.5          31.8       0.9X
```
```diff
@@ -23,7 +23,6 @@ import java.nio.ByteOrder
 import scala.collection.mutable

 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
 import org.apache.spark.sql.execution.columnar._
 import org.apache.spark.sql.execution.vectorized.WritableColumnVector
 import org.apache.spark.sql.types._
```
```diff
@@ -182,8 +181,7 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
     private var _uncompressedSize = 0
     private var _compressedSize = 0

-    // Using `MutableRow` to store the last value to avoid boxing/unboxing cost.
-    private val lastValue = new SpecificInternalRow(Seq(columnType.dataType))
+    private var lastValue: T#InternalType = _
     private var lastRun = 0

     override def uncompressedSize: Int = _uncompressedSize
```
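The new field relies on how Scala erases abstract types: because `T#InternalType` is abstract, the field is presumably stored as an `Object`, so `= _` initializes it to `null` even when the internal type is `Int`. That is what lets `lastValue == null` replace `lastValue.isNullAt(0)` as the "no value seen yet" check. A minimal sketch, using a hypothetical `LastValueHolder[A]` whose type parameter `A` stands in for `T#InternalType` (an assumption: both erase to `Object`):

```scala
// Hypothetical stand-in for the encoder's `lastValue` field; `A` plays the role of
// `T#InternalType`. A type parameter erases to Object, so `_` initializes the field to
// null even when A is instantiated as Int.
class LastValueHolder[A] {
  private var lastValue: A = _

  // The null check runs inside the generic class, where the field is still a reference.
  def isEmpty: Boolean = lastValue == null

  def set(v: A): Unit = lastValue = v

  // `==` on generic values compares boxed values by equality, not by reference.
  def sameAsLast(v: A): Boolean = lastValue == v
}

object LastValueHolderDemo {
  def main(args: Array[String]): Unit = {
    val holder = new LastValueHolder[Int]
    println(holder.isEmpty) // no value stored yet
    holder.set(0)
    println(holder.isEmpty) // a stored 0 is distinguishable from "unset"
  }
}
```

The check sits inside the generic class on purpose: at a call site where `A` is statically `Int`, reading the field would unbox it, and an unset `null` would read back as `0`.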
```diff
@@ -195,16 +193,16 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
       val actualSize = columnType.actualSize(row, ordinal)
       _uncompressedSize += actualSize

-      if (lastValue.isNullAt(0)) {
-        columnType.copyField(row, ordinal, lastValue, 0)
+      if (lastValue == null) {
+        lastValue = columnType.clone(value)
         lastRun = 1
         _compressedSize += actualSize + 4
       } else {
-        if (columnType.getField(lastValue, 0) == value) {
+        if (lastValue == value) {
           lastRun += 1
         } else {
           _compressedSize += actualSize + 4
-          columnType.copyField(row, ordinal, lastValue, 0)
+          lastValue = columnType.clone(value)
           lastRun = 1
         }
       }
```
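The statistics pass charges each new run the value's size plus 4 bytes for the run length, while a repeated value only extends the current run. A minimal sketch of that accounting, with hypothetical `actualSize` and `cloneValue` functions standing in for `columnType.actualSize` and `columnType.clone`. Cloning presumably matters for values such as `UTF8String` that can point into memory which is later reused; for immutable values it can be `identity`.

```scala
// Hypothetical sketch of the RLE compressibility-statistics pass.
// `actualSize` and `cloneValue` stand in for columnType.actualSize / columnType.clone.
final class RunLengthStats[A](actualSize: A => Int, cloneValue: A => A) {
  private var lastValue: A = _ // null until the first value (A erases to Object)
  private var lastRun = 0

  var uncompressedSize = 0
  var compressedSize = 0

  def currentRun: Int = lastRun

  def gather(value: A): Unit = {
    val size = actualSize(value)
    uncompressedSize += size
    if (lastValue == null) {
      // First value: open a run costing the value bytes plus a 4-byte run length.
      lastValue = cloneValue(value)
      lastRun = 1
      compressedSize += size + 4
    } else if (lastValue == value) {
      // Same value: the existing run grows at no extra cost.
      lastRun += 1
    } else {
      // New value: close the old run and open a new one.
      compressedSize += size + 4
      lastValue = cloneValue(value)
      lastRun = 1
    }
  }

  // Ratio of compressed to uncompressed bytes, as in names like "RunLengthEncoding(0.889)".
  def compressionRatio: Double =
    if (uncompressedSize > 0) compressedSize.toDouble / uncompressedSize else 1.0
}
```

For example, `7, 7, 7, 7, 9, 9` as 4-byte ints is 24 bytes uncompressed but only two runs of 8 bytes each, i.e. 16 bytes compressed. With few repeats, the extra 4 bytes per run makes the ratio exceed 1.0, as in the BOOLEAN rows of the benchmark.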
```diff
@@ -214,30 +212,27 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
       to.putInt(RunLengthEncoding.typeId)

       if (from.hasRemaining) {
-        val currentValue = new SpecificInternalRow(Seq(columnType.dataType))
         var currentRun = 1
-        val value = new SpecificInternalRow(Seq(columnType.dataType))
-
-        columnType.extract(from, currentValue, 0)
+        var currentValue = columnType.extract(from)

         while (from.hasRemaining) {
-          columnType.extract(from, value, 0)
+          val value = columnType.extract(from)

-          if (value.get(0, columnType.dataType) == currentValue.get(0, columnType.dataType)) {
+          if (value == currentValue) {
             currentRun += 1
           } else {
             // Writes current run
-            columnType.append(currentValue, 0, to)
+            columnType.append(currentValue, to)
             to.putInt(currentRun)

             // Resets current run
-            columnType.copyField(value, 0, currentValue, 0)
+            currentValue = value
             currentRun = 1
           }
         }

         // Writes the last run
-        columnType.append(currentValue, 0, to)
+        columnType.append(currentValue, to)
         to.putInt(currentRun)
       }
```
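The `compress` loop writes a type-id header and then one `(value, runLength)` pair per run. A minimal, self-contained sketch for an `Int` column, with `ByteBuffer.getInt`/`putInt` standing in for `columnType.extract`/`columnType.append` and a hypothetical header value `TypeId = 1`. The `decompress` helper exists only to check the round trip and is not part of the diff. The sketch calls `flip()` so the result can be read back directly; how the real method finishes the output buffer is outside this hunk.

```scala
import java.nio.ByteBuffer

// Hypothetical Int-only sketch of the run-length `compress` loop.
object IntRunLengthSketch {
  // Stand-in for RunLengthEncoding.typeId; the real value is not assumed here.
  val TypeId: Int = 1

  def compress(from: ByteBuffer, to: ByteBuffer): ByteBuffer = {
    to.putInt(TypeId)
    if (from.hasRemaining) {
      var currentValue = from.getInt() // plays the role of columnType.extract(from)
      var currentRun = 1
      while (from.hasRemaining) {
        val value = from.getInt()
        if (value == currentValue) {
          currentRun += 1
        } else {
          // Writes the current run, then starts a new run at `value`.
          to.putInt(currentValue).putInt(currentRun)
          currentValue = value
          currentRun = 1
        }
      }
      // Writes the last run.
      to.putInt(currentValue).putInt(currentRun)
    }
    to.flip()
    to
  }

  // Inverse of `compress`, used only to verify the encoding.
  def decompress(buf: ByteBuffer): Seq[Int] = {
    require(buf.getInt() == TypeId, "unexpected header")
    val out = Seq.newBuilder[Int]
    while (buf.hasRemaining) {
      val value = buf.getInt()
      val run = buf.getInt()
      out ++= Seq.fill(run)(value)
    }
    out.result()
  }
}
```

Each run costs 8 bytes here (value plus run length), so the input `5, 5, 5, 8, 8, 1` compresses to a 4-byte header plus three runs, 28 bytes in total.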