Ludovic Henry
b52d47a920
[SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0
### What changes were proposed in this pull request?
Bump to `dev.ludovic.netlib:2.0`, which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. These wrappers do not depend on any GPL or LGPL library, which makes it possible to provide out-of-the-box support for hardware acceleration whenever a native library is present. Installing such a library (e.g. OpenBLAS, Intel MKL, or libarpack2) on the system is still up to the end user.
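The loading behaviour described above (a JNI-backed implementation when a native library is installed, a pure-Java one otherwise) can be sketched as follows. This is a minimal illustration in Java, not the library's API: `MiniBLAS`, `PureJavaBLAS`, and `BLASLoader` are hypothetical names, and the `nativeLoader` supplier stands in for constructing a JNI wrapper, which typically fails with `UnsatisfiedLinkError` when no shared library such as OpenBLAS can be found.

```java
import java.util.function.Supplier;

// Hypothetical, minimal BLAS-like interface (only ddot, for brevity).
// Not dev.ludovic.netlib's API.
interface MiniBLAS {
    String name();
    double ddot(int n, double[] x, int incx, double[] y, int incy);
}

// Pure-Java fallback, standing in for a Java/F2j implementation.
final class PureJavaBLAS implements MiniBLAS {
    public String name() { return "java"; }

    public double ddot(int n, double[] x, int incx, double[] y, int incy) {
        double sum = 0.0;
        // Negative increments start from the far end, as in reference BLAS.
        int ix = incx < 0 ? (-n + 1) * incx : 0;
        int iy = incy < 0 ? (-n + 1) * incy : 0;
        for (int i = 0; i < n; i++) {
            sum += x[ix] * y[iy];
            ix += incx;
            iy += incy;
        }
        return sum;
    }
}

final class BLASLoader {
    // Returns the native implementation if it can be created, otherwise the fallback.
    // UnsatisfiedLinkError is a LinkageError (an Error, not an Exception), so it is
    // caught explicitly alongside runtime exceptions.
    static MiniBLAS loadOrFallback(Supplier<MiniBLAS> nativeLoader, MiniBLAS fallback) {
        try {
            return nativeLoader.get();
        } catch (LinkageError | RuntimeException e) {
            return fallback;
        }
    }
}
```

Catching `LinkageError` is the important detail here: a plain `catch (Exception e)` would let `UnsatisfiedLinkError` escape instead of falling back.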
### Why are the changes needed?
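Significant performance improvements for ML workloads on vanilla distributions of Spark. In the benchmarks below, the native `dgemm` and `sgemm` paths are roughly 17x to 60x faster than the F2j baseline.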
### Does this PR introduce _any_ user-facing change?
Users now benefit from hardware acceleration as long as a native library (such as OpenBLAS, Intel MKL, or libarpack2) is installed.
### How was this patch tested?
The Spark test suite and the `dev.ludovic.netlib` test suite.
#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 220 226 6 454.9 2.2 1.0X
[info] java 221 228 5 451.9 2.2 1.0X
[info] native 209 215 5 478.7 2.1 1.1X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 121 125 3 823.3 1.2 1.0X
[info] java 121 125 3 824.3 1.2 1.0X
[info] native 101 105 3 988.4 1.0 1.2X
[info]
[info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 212 219 6 470.9 2.1 1.0X
[info] java 208 212 4 481.0 2.1 1.0X
[info] native 209 215 5 478.5 2.1 1.0X
[info]
[info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 114 119 3 878.9 1.1 1.0X
[info] java 99 105 3 1011.4 1.0 1.2X
[info] native 97 103 3 1026.7 1.0 1.2X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 108 111 2 925.9 1.1 1.0X
[info] java 71 73 2 1414.9 0.7 1.5X
[info] native 54 56 2 1847.0 0.5 2.0X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 96 97 2 1046.8 1.0 1.0X
[info] java 47 48 1 2129.8 0.5 2.0X
[info] native 29 30 1 3404.7 0.3 3.3X
[info]
[info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 139 143 2 718.2 1.4 1.0X
[info] java 46 47 1 2171.2 0.5 3.0X
[info] native 44 46 2 2261.8 0.4 3.1X
[info]
[info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 154 157 4 651.0 1.5 1.0X
[info] java 40 42 1 2469.3 0.4 3.8X
[info] native 26 27 1 3787.6 0.3 5.8X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 185 195 8 541.0 1.8 1.0X
[info] java 186 196 7 538.5 1.9 1.0X
[info] native 177 187 7 564.1 1.8 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 98 102 3 1016.2 1.0 1.0X
[info] java 98 102 3 1017.8 1.0 1.0X
[info] native 87 91 3 1143.2 0.9 1.1X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 68 70 1 1474.7 0.7 1.0X
[info] java 51 52 1 1973.0 0.5 1.3X
[info] native 30 32 1 3298.8 0.3 2.2X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 96 99 2 1037.9 1.0 1.0X
[info] java 50 51 1 1999.6 0.5 1.9X
[info] native 30 31 1 3368.1 0.3 3.2X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 59 61 1 1688.7 0.6 1.0X
[info] java 41 42 1 2461.9 0.4 1.5X
[info] native 15 16 1 6593.0 0.2 3.9X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 90 92 1 1116.2 0.9 1.0X
[info] java 39 40 1 2565.8 0.4 2.3X
[info] native 15 16 1 6594.2 0.2 5.9X
[info]
[info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 192 202 7 520.5 1.9 1.0X
[info] java 203 214 7 491.9 2.0 0.9X
[info] native 176 187 7 568.8 1.8 1.1X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 59 61 1 846.1 1.2 1.0X
[info] java 38 39 1 1313.5 0.8 1.6X
[info] native 24 27 1 2047.8 0.5 2.4X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 97 101 3 515.4 1.9 1.0X
[info] java 97 101 2 515.1 1.9 1.0X
[info] native 88 91 3 569.1 1.8 1.1X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 169 174 3 295.4 3.4 1.0X
[info] java 169 174 3 295.4 3.4 1.0X
[info] native 160 165 4 312.2 3.2 1.1X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 561 577 13 1782.3 0.6 1.0X
[info] java 225 231 4 4446.2 0.2 2.5X
[info] native 31 32 3 32473.1 0.0 18.2X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 570 584 9 1754.8 0.6 1.0X
[info] java 224 230 4 4457.3 0.2 2.5X
[info] native 31 32 1 32493.4 0.0 18.5X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 855 866 6 1169.2 0.9 1.0X
[info] java 224 228 3 4466.9 0.2 3.8X
[info] native 31 32 1 32395.5 0.0 27.7X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 1328 1344 8 752.8 1.3 1.0X
[info] java 224 230 4 4458.9 0.2 5.9X
[info] native 31 32 1 32201.8 0.0 42.8X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 534 541 5 1873.0 0.5 1.0X
[info] java 220 224 3 4542.8 0.2 2.4X
[info] native 15 16 1 66803.1 0.0 35.7X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 544 551 6 1839.6 0.5 1.0X
[info] java 220 224 4 4538.2 0.2 2.5X
[info] native 15 16 1 65589.9 0.0 35.7X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 833 845 21 1201.0 0.8 1.0X
[info] java 220 224 3 4548.7 0.2 3.8X
[info] native 15 16 1 66603.2 0.0 55.5X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 899 907 5 1112.9 0.9 1.0X
[info] java 221 224 2 4531.6 0.2 4.1X
[info] native 15 16 1 65944.9 0.0 59.3X
```
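The `Rate(M/s)`, `Per Row(ns)`, and `Relative` columns are related: the printed values are consistent with `Per Row(ns) = 1000 / Rate(M/s)` and `Relative = Rate / baseline Rate`, where the `f2j` row is the 1.0X baseline. A small sketch with hypothetical helper names that reproduces two printed figures from the JDK8 results. Working from the rates matters because the millisecond columns are rounded: 561 / 31 gives 18.1, not the printed 18.2X.

```java
import java.util.Locale;

// Hypothetical helpers for reading the benchmark columns; not part of Spark.
final class BenchmarkColumns {
    // "Relative": this row's rate divided by the baseline (first-row) rate.
    static String relative(double baselineRateMps, double rateMps) {
        return String.format(Locale.ROOT, "%.1fX", rateMps / baselineRateMps);
    }

    // "Per Row(ns)": 1e9 ns per second / (rate * 1e6 rows per second).
    static double perRowNs(double rateMps) {
        return 1000.0 / rateMps;
    }
}
```

For example, `relative(1782.3, 32473.1)` returns `"18.2X"` and `relative(1112.9, 65944.9)` returns `"59.3X"`, matching the native rows of `dgemm[N,N]` and `sgemm[T,T]` above.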
#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 195 200 3 512.2 2.0 1.0X
[info] java 197 202 3 507.0 2.0 1.0X
[info] native 184 189 4 543.0 1.8 1.1X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 108 112 3 921.8 1.1 1.0X
[info] java 101 105 3 989.4 1.0 1.1X
[info] native 87 91 3 1147.1 0.9 1.2X
[info]
[info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 187 191 3 535.1 1.9 1.0X
[info] java 182 188 3 548.8 1.8 1.0X
[info] native 178 182 3 562.2 1.8 1.1X
[info]
[info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 110 114 3 909.3 1.1 1.0X
[info] java 86 93 4 1159.3 0.9 1.3X
[info] native 86 90 3 1162.4 0.9 1.3X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 106 108 2 943.6 1.1 1.0X
[info] java 70 71 2 1426.8 0.7 1.5X
[info] native 54 56 2 1835.4 0.5 1.9X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 96 97 1 1047.1 1.0 1.0X
[info] java 43 44 1 2331.9 0.4 2.2X
[info] native 29 30 1 3392.1 0.3 3.2X
[info]
[info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 114 115 2 880.7 1.1 1.0X
[info] java 42 43 1 2398.1 0.4 2.7X
[info] native 45 46 1 2233.3 0.4 2.5X
[info]
[info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 140 143 2 714.6 1.4 1.0X
[info] java 28 29 1 3531.0 0.3 4.9X
[info] native 26 27 1 3820.0 0.3 5.3X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 156 166 7 641.3 1.6 1.0X
[info] java 158 167 6 633.2 1.6 1.0X
[info] native 150 160 7 664.8 1.5 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 85 88 2 1181.7 0.8 1.0X
[info] java 85 88 2 1176.0 0.9 1.0X
[info] native 75 78 2 1333.2 0.8 1.1X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 58 59 1 1731.1 0.6 1.0X
[info] java 41 43 1 2415.5 0.4 1.4X
[info] native 30 31 1 3293.9 0.3 1.9X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 94 96 1 1063.4 0.9 1.0X
[info] java 41 42 1 2435.8 0.4 2.3X
[info] native 30 30 1 3379.8 0.3 3.2X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 44 45 1 2278.9 0.4 1.0X
[info] java 37 38 0 2686.8 0.4 1.2X
[info] native 15 16 1 6555.4 0.2 2.9X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 88 89 1 1142.1 0.9 1.0X
[info] java 33 34 1 3010.7 0.3 2.6X
[info] native 15 16 1 6553.9 0.2 5.7X
[info]
[info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 164 172 4 609.4 1.6 1.0X
[info] java 163 172 5 612.6 1.6 1.0X
[info] native 150 159 4 667.0 1.5 1.1X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 49 50 1 1029.4 1.0 1.0X
[info] java 41 42 1 1209.4 0.8 1.2X
[info] native 25 27 1 2029.2 0.5 2.0X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 80 85 3 622.2 1.6 1.0X
[info] java 80 85 3 622.4 1.6 1.0X
[info] native 75 79 3 668.7 1.5 1.1X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 137 142 3 364.1 2.7 1.0X
[info] java 139 142 2 360.4 2.8 1.0X
[info] native 131 135 3 380.4 2.6 1.0X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 517 525 5 1935.5 0.5 1.0X
[info] java 213 216 3 4704.8 0.2 2.4X
[info] native 31 31 1 32705.6 0.0 16.9X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 589 601 6 1698.6 0.6 1.0X
[info] java 213 217 3 4693.3 0.2 2.8X
[info] native 31 32 1 32498.9 0.0 19.1X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 851 865 6 1175.3 0.9 1.0X
[info] java 212 216 3 4717.0 0.2 4.0X
[info] native 30 32 1 32903.0 0.0 28.0X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 1301 1316 6 768.4 1.3 1.0X
[info] java 212 216 2 4717.4 0.2 6.1X
[info] native 31 32 1 32606.0 0.0 42.4X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 454 460 2 2203.0 0.5 1.0X
[info] java 208 212 3 4803.8 0.2 2.2X
[info] native 15 16 0 66586.0 0.0 30.2X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 529 536 4 1889.7 0.5 1.0X
[info] java 208 212 3 4798.6 0.2 2.5X
[info] native 15 16 1 66751.4 0.0 35.3X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 830 840 5 1205.1 0.8 1.0X
[info] java 208 211 2 4814.1 0.2 4.0X
[info] native 15 15 1 67676.4 0.0 56.2X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 894 907 7 1118.7 0.9 1.0X
[info] java 208 211 3 4809.6 0.2 4.3X
[info] native 15 16 1 66675.2 0.0 59.6X
```
#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.VectorBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 193 199 3 517.5 1.9 1.0X
[info] java 181 186 4 553.2 1.8 1.1X
[info] native 181 185 5 553.6 1.8 1.1X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 108 112 2 925.1 1.1 1.0X
[info] java 88 91 3 1138.6 0.9 1.2X
[info] native 87 91 3 1144.2 0.9 1.2X
[info]
[info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 184 189 3 542.5 1.8 1.0X
[info] java 181 185 3 552.8 1.8 1.0X
[info] native 179 183 2 558.0 1.8 1.0X
[info]
[info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 97 101 3 1031.6 1.0 1.0X
[info] java 86 90 2 1163.7 0.9 1.1X
[info] native 85 88 2 1182.9 0.8 1.1X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 107 109 2 932.4 1.1 1.0X
[info] java 54 56 2 1846.7 0.5 2.0X
[info] native 54 56 2 1846.7 0.5 2.0X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 96 97 1 1043.6 1.0 1.0X
[info] java 29 30 1 3439.3 0.3 3.3X
[info] native 29 30 1 3423.9 0.3 3.3X
[info]
[info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 121 123 2 829.8 1.2 1.0X
[info] java 32 32 1 3171.3 0.3 3.8X
[info] native 45 46 1 2246.2 0.4 2.7X
[info]
[info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 142 144 2 705.9 1.4 1.0X
[info] java 15 16 1 6585.8 0.2 9.3X
[info] native 26 27 1 3839.5 0.3 5.4X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 157 165 5 635.6 1.6 1.0X
[info] java 151 159 5 664.0 1.5 1.0X
[info] native 151 160 5 663.6 1.5 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 85 89 2 1172.3 0.9 1.0X
[info] java 75 79 3 1337.3 0.7 1.1X
[info] native 75 79 2 1335.5 0.7 1.1X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 58 59 1 1731.5 0.6 1.0X
[info] java 28 29 1 3544.2 0.3 2.0X
[info] native 30 31 1 3306.2 0.3 1.9X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 90 92 1 1108.3 0.9 1.0X
[info] java 28 28 1 3622.5 0.3 3.3X
[info] native 30 31 1 3381.3 0.3 3.1X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 44 45 1 2284.7 0.4 1.0X
[info] java 14 15 1 7034.0 0.1 3.1X
[info] native 15 16 1 6643.7 0.2 2.9X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 85 86 1 1177.4 0.8 1.0X
[info] java 15 15 1 6886.1 0.1 5.8X
[info] native 15 16 1 6560.1 0.2 5.6X
[info]
[info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 164 173 6 608.1 1.6 1.0X
[info] java 148 157 5 675.2 1.5 1.1X
[info] native 152 160 5 659.9 1.5 1.1X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 61 63 1 815.4 1.2 1.0X
[info] java 16 17 1 3104.3 0.3 3.8X
[info] native 24 27 1 2071.9 0.5 2.5X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 81 85 2 616.4 1.6 1.0X
[info] java 81 85 2 614.7 1.6 1.0X
[info] native 75 78 2 669.5 1.5 1.1X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 138 141 3 362.7 2.8 1.0X
[info] java 137 140 2 365.3 2.7 1.0X
[info] native 131 134 2 382.9 2.6 1.1X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 525 544 8 1906.2 0.5 1.0X
[info] java 61 68 3 16358.1 0.1 8.6X
[info] native 31 32 1 32623.7 0.0 17.1X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 580 598 12 1724.5 0.6 1.0X
[info] java 61 68 4 16302.5 0.1 9.5X
[info] native 30 32 1 32962.8 0.0 19.1X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 829 838 4 1206.2 0.8 1.0X
[info] java 61 69 3 16339.7 0.1 13.5X
[info] native 30 31 1 33231.9 0.0 27.6X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 1352 1363 5 739.6 1.4 1.0X
[info] java 61 69 3 16347.0 0.1 22.1X
[info] native 31 32 1 32740.3 0.0 44.3X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 482 493 7 2073.1 0.5 1.0X
[info] java 35 38 2 28315.3 0.0 13.7X
[info] native 15 15 1 67579.7 0.0 32.6X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 472 482 4 2119.0 0.5 1.0X
[info] java 36 38 2 28138.1 0.0 13.3X
[info] native 15 16 1 66616.5 0.0 31.4X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 823 830 5 1215.2 0.8 1.0X
[info] java 35 38 2 28681.4 0.0 23.6X
[info] native 15 15 1 67908.4 0.0 55.9X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j 896 908 7 1115.8 0.9 1.0X
[info] java 35 38 2 28402.0 0.0 25.5X
[info] native 15 16 0 66691.2 0.0 59.8X
```
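The `[N]`/`[T]` suffixes on `dgemv`/`sgemv` and the `[N,N]` through `[T,T]` suffixes on `dgemm`/`sgemm` appear to denote the BLAS transpose arguments (`trans`, or `transa`/`transb`), which select whether each input matrix is used as stored or transposed. A naive column-major sketch of `C := alpha*op(A)*op(B) + beta*C` showing what those flags change. `GemmSketch` is a hypothetical name, only `'N'` and `'T'` are handled, and unlike reference BLAS it neither special-cases `beta == 0` nor follows the reference loop order, so it is not bitwise-equivalent to it.

```java
// Hypothetical, simplified GEMM for illustration only.
final class GemmSketch {
    // Element (i, j) of a column-major matrix with leading dimension ld.
    private static double at(double[] m, int ld, int i, int j) {
        return m[i + j * ld];
    }

    // C := alpha * op(A) * op(B) + beta * C, all column-major.
    // op(X) = X for 'N', X^T for 'T'; op(A) is m x k, op(B) is k x n.
    static void dgemm(char transA, char transB, int m, int n, int k, double alpha,
                      double[] a, int lda, double[] b, int ldb,
                      double beta, double[] c, int ldc) {
        boolean ta = transA == 'T';
        boolean tb = transB == 'T';
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < m; i++) {
                double sum = 0.0;
                for (int l = 0; l < k; l++) {
                    // The flags only change which element is read, i.e. the memory access pattern.
                    double aval = ta ? at(a, lda, l, i) : at(a, lda, i, l);
                    double bval = tb ? at(b, ldb, j, l) : at(b, ldb, l, j);
                    sum += aval * bval;
                }
                c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
            }
        }
    }
}
```

In the tables, the F2j times vary considerably across the four `dgemm` variants (e.g. 561 ms for `[N,N]` vs. 1328 ms for `[T,T]` on JDK8), while the native times stay at about 31 ms, which suggests the native library handles every transpose combination with an optimized kernel.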
TODO:
- [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` referring to `com.github.fommil.netlib`
- [ ] merge https://github.com/luhenry/netlib/pull/1 with all feedback from this PR, and remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`.
Closes #32415 from luhenry/master.
Authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 08:59:36 -05:00
Ludovic Henry
5b77ebb57b
[SPARK-35150][ML] Accelerate fallback BLAS with dev.ludovic.netlib
### What changes were proposed in this pull request?
Following https://github.com/apache/spark/pull/30810, I've continued looking for ways to accelerate BLAS usage in Spark. With this PR, I integrate work done in the [`dev.ludovic.netlib`](https://github.com/luhenry/netlib/) Maven package.
The `dev.ludovic.netlib` library wraps the original `com.github.fommil.netlib` library and focuses on accelerating the linear algebra routines used in Spark. Running the `org.apache.spark.ml.linalg.BLASBenchmark` benchmarking suite on an Intel machine gives the results in [1]. Moreover, the library is thoroughly tested to return exactly the same results as the reference implementation.
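Bit-for-bit equality with the reference implementation is a strong constraint, because floating-point addition is not associative: changing the order in which partial sums are accumulated, as SIMD-style rewrites often do, can change the result. A small illustration (hypothetical class and method names; this is not how `dev.ludovic.netlib` is implemented) in which a single left-to-right running sum and a two-accumulator dot product disagree on an input chosen to expose rounding.

```java
// Hypothetical demo of why accumulation order matters for exact results.
final class SummationOrderDemo {
    // Single running sum, left to right.
    static double dotSequential(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Two interleaved partial sums (even and odd indices), a common way to expose
    // instruction-level or SIMD parallelism; it reorders the additions.
    static double dotTwoAccumulators(double[] x, double[] y) {
        double even = 0.0, odd = 0.0;
        int i = 0;
        for (; i + 1 < x.length; i += 2) {
            even += x[i] * y[i];
            odd += x[i + 1] * y[i + 1];
        }
        if (i < x.length) {
            even += x[i] * y[i]; // leftover element for odd lengths
        }
        return even + odd;
    }
}
```

With `x = {1e16, 1, -1e16, 1}` and `y = {1, 1, 1, 1}`, the sequential version returns `1.0` (adding 1 to 1e16 rounds back to 1e16, whose ulp is 2) while the two-accumulator version returns `2.0`, so an optimized kernel must preserve the reference order to match it exactly.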
Under the hood, the library reimplements the necessary algorithms in pure, autovectorization-friendly Java 8, and takes advantage of the Vector API and Foreign Linker API introduced in JDK 16 when they are available.
A table summarising which implementation gets loaded in each case:
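"Autovectorization-friendly" Java usually means giving the JIT simple counted loops over contiguous arrays with no loop-carried dependency. A minimal sketch of that shape for `daxpy` (`y := alpha*x + y`), assuming the common approach of splitting off a unit-stride fast path from the general strided case; `AxpySketch` is a hypothetical name and this is not the library's code. HotSpot's C2 compiler can usually vectorize the unit-stride loop, while the strided loop is kept for correctness.

```java
// Hypothetical sketch of an autovectorization-friendly daxpy: y := alpha*x + y.
final class AxpySketch {
    static void daxpy(int n, double alpha, double[] x, int incx, double[] y, int incy) {
        if (n <= 0 || alpha == 0.0) {
            return; // nothing to do, as in reference BLAS
        }
        if (incx == 1 && incy == 1) {
            // Unit-stride fast path: counted loop, contiguous access, independent
            // iterations. This is the loop shape the JIT can typically vectorize.
            for (int i = 0; i < n; i++) {
                y[i] += alpha * x[i];
            }
        } else {
            // General strided path; negative increments start from the far end.
            int ix = incx < 0 ? (-n + 1) * incx : 0;
            int iy = incy < 0 ? (-n + 1) * incy : 0;
            for (int i = 0; i < n; i++) {
                y[iy] += alpha * x[ix];
                ix += incx;
                iy += incy;
            }
        }
    }
}
```

Each element is updated with the same multiply-then-add as the straightforward loop, so the fast path does not change results; only the loop shape differs.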
```
| | BLAS.nativeBLAS | BLAS.javaBLAS |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| with -Pnetlib-lgpl | 1. dev.ludovic.netlib.blas.NetlibNativeBLAS, a | 1. dev.ludovic.netlib.blas.VectorizedBLAS |
| | wrapper for com.github.fommil:all | (JDK16+, relies on the Vector API, requires |
| | 2. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+, | `--add-modules=jdk.incubator.vector` on JDK16) |
| | relies on the Foreign Linker API, requires | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+) |
| | `--add-modules=jdk.incubator.foreign | 3. dev.ludovic.netlib.blas.JavaBLAS |
| | -Dforeign.restricted=warn`) | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a |
| | 3. fails to load, falls back to BLAS.javaBLAS in | wrapper for com.github.fommil:core |
| | org.apache.spark.ml.linalg.BLAS | |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| without -Pnetlib-lgpl | 1. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+, | 1. dev.ludovic.netlib.blas.VectorizedBLAS |
| | relies on the Foreign Linker API, requires | (JDK16+, relies on the Vector API, requires |
| | `--add-modules=jdk.incubator.foreign | `--add-modules=jdk.incubator.vector` on JDK16) |
| | -Dforeign.restricted=warn`) | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+) |
| | 2. fails to load, falls back to BLAS.javaBLAS in | 3. dev.ludovic.netlib.blas.JavaBLAS |
| | org.apache.spark.ml.linalg.BLAS | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a |
| | | wrapper for com.github.fommil:core |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
```
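The `BLAS.javaBLAS` column is a priority list: the first candidate whose requirements are met and which loads successfully is used. A minimal sketch of that selection logic, assuming requirements can be reduced to a minimum JDK release plus the `jdk.incubator.vector` module flag; `BlasCandidate`, `JavaBlasSelector`, and the `loads` predicate (which stands in for an actual class-loading attempt) are hypothetical, and the JDK 8 minimum for the last two rows is an assumption based on Spark's baseline at the time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical descriptor for one row of the javaBLAS column.
final class BlasCandidate {
    final String name;
    final int minJdk;                // lowest JDK feature release it can run on
    final boolean needsVectorModule; // needs --add-modules=jdk.incubator.vector

    BlasCandidate(String name, int minJdk, boolean needsVectorModule) {
        this.name = name;
        this.minJdk = minJdk;
        this.needsVectorModule = needsVectorModule;
    }
}

final class JavaBlasSelector {
    // Priority order from the table; minJdk = 8 for the last two rows is an assumption.
    static final List<BlasCandidate> ORDER = Arrays.asList(
            new BlasCandidate("VectorizedBLAS", 16, true),
            new BlasCandidate("Java11BLAS", 11, false),
            new BlasCandidate("JavaBLAS", 8, false),
            new BlasCandidate("NetlibF2jBLAS", 8, false));

    // Returns the first candidate whose requirements are met and which actually loads.
    static String select(int jdkFeature, boolean vectorModuleAdded, Predicate<String> loads) {
        for (BlasCandidate c : ORDER) {
            boolean requirementsMet = jdkFeature >= c.minJdk
                    && (!c.needsVectorModule || vectorModuleAdded);
            if (requirementsMet && loads.test(c.name)) {
                return c.name;
            }
        }
        throw new IllegalStateException("no javaBLAS candidate could be loaded");
    }
}
```

For instance, on JDK 16 without `--add-modules=jdk.incubator.vector` the sketch skips `VectorizedBLAS` and picks `Java11BLAS`; `NetlibF2jBLAS` is only reached if every earlier candidate fails to load.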
### Why are the changes needed?
Accelerates linear algebra operations when the pure-Java fallback is in use, while native implementations (OpenBLAS, MKL) are still used transparently when available.
### Does this PR introduce _any_ user-facing change?
No, all changes are transparent to the user.
### How was this patch tested?
The `dev.ludovic.netlib` library has its own test suite [2]. It has also been validated by running the Spark test suite and benchmarking suite.
[1] Results for `org.apache.spark.ml.linalg.BLASBenchmark`:
#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java8BLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 223 232 8 448.0 2.2 1.0X
[info] java 221 228 7 453.0 2.2 1.0X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 122 128 4 821.2 1.2 1.0X
[info] java 122 128 4 822.3 1.2 1.0X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 109 112 2 921.4 1.1 1.0X
[info] java 70 74 3 1423.5 0.7 1.5X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 96 98 2 1046.1 1.0 1.0X
[info] java 47 49 2 2121.7 0.5 2.0X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 184 195 8 544.3 1.8 1.0X
[info] java 185 196 7 539.5 1.9 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 99 104 4 1011.9 1.0 1.0X
[info] java 99 104 4 1010.4 1.0 1.0X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 947.2 1.1 1.0X
[info] java 0 0 0 1584.8 0.6 1.7X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 867.4 1.2 1.0X
[info] java 1 1 0 865.0 1.2 1.0X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 485.9 2.1 1.0X
[info] java 1 1 0 486.8 2.1 1.0X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1843.0 0.5 1.0X
[info] java 0 0 0 2690.6 0.4 1.5X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1214.7 0.8 1.0X
[info] java 0 0 0 2536.8 0.4 2.1X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1895.9 0.5 1.0X
[info] java 0 0 0 2961.1 0.3 1.6X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1223.4 0.8 1.0X
[info] java 0 0 0 3091.4 0.3 2.5X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 560 575 20 1787.1 0.6 1.0X
[info] java 226 232 5 4432.4 0.2 2.5X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 570 586 23 1755.2 0.6 1.0X
[info] java 227 232 4 4410.1 0.2 2.5X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 863 879 17 1158.4 0.9 1.0X
[info] java 227 231 3 4407.9 0.2 3.8X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1282 1305 23 780.0 1.3 1.0X
[info] java 227 232 4 4413.4 0.2 5.7X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 538 548 8 1858.6 0.5 1.0X
[info] java 221 226 3 4521.1 0.2 2.4X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 549 558 10 1819.9 0.5 1.0X
[info] java 222 229 7 4503.5 0.2 2.5X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 838 852 12 1193.0 0.8 1.0X
[info] java 222 229 5 4500.5 0.2 3.8X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 905 919 18 1104.8 0.9 1.0X
[info] java 221 228 5 4521.3 0.2 4.1X
```
#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java11BLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 195 204 10 512.7 2.0 1.0X
[info] java 195 202 7 512.4 2.0 1.0X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 108 113 4 923.3 1.1 1.0X
[info] java 102 107 4 984.4 1.0 1.1X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 107 110 3 938.1 1.1 1.0X
[info] java 69 72 3 1447.1 0.7 1.5X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 96 98 2 1046.5 1.0 1.0X
[info] java 43 45 2 2317.1 0.4 2.2X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 155 168 8 644.2 1.6 1.0X
[info] java 158 169 8 632.8 1.6 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 85 90 4 1178.1 0.8 1.0X
[info] java 86 90 4 1167.7 0.9 1.0X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 0 0 0 1182.1 0.8 1.0X
[info] java 0 0 0 1432.1 0.7 1.2X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 898.7 1.1 1.0X
[info] java 1 1 0 891.5 1.1 1.0X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 495.4 2.0 1.0X
[info] java 1 1 0 495.7 2.0 1.0X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 0 0 0 2271.6 0.4 1.0X
[info] java 0 0 0 3648.1 0.3 1.6X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1229.3 0.8 1.0X
[info] java 0 0 0 2711.3 0.4 2.2X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 0 0 0 2677.5 0.4 1.0X
[info] java 0 0 0 3288.2 0.3 1.2X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1233.0 0.8 1.0X
[info] java 0 0 0 2766.3 0.4 2.2X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 520 536 16 1923.6 0.5 1.0X
[info] java 214 221 7 4669.5 0.2 2.4X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 593 612 17 1686.5 0.6 1.0X
[info] java 215 219 3 4643.3 0.2 2.8X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 853 870 16 1172.8 0.9 1.0X
[info] java 215 218 3 4659.7 0.2 4.0X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1350 1370 23 740.8 1.3 1.0X
[info] java 215 219 4 4656.6 0.2 6.3X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 460 468 6 2173.2 0.5 1.0X
[info] java 210 213 2 4752.7 0.2 2.2X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 535 544 8 1869.3 0.5 1.0X
[info] java 210 215 5 4761.8 0.2 2.5X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 843 853 11 1186.8 0.8 1.0X
[info] java 209 214 4 4793.4 0.2 4.0X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 891 904 15 1122.0 0.9 1.0X
[info] java 209 214 4 4777.2 0.2 4.3X
```
#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz
[info]
[info] f2jBLAS = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS = dev.ludovic.netlib.blas.VectorizedBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.VectorizedBLAS
[info]
[info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 194 199 7 515.7 1.9 1.0X
[info] java 181 186 3 551.1 1.8 1.1X
[info]
[info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 109 115 4 915.0 1.1 1.0X
[info] java 88 92 3 1138.8 0.9 1.2X
[info]
[info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 108 110 2 922.6 1.1 1.0X
[info] java 54 56 2 1839.2 0.5 2.0X
[info]
[info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 96 97 2 1046.1 1.0 1.0X
[info] java 29 30 1 3393.4 0.3 3.2X
[info]
[info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 156 165 5 643.0 1.6 1.0X
[info] java 150 159 5 667.1 1.5 1.0X
[info]
[info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 85 91 6 1171.0 0.9 1.0X
[info] java 75 79 3 1340.6 0.7 1.1X
[info]
[info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 917.0 1.1 1.0X
[info] java 0 0 0 8147.2 0.1 8.9X
[info]
[info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 859.3 1.2 1.0X
[info] java 1 1 0 859.3 1.2 1.0X
[info]
[info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 482.1 2.1 1.0X
[info] java 1 1 0 482.6 2.1 1.0X
[info]
[info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 0 0 0 2214.2 0.5 1.0X
[info] java 0 0 0 7975.8 0.1 3.6X
[info]
[info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1231.4 0.8 1.0X
[info] java 0 0 0 8680.9 0.1 7.0X
[info]
[info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 0 0 0 2684.3 0.4 1.0X
[info] java 0 0 0 18527.1 0.1 6.9X
[info]
[info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1 1 0 1235.4 0.8 1.0X
[info] java 0 0 0 17347.9 0.1 14.0X
[info]
[info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 530 552 18 1887.5 0.5 1.0X
[info] java 58 64 3 17143.9 0.1 9.1X
[info]
[info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 598 620 17 1671.1 0.6 1.0X
[info] java 58 64 3 17196.6 0.1 10.3X
[info]
[info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 834 847 14 1199.4 0.8 1.0X
[info] java 57 63 4 17486.9 0.1 14.6X
[info]
[info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 1338 1366 22 747.3 1.3 1.0X
[info] java 58 63 3 17356.6 0.1 23.2X
[info]
[info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 489 501 9 2045.5 0.5 1.0X
[info] java 36 38 2 27721.9 0.0 13.6X
[info]
[info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 478 488 9 2094.0 0.5 1.0X
[info] java 36 38 2 27813.2 0.0 13.3X
[info]
[info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 825 837 10 1211.6 0.8 1.0X
[info] java 35 38 2 28433.1 0.0 23.5X
[info]
[info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j 900 918 15 1111.6 0.9 1.0X
[info] java 36 38 2 28073.0 0.0 25.3X
```
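The derived columns in these tables can be recomputed from the Rate(M/s) column. The sketch below assumes that Per Row(ns) = 1000 / Rate and that Relative = Rate(case) / Rate(f2j). This is inferred from the printed values, not taken from the benchmark harness source, and the class name `BenchmarkColumns` is hypothetical. It works from rates because Best Time is rounded to whole milliseconds. For sgemm[T,T] on JDK16, the times give 900 / 36 = 25.0, while the rates give the printed 25.3X. Rows such as dgemv[N] report 0 ms, so they cannot be divided at all.

```java
import java.util.Locale;

/**
 * Hypothetical helper that recomputes the Per Row(ns) and Relative columns
 * of the benchmark tables from the Rate(M/s) column. The formulas are an
 * assumption inferred from the printed numbers, not from the harness code.
 */
public class BenchmarkColumns {

    /** Nanoseconds per row: 1e9 ns / (rate * 1e6 rows/s) = 1000 / rate. */
    public static double perRowNs(double rateMPerSec) {
        return 1000.0 / rateMPerSec;
    }

    /** Speedup of a case over the f2j baseline, computed from throughput. */
    public static double relative(double baselineRate, double caseRate) {
        return caseRate / baselineRate;
    }

    /** One decimal place, locale-independent, as printed in the tables. */
    public static String fmt(double value) {
        return String.format(Locale.ROOT, "%.1f", value);
    }

    public static void main(String[] args) {
        // JDK16 rows copied from the table above: {name, f2j rate, java rate}.
        Object[][] rows = {
            {"dgemm[T,T]", 747.3, 17356.6},
            {"sgemv[T]", 1235.4, 17347.9},
            {"sgemm[T,T]", 1111.6, 28073.0},
        };
        for (Object[] row : rows) {
            double f2j = (Double) row[1];
            double java = (Double) row[2];
            // Print the java case's per-row cost and its speedup over f2j.
            System.out.printf(Locale.ROOT, "%-10s java: %s ns/row, %sX%n",
                row[0], fmt(perRowNs(java)), fmt(relative(f2j, java)));
        }
    }
}
```

The values computed this way agree with the Per Row(ns) and Relative columns of the JDK16 table, including the rows whose Best Time reads 0 ms.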
[2] https://github.com/luhenry/netlib/tree/master/blas/src/test/java/dev/ludovic/netlib/blas
Closes #32253 from luhenry/master.
Authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-04-27 14:00:59 -05:00