Commit graph

3 commits

Author SHA1 Message Date
Ludovic Henry b52d47a920 [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0
### What changes were proposed in this pull request?

Bump to `dev.ludovic.netlib:2.0` which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. Theseare not taking dependencies on GPL or LGPL libraries, allowing to provide out-of-the-box support for hardware acceleration when a native library is present (this is still up to the end-user to install such library on their system, like OpenBLAS, Intel MKL, and libarpack2).

### Why are the changes needed?

Great performance improvement for ML-related workload on vanilla-distributions of Spark.

### Does this PR introduce _any_ user-facing change?

Users now take advantage of hardware acceleration as long as a native library is installed (like OpenBLAS, Intel MKL and libarpack2).

### How was this patch tested?

Spark test-suite + dev.ludovic.netlib testsuite.

#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        220            226           6        454.9           2.2       1.0X
[info] java                       221            228           5        451.9           2.2       1.0X
[info] native                     209            215           5        478.7           2.1       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        121            125           3        823.3           1.2       1.0X
[info] java                       121            125           3        824.3           1.2       1.0X
[info] native                     101            105           3        988.4           1.0       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        212            219           6        470.9           2.1       1.0X
[info] java                       208            212           4        481.0           2.1       1.0X
[info] native                     209            215           5        478.5           2.1       1.0X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        114            119           3        878.9           1.1       1.0X
[info] java                        99            105           3       1011.4           1.0       1.2X
[info] native                      97            103           3       1026.7           1.0       1.2X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            111           2        925.9           1.1       1.0X
[info] java                        71             73           2       1414.9           0.7       1.5X
[info] native                      54             56           2       1847.0           0.5       2.0X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           2       1046.8           1.0       1.0X
[info] java                        47             48           1       2129.8           0.5       2.0X
[info] native                      29             30           1       3404.7           0.3       3.3X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        139            143           2        718.2           1.4       1.0X
[info] java                        46             47           1       2171.2           0.5       3.0X
[info] native                      44             46           2       2261.8           0.4       3.1X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        154            157           4        651.0           1.5       1.0X
[info] java                        40             42           1       2469.3           0.4       3.8X
[info] native                      26             27           1       3787.6           0.3       5.8X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        185            195           8        541.0           1.8       1.0X
[info] java                       186            196           7        538.5           1.9       1.0X
[info] native                     177            187           7        564.1           1.8       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         98            102           3       1016.2           1.0       1.0X
[info] java                        98            102           3       1017.8           1.0       1.0X
[info] native                      87             91           3       1143.2           0.9       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         68             70           1       1474.7           0.7       1.0X
[info] java                        51             52           1       1973.0           0.5       1.3X
[info] native                      30             32           1       3298.8           0.3       2.2X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             99           2       1037.9           1.0       1.0X
[info] java                        50             51           1       1999.6           0.5       1.9X
[info] native                      30             31           1       3368.1           0.3       3.2X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         59             61           1       1688.7           0.6       1.0X
[info] java                        41             42           1       2461.9           0.4       1.5X
[info] native                      15             16           1       6593.0           0.2       3.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         90             92           1       1116.2           0.9       1.0X
[info] java                        39             40           1       2565.8           0.4       2.3X
[info] native                      15             16           1       6594.2           0.2       5.9X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        192            202           7        520.5           1.9       1.0X
[info] java                       203            214           7        491.9           2.0       0.9X
[info] native                     176            187           7        568.8           1.8       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         59             61           1        846.1           1.2       1.0X
[info] java                        38             39           1       1313.5           0.8       1.6X
[info] native                      24             27           1       2047.8           0.5       2.4X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         97            101           3        515.4           1.9       1.0X
[info] java                        97            101           2        515.1           1.9       1.0X
[info] native                      88             91           3        569.1           1.8       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        169            174           3        295.4           3.4       1.0X
[info] java                       169            174           3        295.4           3.4       1.0X
[info] native                     160            165           4        312.2           3.2       1.1X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        561            577          13       1782.3           0.6       1.0X
[info] java                       225            231           4       4446.2           0.2       2.5X
[info] native                      31             32           3      32473.1           0.0      18.2X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        570            584           9       1754.8           0.6       1.0X
[info] java                       224            230           4       4457.3           0.2       2.5X
[info] native                      31             32           1      32493.4           0.0      18.5X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        855            866           6       1169.2           0.9       1.0X
[info] java                       224            228           3       4466.9           0.2       3.8X
[info] native                      31             32           1      32395.5           0.0      27.7X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1328           1344           8        752.8           1.3       1.0X
[info] java                       224            230           4       4458.9           0.2       5.9X
[info] native                      31             32           1      32201.8           0.0      42.8X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        534            541           5       1873.0           0.5       1.0X
[info] java                       220            224           3       4542.8           0.2       2.4X
[info] native                      15             16           1      66803.1           0.0      35.7X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        544            551           6       1839.6           0.5       1.0X
[info] java                       220            224           4       4538.2           0.2       2.5X
[info] native                      15             16           1      65589.9           0.0      35.7X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        833            845          21       1201.0           0.8       1.0X
[info] java                       220            224           3       4548.7           0.2       3.8X
[info] native                      15             16           1      66603.2           0.0      55.5X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        899            907           5       1112.9           0.9       1.0X
[info] java                       221            224           2       4531.6           0.2       4.1X
[info] native                      15             16           1      65944.9           0.0      59.3X
```

#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        195            200           3        512.2           2.0       1.0X
[info] java                       197            202           3        507.0           2.0       1.0X
[info] native                     184            189           4        543.0           1.8       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            112           3        921.8           1.1       1.0X
[info] java                       101            105           3        989.4           1.0       1.1X
[info] native                      87             91           3       1147.1           0.9       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        187            191           3        535.1           1.9       1.0X
[info] java                       182            188           3        548.8           1.8       1.0X
[info] native                     178            182           3        562.2           1.8       1.1X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        110            114           3        909.3           1.1       1.0X
[info] java                        86             93           4       1159.3           0.9       1.3X
[info] native                      86             90           3       1162.4           0.9       1.3X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        106            108           2        943.6           1.1       1.0X
[info] java                        70             71           2       1426.8           0.7       1.5X
[info] native                      54             56           2       1835.4           0.5       1.9X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           1       1047.1           1.0       1.0X
[info] java                        43             44           1       2331.9           0.4       2.2X
[info] native                      29             30           1       3392.1           0.3       3.2X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        114            115           2        880.7           1.1       1.0X
[info] java                        42             43           1       2398.1           0.4       2.7X
[info] native                      45             46           1       2233.3           0.4       2.5X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        140            143           2        714.6           1.4       1.0X
[info] java                        28             29           1       3531.0           0.3       4.9X
[info] native                      26             27           1       3820.0           0.3       5.3X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        156            166           7        641.3           1.6       1.0X
[info] java                       158            167           6        633.2           1.6       1.0X
[info] native                     150            160           7        664.8           1.5       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             88           2       1181.7           0.8       1.0X
[info] java                        85             88           2       1176.0           0.9       1.0X
[info] native                      75             78           2       1333.2           0.8       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         58             59           1       1731.1           0.6       1.0X
[info] java                        41             43           1       2415.5           0.4       1.4X
[info] native                      30             31           1       3293.9           0.3       1.9X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         94             96           1       1063.4           0.9       1.0X
[info] java                        41             42           1       2435.8           0.4       2.3X
[info] native                      30             30           1       3379.8           0.3       3.2X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         44             45           1       2278.9           0.4       1.0X
[info] java                        37             38           0       2686.8           0.4       1.2X
[info] native                      15             16           1       6555.4           0.2       2.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         88             89           1       1142.1           0.9       1.0X
[info] java                        33             34           1       3010.7           0.3       2.6X
[info] native                      15             16           1       6553.9           0.2       5.7X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        164            172           4        609.4           1.6       1.0X
[info] java                       163            172           5        612.6           1.6       1.0X
[info] native                     150            159           4        667.0           1.5       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         49             50           1       1029.4           1.0       1.0X
[info] java                        41             42           1       1209.4           0.8       1.2X
[info] native                      25             27           1       2029.2           0.5       2.0X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         80             85           3        622.2           1.6       1.0X
[info] java                        80             85           3        622.4           1.6       1.0X
[info] native                      75             79           3        668.7           1.5       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        137            142           3        364.1           2.7       1.0X
[info] java                       139            142           2        360.4           2.8       1.0X
[info] native                     131            135           3        380.4           2.6       1.0X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        517            525           5       1935.5           0.5       1.0X
[info] java                       213            216           3       4704.8           0.2       2.4X
[info] native                      31             31           1      32705.6           0.0      16.9X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        589            601           6       1698.6           0.6       1.0X
[info] java                       213            217           3       4693.3           0.2       2.8X
[info] native                      31             32           1      32498.9           0.0      19.1X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        851            865           6       1175.3           0.9       1.0X
[info] java                       212            216           3       4717.0           0.2       4.0X
[info] native                      30             32           1      32903.0           0.0      28.0X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1301           1316           6        768.4           1.3       1.0X
[info] java                       212            216           2       4717.4           0.2       6.1X
[info] native                      31             32           1      32606.0           0.0      42.4X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        454            460           2       2203.0           0.5       1.0X
[info] java                       208            212           3       4803.8           0.2       2.2X
[info] native                      15             16           0      66586.0           0.0      30.2X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        529            536           4       1889.7           0.5       1.0X
[info] java                       208            212           3       4798.6           0.2       2.5X
[info] native                      15             16           1      66751.4           0.0      35.3X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        830            840           5       1205.1           0.8       1.0X
[info] java                       208            211           2       4814.1           0.2       4.0X
[info] native                      15             15           1      67676.4           0.0      56.2X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        894            907           7       1118.7           0.9       1.0X
[info] java                       208            211           3       4809.6           0.2       4.3X
[info] native                      15             16           1      66675.2           0.0      59.6X
```

#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.F2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.VectorBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS
[info]
[info] daxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        193            199           3        517.5           1.9       1.0X
[info] java                       181            186           4        553.2           1.8       1.1X
[info] native                     181            185           5        553.6           1.8       1.1X
[info]
[info] saxpy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        108            112           2        925.1           1.1       1.0X
[info] java                        88             91           3       1138.6           0.9       1.2X
[info] native                      87             91           3       1144.2           0.9       1.2X
[info]
[info] dcopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        184            189           3        542.5           1.8       1.0X
[info] java                       181            185           3        552.8           1.8       1.0X
[info] native                     179            183           2        558.0           1.8       1.0X
[info]
[info] scopy:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         97            101           3       1031.6           1.0       1.0X
[info] java                        86             90           2       1163.7           0.9       1.1X
[info] native                      85             88           2       1182.9           0.8       1.1X
[info]
[info] ddot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        107            109           2        932.4           1.1       1.0X
[info] java                        54             56           2       1846.7           0.5       2.0X
[info] native                      54             56           2       1846.7           0.5       2.0X
[info]
[info] sdot:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         96             97           1       1043.6           1.0       1.0X
[info] java                        29             30           1       3439.3           0.3       3.3X
[info] native                      29             30           1       3423.9           0.3       3.3X
[info]
[info] dnrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        121            123           2        829.8           1.2       1.0X
[info] java                        32             32           1       3171.3           0.3       3.8X
[info] native                      45             46           1       2246.2           0.4       2.7X
[info]
[info] snrm2:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        142            144           2        705.9           1.4       1.0X
[info] java                        15             16           1       6585.8           0.2       9.3X
[info] native                      26             27           1       3839.5           0.3       5.4X
[info]
[info] dscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        157            165           5        635.6           1.6       1.0X
[info] java                       151            159           5        664.0           1.5       1.0X
[info] native                     151            160           5        663.6           1.5       1.0X
[info]
[info] sscal:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             89           2       1172.3           0.9       1.0X
[info] java                        75             79           3       1337.3           0.7       1.1X
[info] native                      75             79           2       1335.5           0.7       1.1X
[info]
[info] dgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         58             59           1       1731.5           0.6       1.0X
[info] java                        28             29           1       3544.2           0.3       2.0X
[info] native                      30             31           1       3306.2           0.3       1.9X
[info]
[info] dgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         90             92           1       1108.3           0.9       1.0X
[info] java                        28             28           1       3622.5           0.3       3.3X
[info] native                      30             31           1       3381.3           0.3       3.1X
[info]
[info] sgemv[N]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         44             45           1       2284.7           0.4       1.0X
[info] java                        14             15           1       7034.0           0.1       3.1X
[info] native                      15             16           1       6643.7           0.2       2.9X
[info]
[info] sgemv[T]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         85             86           1       1177.4           0.8       1.0X
[info] java                        15             15           1       6886.1           0.1       5.8X
[info] native                      15             16           1       6560.1           0.2       5.6X
[info]
[info] dger:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        164            173           6        608.1           1.6       1.0X
[info] java                       148            157           5        675.2           1.5       1.1X
[info] native                     152            160           5        659.9           1.5       1.1X
[info]
[info] dspmv[U]:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         61             63           1        815.4           1.2       1.0X
[info] java                        16             17           1       3104.3           0.3       3.8X
[info] native                      24             27           1       2071.9           0.5       2.5X
[info]
[info] dspr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                         81             85           2        616.4           1.6       1.0X
[info] java                        81             85           2        614.7           1.6       1.0X
[info] native                      75             78           2        669.5           1.5       1.1X
[info]
[info] dsyr[U]:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        138            141           3        362.7           2.8       1.0X
[info] java                       137            140           2        365.3           2.7       1.0X
[info] native                     131            134           2        382.9           2.6       1.1X
[info]
[info] dgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        525            544           8       1906.2           0.5       1.0X
[info] java                        61             68           3      16358.1           0.1       8.6X
[info] native                      31             32           1      32623.7           0.0      17.1X
[info]
[info] dgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        580            598          12       1724.5           0.6       1.0X
[info] java                        61             68           4      16302.5           0.1       9.5X
[info] native                      30             32           1      32962.8           0.0      19.1X
[info]
[info] dgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        829            838           4       1206.2           0.8       1.0X
[info] java                        61             69           3      16339.7           0.1      13.5X
[info] native                      30             31           1      33231.9           0.0      27.6X
[info]
[info] dgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                       1352           1363           5        739.6           1.4       1.0X
[info] java                        61             69           3      16347.0           0.1      22.1X
[info] native                      31             32           1      32740.3           0.0      44.3X
[info]
[info] sgemm[N,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        482            493           7       2073.1           0.5       1.0X
[info] java                        35             38           2      28315.3           0.0      13.7X
[info] native                      15             15           1      67579.7           0.0      32.6X
[info]
[info] sgemm[N,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        472            482           4       2119.0           0.5       1.0X
[info] java                        36             38           2      28138.1           0.0      13.3X
[info] native                      15             16           1      66616.5           0.0      31.4X
[info]
[info] sgemm[T,N]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        823            830           5       1215.2           0.8       1.0X
[info] java                        35             38           2      28681.4           0.0      23.6X
[info] native                      15             15           1      67908.4           0.0      55.9X
[info]
[info] sgemm[T,T]:      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] -----------------------------------------------------------------------------------------------
[info] f2j                        896            908           7       1115.8           0.9       1.0X
[info] java                        35             38           2      28402.0           0.0      25.5X
[info] native                      15             16           0      66691.2           0.0      59.8X
```

TODO:
- [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` refering `com.github.fommil.netlib`
- [ ] merge https://github.com/luhenry/netlib/pull/1 with all feedback from this PR + remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`.

Closes #32415 from luhenry/master.

Authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 08:59:36 -05:00
Ludovic Henry 5b77ebb57b [SPARK-35150][ML] Accelerate fallback BLAS with dev.ludovic.netlib
### What changes were proposed in this pull request?

Following https://github.com/apache/spark/pull/30810, I've continued looking for ways to accelerate the usage of BLAS in Spark. With this PR, I integrate work done in the [`dev.ludovic.netlib`](https://github.com/luhenry/netlib/) Maven package.

The `dev.ludovic.netlib` library wraps the original `com.github.fommil.netlib` library and focus on accelerating the linear algebra routines in use in Spark. When running the `org.apache.spark.ml.linalg.BLASBenchmark` benchmarking suite, I get the results at [1] on an Intel machine. Moreover, this library is thoroughly tested to return the exact same results as the reference implementation.

Under the hood, it reimplements the necessary algorithms in pure autovectorization-friendly Java 8, as well as takes advantage of the Vector API and Foreign Linker API introduced in JDK 16 when available.

A table summarising which version gets loaded in which case:

```
|                       | BLAS.nativeBLAS                                    | BLAS.javaBLAS                                      |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| with -Pnetlib-lgpl    | 1. dev.ludovic.netlib.blas.NetlibNativeBLAS, a     | 1. dev.ludovic.netlib.blas.VectorizedBLAS          |
|                       |     wrapper for com.github.fommil:all              |    (JDK16+, relies on the Vector API, requires     |
|                       | 2. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+,    |     `--add-modules=jdk.incubator.vector` on JDK16) |
|                       |    relies on the Foreign Linker API, requires      | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+)     |
|                       |    `--add-modules=jdk.incubator.foreign            | 3. dev.ludovic.netlib.blas.JavaBLAS                |
|                       |     -Dforeign.restricted=warn`)                    | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a        |
|                       | 3. fails to load, falls back to BLAS.javaBLAS in   |     wrapper for com.github.fommil:core             |
|                       |     org.apache.spark.ml.linalg.BLAS                |                                                    |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
| without -Pnetlib-lgpl | 1. dev.ludovic.netlib.blas.ForeignBLAS (JDK16+,    | 1. dev.ludovic.netlib.blas.VectorizedBLAS          |
|                       |    relies on the Foreign Linker API, requires      |    (JDK16+, relies on the Vector API, requires     |
|                       |    `--add-modules=jdk.incubator.foreign            |     `--add-modules=jdk.incubator.vector` on JDK16) |
|                       |     -Dforeign.restricted=warn`)                    | 2. dev.ludovic.netlib.blas.Java11BLAS (JDK11+)     |
|                       | 2. fails to load, falls back to BLAS.javaBLAS in   | 3. dev.ludovic.netlib.blas.JavaBLAS                |
|                       |     org.apache.spark.ml.linalg.BLAS                | 4. dev.ludovic.netlib.blas.NetlibF2jBLAS, a        |
|                       |                                                    |     wrapper for com.github.fommil:core             |
| --------------------- | -------------------------------------------------- | -------------------------------------------------- |
```

### Why are the changes needed?

Accelerates linear algebra operations when the pure-java fallback method is in use. Transparently falls back to native implementation (OpenBLAS, MKL) when available.

### Does this PR introduce _any_ user-facing change?

No, all changes are transparent to the user.

### How was this patch tested?

The `dev.ludovic.netlib` library has its own test suite [2]. It has also been validated by running the Spark test suite and benchmarking suite.

[1] Results for `org.apache.spark.ml.linalg.BLASBenchmark`:
#### JDK8:
```
[info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java8BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java8BLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 223            232           8        448.0           2.2       1.0X
[info] java                                                221            228           7        453.0           2.2       1.0X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 122            128           4        821.2           1.2       1.0X
[info] java                                                122            128           4        822.3           1.2       1.0X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 109            112           2        921.4           1.1       1.0X
[info] java                                                 70             74           3       1423.5           0.7       1.5X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             98           2       1046.1           1.0       1.0X
[info] java                                                 47             49           2       2121.7           0.5       2.0X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 184            195           8        544.3           1.8       1.0X
[info] java                                                185            196           7        539.5           1.9       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  99            104           4       1011.9           1.0       1.0X
[info] java                                                 99            104           4       1010.4           1.0       1.0X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        947.2           1.1       1.0X
[info] java                                                  0              0           0       1584.8           0.6       1.7X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        867.4           1.2       1.0X
[info] java                                                  1              1           0        865.0           1.2       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        485.9           2.1       1.0X
[info] java                                                  1              1           0        486.8           2.1       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1843.0           0.5       1.0X
[info] java                                                  0              0           0       2690.6           0.4       1.5X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1214.7           0.8       1.0X
[info] java                                                  0              0           0       2536.8           0.4       2.1X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1895.9           0.5       1.0X
[info] java                                                  0              0           0       2961.1           0.3       1.6X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1223.4           0.8       1.0X
[info] java                                                  0              0           0       3091.4           0.3       2.5X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 560            575          20       1787.1           0.6       1.0X
[info] java                                                226            232           5       4432.4           0.2       2.5X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 570            586          23       1755.2           0.6       1.0X
[info] java                                                227            232           4       4410.1           0.2       2.5X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 863            879          17       1158.4           0.9       1.0X
[info] java                                                227            231           3       4407.9           0.2       3.8X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1282           1305          23        780.0           1.3       1.0X
[info] java                                                227            232           4       4413.4           0.2       5.7X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 538            548           8       1858.6           0.5       1.0X
[info] java                                                221            226           3       4521.1           0.2       2.4X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 549            558          10       1819.9           0.5       1.0X
[info] java                                                222            229           7       4503.5           0.2       2.5X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 838            852          12       1193.0           0.8       1.0X
[info] java                                                222            229           5       4500.5           0.2       3.8X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 905            919          18       1104.8           0.9       1.0X
[info] java                                                221            228           5       4521.3           0.2       4.1X
```

#### JDK11:
```
[info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.Java11BLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.Java11BLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 195            204          10        512.7           2.0       1.0X
[info] java                                                195            202           7        512.4           2.0       1.0X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 108            113           4        923.3           1.1       1.0X
[info] java                                                102            107           4        984.4           1.0       1.1X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 107            110           3        938.1           1.1       1.0X
[info] java                                                 69             72           3       1447.1           0.7       1.5X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             98           2       1046.5           1.0       1.0X
[info] java                                                 43             45           2       2317.1           0.4       2.2X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 155            168           8        644.2           1.6       1.0X
[info] java                                                158            169           8        632.8           1.6       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  85             90           4       1178.1           0.8       1.0X
[info] java                                                 86             90           4       1167.7           0.9       1.0X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       1182.1           0.8       1.0X
[info] java                                                  0              0           0       1432.1           0.7       1.2X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        898.7           1.1       1.0X
[info] java                                                  1              1           0        891.5           1.1       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        495.4           2.0       1.0X
[info] java                                                  1              1           0        495.7           2.0       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2271.6           0.4       1.0X
[info] java                                                  0              0           0       3648.1           0.3       1.6X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1229.3           0.8       1.0X
[info] java                                                  0              0           0       2711.3           0.4       2.2X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2677.5           0.4       1.0X
[info] java                                                  0              0           0       3288.2           0.3       1.2X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1233.0           0.8       1.0X
[info] java                                                  0              0           0       2766.3           0.4       2.2X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 520            536          16       1923.6           0.5       1.0X
[info] java                                                214            221           7       4669.5           0.2       2.4X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 593            612          17       1686.5           0.6       1.0X
[info] java                                                215            219           3       4643.3           0.2       2.8X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 853            870          16       1172.8           0.9       1.0X
[info] java                                                215            218           3       4659.7           0.2       4.0X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1350           1370          23        740.8           1.3       1.0X
[info] java                                                215            219           4       4656.6           0.2       6.3X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 460            468           6       2173.2           0.5       1.0X
[info] java                                                210            213           2       4752.7           0.2       2.2X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 535            544           8       1869.3           0.5       1.0X
[info] java                                                210            215           5       4761.8           0.2       2.5X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 843            853          11       1186.8           0.8       1.0X
[info] java                                                209            214           4       4793.4           0.2       4.0X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 891            904          15       1122.0           0.9       1.0X
[info] java                                                209            214           4       4777.2           0.2       4.3X
```

#### JDK16:
```
[info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic
[info] Intel(R) Xeon(R) E-2276G CPU  3.80GHz
[info]
[info] f2jBLAS    = dev.ludovic.netlib.blas.NetlibF2jBLAS
[info] javaBLAS   = dev.ludovic.netlib.blas.VectorizedBLAS
[info] nativeBLAS = dev.ludovic.netlib.blas.VectorizedBLAS
[info]
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 194            199           7        515.7           1.9       1.0X
[info] java                                                181            186           3        551.1           1.8       1.1X
[info]
[info] saxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 109            115           4        915.0           1.1       1.0X
[info] java                                                 88             92           3       1138.8           0.9       1.2X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 108            110           2        922.6           1.1       1.0X
[info] java                                                 54             56           2       1839.2           0.5       2.0X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  96             97           2       1046.1           1.0       1.0X
[info] java                                                 29             30           1       3393.4           0.3       3.2X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 156            165           5        643.0           1.6       1.0X
[info] java                                                150            159           5        667.1           1.5       1.0X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  85             91           6       1171.0           0.9       1.0X
[info] java                                                 75             79           3       1340.6           0.7       1.1X
[info]
[info] dspmv[U]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        917.0           1.1       1.0X
[info] java                                                  0              0           0       8147.2           0.1       8.9X
[info]
[info] dspr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        859.3           1.2       1.0X
[info] java                                                  1              1           0        859.3           1.2       1.0X
[info]
[info] dsyr[U]:                                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0        482.1           2.1       1.0X
[info] java                                                  1              1           0        482.6           2.1       1.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2214.2           0.5       1.0X
[info] java                                                  0              0           0       7975.8           0.1       3.6X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1231.4           0.8       1.0X
[info] java                                                  0              0           0       8680.9           0.1       7.0X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   0              0           0       2684.3           0.4       1.0X
[info] java                                                  0              0           0      18527.1           0.1       6.9X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                   1              1           0       1235.4           0.8       1.0X
[info] java                                                  0              0           0      17347.9           0.1      14.0X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 530            552          18       1887.5           0.5       1.0X
[info] java                                                 58             64           3      17143.9           0.1       9.1X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 598            620          17       1671.1           0.6       1.0X
[info] java                                                 58             64           3      17196.6           0.1      10.3X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 834            847          14       1199.4           0.8       1.0X
[info] java                                                 57             63           4      17486.9           0.1      14.6X
[info]
[info] dgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                1338           1366          22        747.3           1.3       1.0X
[info] java                                                 58             63           3      17356.6           0.1      23.2X
[info]
[info] sgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 489            501           9       2045.5           0.5       1.0X
[info] java                                                 36             38           2      27721.9           0.0      13.6X
[info]
[info] sgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 478            488           9       2094.0           0.5       1.0X
[info] java                                                 36             38           2      27813.2           0.0      13.3X
[info]
[info] sgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 825            837          10       1211.6           0.8       1.0X
[info] java                                                 35             38           2      28433.1           0.0      23.5X
[info]
[info] sgemm[T,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 900            918          15       1111.6           0.9       1.0X
[info] java                                                 36             38           2      28073.0           0.0      25.3X
```

[2] https://github.com/luhenry/netlib/tree/master/blas/src/test/java/dev/ludovic/netlib/blas

Closes #32253 from luhenry/master.

Authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-04-27 14:00:59 -05:00
Xiaochang Wu 44c868b73a [SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
### What changes were proposed in this pull request?
Rewrite a clearer and complete BLAS native acceleration enabling guide.

### Why are the changes needed?
The document of enabling BLAS native acceleration in ML guide (https://spark.apache.org/docs/latest/ml-guide.html#dependencies) is incomplete and unclear to the user.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
N/A

Closes #29139 from xwu99/blas-doc.

Lead-authored-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>
2020-07-28 08:36:11 -07:00