spark-instrumented-optimizer/docs/ml-guide.md

107 lines
7.4 KiB
Markdown
Raw Permalink Normal View History

[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
---
layout: global
2016-07-15 16:38:23 -04:00
title: "MLlib: Main Guide"
displayTitle: "Machine Learning Library (MLlib) Guide"
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
---
2016-07-15 16:38:23 -04:00
MLlib is Spark's machine learning (ML) library.
Its goal is to make practical machine learning scalable and easy.
At a high level, it provides tools such as:
2016-07-15 16:38:23 -04:00
* ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
* Featurization: feature extraction, transformation, dimensionality reduction, and selection
* Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
* Persistence: saving and load algorithms, models, and Pipelines
* Utilities: linear algebra, statistics, data handling, etc.
[SPARK-7555] [DOCS] Add doc for elastic net in ml-guide and mllib-guide jkbradley I put the elastic net under the **Algorithm guide** section. Also add the formula of elastic net in mllib-linear `mllib-linear-methods#regularizers`. dbtsai I left the code tab for you to add example code. Do you think it is the right place? Author: Shuo Xiang <shuoxiangpub@gmail.com> Closes #6504 from coderxiang/elasticnet and squashes the following commits: f6061ee [Shuo Xiang] typo 90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet 0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods 8747190 [Shuo Xiang] merge master 706d3f7 [Shuo Xiang] add python code 9bc2b4c [Shuo Xiang] typo db32a60 [Shuo Xiang] java code sample aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet a0dae07 [Shuo Xiang] simplify code d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge df5bd14 [Shuo Xiang] use wikipeida page in ml-linear-methods.md 78d9366 [Shuo Xiang] address comments 8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet 8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc 998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc 89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc 9262a72 [Shuo Xiang] update 7e07d12 [Shuo Xiang] update b32f21a [Shuo Xiang] add doc for elastic net in sparkml 937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc 180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 98804c9 [Shuo Xiang] fix bug in topBykey and update test
2015-07-15 15:10:53 -04:00
2016-07-15 16:38:23 -04:00
# Announcement: DataFrame-based API is primary API
[SPARK-7555] [DOCS] Add doc for elastic net in ml-guide and mllib-guide jkbradley I put the elastic net under the **Algorithm guide** section. Also add the formula of elastic net in mllib-linear `mllib-linear-methods#regularizers`. dbtsai I left the code tab for you to add example code. Do you think it is the right place? Author: Shuo Xiang <shuoxiangpub@gmail.com> Closes #6504 from coderxiang/elasticnet and squashes the following commits: f6061ee [Shuo Xiang] typo 90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet 0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods 8747190 [Shuo Xiang] merge master 706d3f7 [Shuo Xiang] add python code 9bc2b4c [Shuo Xiang] typo db32a60 [Shuo Xiang] java code sample aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet a0dae07 [Shuo Xiang] simplify code d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge df5bd14 [Shuo Xiang] use wikipeida page in ml-linear-methods.md 78d9366 [Shuo Xiang] address comments 8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet 8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc 998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc 89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc 9262a72 [Shuo Xiang] update 7e07d12 [Shuo Xiang] update b32f21a [Shuo Xiang] add doc for elastic net in sparkml 937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc 180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master' 98804c9 [Shuo Xiang] fix bug in topBykey and update test
2015-07-15 15:10:53 -04:00
2016-07-15 16:38:23 -04:00
**The MLlib RDD-based API is now in maintenance mode.**
As of Spark 2.0, the [RDD](rdd-programming-guide.html#resilient-distributed-datasets-rdds)-based APIs in the `spark.mllib` package have entered maintenance mode.
2016-07-15 16:38:23 -04:00
The primary Machine Learning API for Spark is now the [DataFrame](sql-programming-guide.html)-based API in the `spark.ml` package.
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
2016-07-15 16:38:23 -04:00
*What are the implications?*
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
2016-07-15 16:38:23 -04:00
* MLlib will still support the RDD-based API in `spark.mllib` with bug fixes.
* MLlib will not add new features to the RDD-based API.
* In the Spark 2.x releases, MLlib will add features to the DataFrames-based API to reach feature parity with the RDD-based API.
2016-07-15 16:38:23 -04:00
*Why is MLlib switching to the DataFrame-based API?*
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
2016-07-15 16:38:23 -04:00
* DataFrames provide a more user-friendly API than RDDs. The many benefits of DataFrames include Spark Datasources, SQL/DataFrame queries, Tungsten and Catalyst optimizations, and uniform APIs across languages.
* The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across multiple languages.
* DataFrames facilitate practical ML Pipelines, particularly feature transformations. See the [Pipelines guide](ml-pipeline.html) for details.
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
*What is "Spark ML"?*
* "Spark ML" is not an official name but occasionally used to refer to the MLlib DataFrame-based API.
This is majorly due to the `org.apache.spark.ml` Scala package name used by the DataFrame-based API,
and the "Spark ML Pipelines" term we used initially to emphasize the pipeline concept.
*Is MLlib deprecated?*
* No. MLlib includes both the RDD-based API and the DataFrame-based API.
The RDD-based API is now in maintenance mode.
But neither API is deprecated, nor MLlib as a whole.
2016-07-15 16:38:23 -04:00
# Dependencies
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
[SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0 ### What changes were proposed in this pull request? Bump to `dev.ludovic.netlib:2.0` which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. Theseare not taking dependencies on GPL or LGPL libraries, allowing to provide out-of-the-box support for hardware acceleration when a native library is present (this is still up to the end-user to install such library on their system, like OpenBLAS, Intel MKL, and libarpack2). ### Why are the changes needed? Great performance improvement for ML-related workload on vanilla-distributions of Spark. ### Does this PR introduce _any_ user-facing change? Users now take advantage of hardware acceleration as long as a native library is installed (like OpenBLAS, Intel MKL and libarpack2). ### How was this patch tested? Spark test-suite + dev.ludovic.netlib testsuite. #### JDK8: ``` [info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java8BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 220 226 6 454.9 2.2 1.0X [info] java 221 228 5 451.9 2.2 1.0X [info] native 209 215 5 478.7 2.1 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 125 3 823.3 1.2 1.0X [info] java 121 125 3 824.3 1.2 1.0X [info] native 101 105 3 988.4 1.0 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 212 219 6 470.9 2.1 1.0X [info] java 208 212 4 481.0 2.1 1.0X [info] native 209 215 5 478.5 2.1 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 119 3 878.9 1.1 1.0X [info] java 99 105 3 1011.4 1.0 1.2X [info] native 97 103 3 1026.7 1.0 1.2X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 111 2 925.9 1.1 1.0X [info] java 71 73 2 1414.9 0.7 1.5X [info] native 54 56 2 1847.0 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 2 1046.8 1.0 1.0X [info] java 47 48 1 2129.8 0.5 2.0X [info] native 29 30 1 3404.7 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 139 143 2 718.2 1.4 1.0X [info] java 46 47 1 2171.2 0.5 3.0X [info] native 44 46 2 2261.8 0.4 3.1X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 154 157 4 651.0 1.5 1.0X [info] java 40 42 1 2469.3 0.4 3.8X [info] native 26 27 1 3787.6 0.3 5.8X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 185 195 8 541.0 1.8 1.0X [info] java 186 196 7 538.5 1.9 1.0X [info] native 177 187 7 564.1 1.8 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 98 102 3 1016.2 1.0 1.0X [info] java 98 102 3 1017.8 1.0 1.0X [info] native 87 91 3 1143.2 0.9 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 68 70 1 1474.7 0.7 1.0X [info] java 51 52 1 1973.0 0.5 1.3X [info] native 30 32 1 3298.8 0.3 2.2X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 99 2 1037.9 1.0 1.0X [info] java 50 51 1 1999.6 0.5 1.9X [info] native 30 31 1 3368.1 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 1688.7 0.6 1.0X [info] java 41 42 1 2461.9 0.4 1.5X [info] native 15 16 1 6593.0 0.2 3.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1116.2 0.9 1.0X [info] java 39 40 1 2565.8 0.4 2.3X [info] native 15 16 1 6594.2 0.2 5.9X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 192 202 7 520.5 1.9 1.0X [info] java 203 214 7 491.9 2.0 0.9X [info] native 176 187 7 568.8 1.8 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 846.1 1.2 1.0X [info] java 38 39 1 1313.5 0.8 1.6X [info] native 24 27 1 2047.8 0.5 2.4X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 515.4 1.9 1.0X [info] java 97 101 2 515.1 1.9 1.0X [info] native 88 91 3 569.1 1.8 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 169 174 3 295.4 3.4 1.0X [info] java 169 174 3 295.4 3.4 1.0X [info] native 160 165 4 312.2 3.2 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 561 577 13 1782.3 0.6 1.0X [info] java 225 231 4 4446.2 0.2 2.5X [info] native 31 32 3 32473.1 0.0 18.2X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 570 584 9 1754.8 0.6 1.0X [info] java 224 230 4 4457.3 0.2 2.5X [info] native 31 32 1 32493.4 0.0 18.5X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 855 866 6 1169.2 0.9 1.0X [info] java 224 228 3 4466.9 0.2 3.8X [info] native 31 32 1 32395.5 0.0 27.7X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1328 1344 8 752.8 1.3 1.0X [info] java 224 230 4 4458.9 0.2 5.9X [info] native 31 32 1 32201.8 0.0 42.8X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 534 541 5 1873.0 0.5 1.0X [info] java 220 224 3 4542.8 0.2 2.4X [info] native 15 16 1 66803.1 0.0 35.7X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 544 551 6 1839.6 0.5 1.0X [info] java 220 224 4 4538.2 0.2 2.5X [info] native 15 16 1 65589.9 0.0 35.7X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 833 845 21 1201.0 0.8 1.0X [info] java 220 224 3 4548.7 0.2 3.8X [info] native 15 16 1 66603.2 0.0 55.5X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 899 907 5 1112.9 0.9 1.0X [info] java 221 224 2 4531.6 0.2 4.1X [info] native 15 16 1 65944.9 0.0 59.3X ``` #### JDK11: ``` [info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java11BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 195 200 3 512.2 2.0 1.0X [info] java 197 202 3 507.0 2.0 1.0X [info] native 184 189 4 543.0 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 3 921.8 1.1 1.0X [info] java 101 105 3 989.4 1.0 1.1X [info] native 87 91 3 1147.1 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 187 191 3 535.1 1.9 1.0X [info] java 182 188 3 548.8 1.8 1.0X [info] native 178 182 3 562.2 1.8 1.1X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 110 114 3 909.3 1.1 1.0X [info] java 86 93 4 1159.3 0.9 1.3X [info] native 86 90 3 1162.4 0.9 1.3X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 106 108 2 943.6 1.1 1.0X [info] java 70 71 2 1426.8 0.7 1.5X [info] native 54 56 2 1835.4 0.5 1.9X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1047.1 1.0 1.0X [info] java 43 44 1 2331.9 0.4 2.2X [info] native 29 30 1 3392.1 0.3 3.2X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 115 2 880.7 1.1 1.0X [info] java 42 43 1 2398.1 0.4 2.7X [info] native 45 46 1 2233.3 0.4 2.5X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 140 143 2 714.6 1.4 1.0X [info] java 28 29 1 3531.0 0.3 4.9X [info] native 26 27 1 3820.0 0.3 5.3X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 156 166 7 641.3 1.6 1.0X [info] java 158 167 6 633.2 1.6 1.0X [info] native 150 160 7 664.8 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 88 2 1181.7 0.8 1.0X [info] java 85 88 2 1176.0 0.9 1.0X [info] native 75 78 2 1333.2 0.8 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.1 0.6 1.0X [info] java 41 43 1 2415.5 0.4 1.4X [info] native 30 31 1 3293.9 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 94 96 1 1063.4 0.9 1.0X [info] java 41 42 1 2435.8 0.4 2.3X [info] native 30 30 1 3379.8 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2278.9 0.4 1.0X [info] java 37 38 0 2686.8 0.4 1.2X [info] native 15 16 1 6555.4 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 88 89 1 1142.1 0.9 1.0X [info] java 33 34 1 3010.7 0.3 2.6X [info] native 15 16 1 6553.9 0.2 5.7X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 172 4 609.4 1.6 1.0X [info] java 163 172 5 612.6 1.6 1.0X [info] native 150 159 4 667.0 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 49 50 1 1029.4 1.0 1.0X [info] java 41 42 1 1209.4 0.8 1.2X [info] native 25 27 1 2029.2 0.5 2.0X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 80 85 3 622.2 1.6 1.0X [info] java 80 85 3 622.4 1.6 1.0X [info] native 75 79 3 668.7 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 137 142 3 364.1 2.7 1.0X [info] java 139 142 2 360.4 2.8 1.0X [info] native 131 135 3 380.4 2.6 1.0X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 517 525 5 1935.5 0.5 1.0X [info] java 213 216 3 4704.8 0.2 2.4X [info] native 31 31 1 32705.6 0.0 16.9X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 589 601 6 1698.6 0.6 1.0X [info] java 213 217 3 4693.3 0.2 2.8X [info] native 31 32 1 32498.9 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 851 865 6 1175.3 0.9 1.0X [info] java 212 216 3 4717.0 0.2 4.0X [info] native 30 32 1 32903.0 0.0 28.0X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1301 1316 6 768.4 1.3 1.0X [info] java 212 216 2 4717.4 0.2 6.1X [info] native 31 32 1 32606.0 0.0 42.4X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 454 460 2 2203.0 0.5 1.0X [info] java 208 212 3 4803.8 0.2 2.2X [info] native 15 16 0 66586.0 0.0 30.2X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 529 536 4 1889.7 0.5 1.0X [info] java 208 212 3 4798.6 0.2 2.5X [info] native 15 16 1 66751.4 0.0 35.3X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 830 840 5 1205.1 0.8 1.0X [info] java 208 211 2 4814.1 0.2 4.0X [info] native 15 15 1 67676.4 0.0 56.2X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 894 907 7 1118.7 0.9 1.0X [info] java 208 211 3 4809.6 0.2 4.3X [info] native 15 16 1 66675.2 0.0 59.6X ``` #### JDK16: ``` [info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.VectorBLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 193 199 3 517.5 1.9 1.0X [info] java 181 186 4 553.2 1.8 1.1X [info] native 181 185 5 553.6 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 2 925.1 1.1 1.0X [info] java 88 91 3 1138.6 0.9 1.2X [info] native 87 91 3 1144.2 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 184 189 3 542.5 1.8 1.0X [info] java 181 185 3 552.8 1.8 1.0X [info] native 179 183 2 558.0 1.8 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 1031.6 1.0 1.0X [info] java 86 90 2 1163.7 0.9 1.1X [info] native 85 88 2 1182.9 0.8 1.1X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 107 109 2 932.4 1.1 1.0X [info] java 54 56 2 1846.7 0.5 2.0X [info] native 54 56 2 1846.7 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1043.6 1.0 1.0X [info] java 29 30 1 3439.3 0.3 3.3X [info] native 29 30 1 3423.9 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 123 2 829.8 1.2 1.0X [info] java 32 32 1 3171.3 0.3 3.8X [info] native 45 46 1 2246.2 0.4 2.7X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 142 144 2 705.9 1.4 1.0X [info] java 15 16 1 6585.8 0.2 9.3X [info] native 26 27 1 3839.5 0.3 5.4X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 157 165 5 635.6 1.6 1.0X [info] java 151 159 5 664.0 1.5 1.0X [info] native 151 160 5 663.6 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 89 2 1172.3 0.9 1.0X [info] java 75 79 3 1337.3 0.7 1.1X [info] native 75 79 2 1335.5 0.7 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.5 0.6 1.0X [info] java 28 29 1 3544.2 0.3 2.0X [info] native 30 31 1 3306.2 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1108.3 0.9 1.0X [info] java 28 28 1 3622.5 0.3 3.3X [info] native 30 31 1 3381.3 0.3 3.1X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2284.7 0.4 1.0X [info] java 14 15 1 7034.0 0.1 3.1X [info] native 15 16 1 6643.7 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 86 1 1177.4 0.8 1.0X [info] java 15 15 1 6886.1 0.1 5.8X [info] native 15 16 1 6560.1 0.2 5.6X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 173 6 608.1 1.6 1.0X [info] java 148 157 5 675.2 1.5 1.1X [info] native 152 160 5 659.9 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 61 63 1 815.4 1.2 1.0X [info] java 16 17 1 3104.3 0.3 3.8X [info] native 24 27 1 2071.9 0.5 2.5X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 81 85 2 616.4 1.6 1.0X [info] java 81 85 2 614.7 1.6 1.0X [info] native 75 78 2 669.5 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 138 141 3 362.7 2.8 1.0X [info] java 137 140 2 365.3 2.7 1.0X [info] native 131 134 2 382.9 2.6 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 525 544 8 1906.2 0.5 1.0X [info] java 61 68 3 16358.1 0.1 8.6X [info] native 31 32 1 32623.7 0.0 17.1X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 580 598 12 1724.5 0.6 1.0X [info] java 61 68 4 16302.5 0.1 9.5X [info] native 30 32 1 32962.8 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 829 838 4 1206.2 0.8 1.0X [info] java 61 69 3 16339.7 0.1 13.5X [info] native 30 31 1 33231.9 0.0 27.6X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1352 1363 5 739.6 1.4 1.0X [info] java 61 69 3 16347.0 0.1 22.1X [info] native 31 32 1 32740.3 0.0 44.3X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 482 493 7 2073.1 0.5 1.0X [info] java 35 38 2 28315.3 0.0 13.7X [info] native 15 15 1 67579.7 0.0 32.6X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 472 482 4 2119.0 0.5 1.0X [info] java 36 38 2 28138.1 0.0 13.3X [info] native 15 16 1 66616.5 0.0 31.4X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 823 830 5 1215.2 0.8 1.0X [info] java 35 38 2 28681.4 0.0 23.6X [info] native 15 15 1 67908.4 0.0 55.9X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 896 908 7 1115.8 0.9 1.0X [info] java 35 38 2 28402.0 0.0 25.5X [info] native 15 16 0 66691.2 0.0 59.8X ``` TODO: - [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` refering `com.github.fommil.netlib` - [ ] merge https://github.com/luhenry/netlib/pull/1 with all feedback from this PR + remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`. Closes #32415 from luhenry/master. Authored-by: Ludovic Henry <git@ludovic.dev> Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 09:59:36 -04:00
MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/), [dev.ludovic.netlib](https://github.com/luhenry/netlib), and [netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing[^1]. Those packages may call native acceleration libraries such as [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library.html) or [OpenBLAS](http://www.openblas.net) if they are available as system libraries or in runtime library paths.
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
[SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0 ### What changes were proposed in this pull request? Bump to `dev.ludovic.netlib:2.0` which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. Theseare not taking dependencies on GPL or LGPL libraries, allowing to provide out-of-the-box support for hardware acceleration when a native library is present (this is still up to the end-user to install such library on their system, like OpenBLAS, Intel MKL, and libarpack2). ### Why are the changes needed? Great performance improvement for ML-related workload on vanilla-distributions of Spark. ### Does this PR introduce _any_ user-facing change? Users now take advantage of hardware acceleration as long as a native library is installed (like OpenBLAS, Intel MKL and libarpack2). ### How was this patch tested? Spark test-suite + dev.ludovic.netlib testsuite. #### JDK8: ``` [info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java8BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 220 226 6 454.9 2.2 1.0X [info] java 221 228 5 451.9 2.2 1.0X [info] native 209 215 5 478.7 2.1 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 125 3 823.3 1.2 1.0X [info] java 121 125 3 824.3 1.2 1.0X [info] native 101 105 3 988.4 1.0 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 212 219 6 470.9 2.1 1.0X [info] java 208 212 4 481.0 2.1 1.0X [info] native 209 215 5 478.5 2.1 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 119 3 878.9 1.1 1.0X [info] java 99 105 3 1011.4 1.0 1.2X [info] native 97 103 3 1026.7 1.0 1.2X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 111 2 925.9 1.1 1.0X [info] java 71 73 2 1414.9 0.7 1.5X [info] native 54 56 2 1847.0 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 2 1046.8 1.0 1.0X [info] java 47 48 1 2129.8 0.5 2.0X [info] native 29 30 1 3404.7 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 139 143 2 718.2 1.4 1.0X [info] java 46 47 1 2171.2 0.5 3.0X [info] native 44 46 2 2261.8 0.4 3.1X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 154 157 4 651.0 1.5 1.0X [info] java 40 42 1 2469.3 0.4 3.8X [info] native 26 27 1 3787.6 0.3 5.8X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 185 195 8 541.0 1.8 1.0X [info] java 186 196 7 538.5 1.9 1.0X [info] native 177 187 7 564.1 1.8 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 98 102 3 1016.2 1.0 1.0X [info] java 98 102 3 1017.8 1.0 1.0X [info] native 87 91 3 1143.2 0.9 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 68 70 1 1474.7 0.7 1.0X [info] java 51 52 1 1973.0 0.5 1.3X [info] native 30 32 1 3298.8 0.3 2.2X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 99 2 1037.9 1.0 1.0X [info] java 50 51 1 1999.6 0.5 1.9X [info] native 30 31 1 3368.1 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 1688.7 0.6 1.0X [info] java 41 42 1 2461.9 0.4 1.5X [info] native 15 16 1 6593.0 0.2 3.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1116.2 0.9 1.0X [info] java 39 40 1 2565.8 0.4 2.3X [info] native 15 16 1 6594.2 0.2 5.9X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 192 202 7 520.5 1.9 1.0X [info] java 203 214 7 491.9 2.0 0.9X [info] native 176 187 7 568.8 1.8 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 846.1 1.2 1.0X [info] java 38 39 1 1313.5 0.8 1.6X [info] native 24 27 1 2047.8 0.5 2.4X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 515.4 1.9 1.0X [info] java 97 101 2 515.1 1.9 1.0X [info] native 88 91 3 569.1 1.8 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 169 174 3 295.4 3.4 1.0X [info] java 169 174 3 295.4 3.4 1.0X [info] native 160 165 4 312.2 3.2 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 561 577 13 1782.3 0.6 1.0X [info] java 225 231 4 4446.2 0.2 2.5X [info] native 31 32 3 32473.1 0.0 18.2X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 570 584 9 1754.8 0.6 1.0X [info] java 224 230 4 4457.3 0.2 2.5X [info] native 31 32 1 32493.4 0.0 18.5X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 855 866 6 1169.2 0.9 1.0X [info] java 224 228 3 4466.9 0.2 3.8X [info] native 31 32 1 32395.5 0.0 27.7X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1328 1344 8 752.8 1.3 1.0X [info] java 224 230 4 4458.9 0.2 5.9X [info] native 31 32 1 32201.8 0.0 42.8X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 534 541 5 1873.0 0.5 1.0X [info] java 220 224 3 4542.8 0.2 2.4X [info] native 15 16 1 66803.1 0.0 35.7X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 544 551 6 1839.6 0.5 1.0X [info] java 220 224 4 4538.2 0.2 2.5X [info] native 15 16 1 65589.9 0.0 35.7X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 833 845 21 1201.0 0.8 1.0X [info] java 220 224 3 4548.7 0.2 3.8X [info] native 15 16 1 66603.2 0.0 55.5X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 899 907 5 1112.9 0.9 1.0X [info] java 221 224 2 4531.6 0.2 4.1X [info] native 15 16 1 65944.9 0.0 59.3X ``` #### JDK11: ``` [info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java11BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 195 200 3 512.2 2.0 1.0X [info] java 197 202 3 507.0 2.0 1.0X [info] native 184 189 4 543.0 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 3 921.8 1.1 1.0X [info] java 101 105 3 989.4 1.0 1.1X [info] native 87 91 3 1147.1 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 187 191 3 535.1 1.9 1.0X [info] java 182 188 3 548.8 1.8 1.0X [info] native 178 182 3 562.2 1.8 1.1X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 110 114 3 909.3 1.1 1.0X [info] java 86 93 4 1159.3 0.9 1.3X [info] native 86 90 3 1162.4 0.9 1.3X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 106 108 2 943.6 1.1 1.0X [info] java 70 71 2 1426.8 0.7 1.5X [info] native 54 56 2 1835.4 0.5 1.9X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1047.1 1.0 1.0X [info] java 43 44 1 2331.9 0.4 2.2X [info] native 29 30 1 3392.1 0.3 3.2X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 115 2 880.7 1.1 1.0X [info] java 42 43 1 2398.1 0.4 2.7X [info] native 45 46 1 2233.3 0.4 2.5X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 140 143 2 714.6 1.4 1.0X [info] java 28 29 1 3531.0 0.3 4.9X [info] native 26 27 1 3820.0 0.3 5.3X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 156 166 7 641.3 1.6 1.0X [info] java 158 167 6 633.2 1.6 1.0X [info] native 150 160 7 664.8 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 88 2 1181.7 0.8 1.0X [info] java 85 88 2 1176.0 0.9 1.0X [info] native 75 78 2 1333.2 0.8 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.1 0.6 1.0X [info] java 41 43 1 2415.5 0.4 1.4X [info] native 30 31 1 3293.9 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 94 96 1 1063.4 0.9 1.0X [info] java 41 42 1 2435.8 0.4 2.3X [info] native 30 30 1 3379.8 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2278.9 0.4 1.0X [info] java 37 38 0 2686.8 0.4 1.2X [info] native 15 16 1 6555.4 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 88 89 1 1142.1 0.9 1.0X [info] java 33 34 1 3010.7 0.3 2.6X [info] native 15 16 1 6553.9 0.2 5.7X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 172 4 609.4 1.6 1.0X [info] java 163 172 5 612.6 1.6 1.0X [info] native 150 159 4 667.0 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 49 50 1 1029.4 1.0 1.0X [info] java 41 42 1 1209.4 0.8 1.2X [info] native 25 27 1 2029.2 0.5 2.0X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 80 85 3 622.2 1.6 1.0X [info] java 80 85 3 622.4 1.6 1.0X [info] native 75 79 3 668.7 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 137 142 3 364.1 2.7 1.0X [info] java 139 142 2 360.4 2.8 1.0X [info] native 131 135 3 380.4 2.6 1.0X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 517 525 5 1935.5 0.5 1.0X [info] java 213 216 3 4704.8 0.2 2.4X [info] native 31 31 1 32705.6 0.0 16.9X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 589 601 6 1698.6 0.6 1.0X [info] java 213 217 3 4693.3 0.2 2.8X [info] native 31 32 1 32498.9 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 851 865 6 1175.3 0.9 1.0X [info] java 212 216 3 4717.0 0.2 4.0X [info] native 30 32 1 32903.0 0.0 28.0X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1301 1316 6 768.4 1.3 1.0X [info] java 212 216 2 4717.4 0.2 6.1X [info] native 31 32 1 32606.0 0.0 42.4X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 454 460 2 2203.0 0.5 1.0X [info] java 208 212 3 4803.8 0.2 2.2X [info] native 15 16 0 66586.0 0.0 30.2X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 529 536 4 1889.7 0.5 1.0X [info] java 208 212 3 4798.6 0.2 2.5X [info] native 15 16 1 66751.4 0.0 35.3X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 830 840 5 1205.1 0.8 1.0X [info] java 208 211 2 4814.1 0.2 4.0X [info] native 15 15 1 67676.4 0.0 56.2X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 894 907 7 1118.7 0.9 1.0X [info] java 208 211 3 4809.6 0.2 4.3X [info] native 15 16 1 66675.2 0.0 59.6X ``` #### JDK16: ``` [info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.VectorBLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 193 199 3 517.5 1.9 1.0X [info] java 181 186 4 553.2 1.8 1.1X [info] native 181 185 5 553.6 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 2 925.1 1.1 1.0X [info] java 88 91 3 1138.6 0.9 1.2X [info] native 87 91 3 1144.2 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 184 189 3 542.5 1.8 1.0X [info] java 181 185 3 552.8 1.8 1.0X [info] native 179 183 2 558.0 1.8 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 1031.6 1.0 1.0X [info] java 86 90 2 1163.7 0.9 1.1X [info] native 85 88 2 1182.9 0.8 1.1X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 107 109 2 932.4 1.1 1.0X [info] java 54 56 2 1846.7 0.5 2.0X [info] native 54 56 2 1846.7 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1043.6 1.0 1.0X [info] java 29 30 1 3439.3 0.3 3.3X [info] native 29 30 1 3423.9 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 123 2 829.8 1.2 1.0X [info] java 32 32 1 3171.3 0.3 3.8X [info] native 45 46 1 2246.2 0.4 2.7X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 142 144 2 705.9 1.4 1.0X [info] java 15 16 1 6585.8 0.2 9.3X [info] native 26 27 1 3839.5 0.3 5.4X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 157 165 5 635.6 1.6 1.0X [info] java 151 159 5 664.0 1.5 1.0X [info] native 151 160 5 663.6 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 89 2 1172.3 0.9 1.0X [info] java 75 79 3 1337.3 0.7 1.1X [info] native 75 79 2 1335.5 0.7 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.5 0.6 1.0X [info] java 28 29 1 3544.2 0.3 2.0X [info] native 30 31 1 3306.2 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1108.3 0.9 1.0X [info] java 28 28 1 3622.5 0.3 3.3X [info] native 30 31 1 3381.3 0.3 3.1X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2284.7 0.4 1.0X [info] java 14 15 1 7034.0 0.1 3.1X [info] native 15 16 1 6643.7 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 86 1 1177.4 0.8 1.0X [info] java 15 15 1 6886.1 0.1 5.8X [info] native 15 16 1 6560.1 0.2 5.6X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 173 6 608.1 1.6 1.0X [info] java 148 157 5 675.2 1.5 1.1X [info] native 152 160 5 659.9 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 61 63 1 815.4 1.2 1.0X [info] java 16 17 1 3104.3 0.3 3.8X [info] native 24 27 1 2071.9 0.5 2.5X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 81 85 2 616.4 1.6 1.0X [info] java 81 85 2 614.7 1.6 1.0X [info] native 75 78 2 669.5 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 138 141 3 362.7 2.8 1.0X [info] java 137 140 2 365.3 2.7 1.0X [info] native 131 134 2 382.9 2.6 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 525 544 8 1906.2 0.5 1.0X [info] java 61 68 3 16358.1 0.1 8.6X [info] native 31 32 1 32623.7 0.0 17.1X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 580 598 12 1724.5 0.6 1.0X [info] java 61 68 4 16302.5 0.1 9.5X [info] native 30 32 1 32962.8 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 829 838 4 1206.2 0.8 1.0X [info] java 61 69 3 16339.7 0.1 13.5X [info] native 30 31 1 33231.9 0.0 27.6X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1352 1363 5 739.6 1.4 1.0X [info] java 61 69 3 16347.0 0.1 22.1X [info] native 31 32 1 32740.3 0.0 44.3X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 482 493 7 2073.1 0.5 1.0X [info] java 35 38 2 28315.3 0.0 13.7X [info] native 15 15 1 67579.7 0.0 32.6X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 472 482 4 2119.0 0.5 1.0X [info] java 36 38 2 28138.1 0.0 13.3X [info] native 15 16 1 66616.5 0.0 31.4X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 823 830 5 1215.2 0.8 1.0X [info] java 35 38 2 28681.4 0.0 23.6X [info] native 15 15 1 67908.4 0.0 55.9X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 896 908 7 1115.8 0.9 1.0X [info] java 35 38 2 28402.0 0.0 25.5X [info] native 15 16 0 66691.2 0.0 59.8X ``` TODO: - [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` refering `com.github.fommil.netlib` - [ ] merge https://github.com/luhenry/netlib/pull/1 with all feedback from this PR + remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`. Closes #32415 from luhenry/master. Authored-by: Ludovic Henry <git@ludovic.dev> Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 09:59:36 -04:00
However, native acceleration libraries can't be distributed with Spark. See [MLlib Linear Algebra Acceleration Guide](ml-linalg-guide.html) for how to enable accelerated linear algebra processing. If accelerated native libraries are not enabled, you will see a warning message like below and a pure JVM implementation will be used instead:
```
[SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0 ### What changes were proposed in this pull request? Bump to `dev.ludovic.netlib:2.0` which provides JNI-based wrappers for BLAS, ARPACK, and LAPACK. Theseare not taking dependencies on GPL or LGPL libraries, allowing to provide out-of-the-box support for hardware acceleration when a native library is present (this is still up to the end-user to install such library on their system, like OpenBLAS, Intel MKL, and libarpack2). ### Why are the changes needed? Great performance improvement for ML-related workload on vanilla-distributions of Spark. ### Does this PR introduce _any_ user-facing change? Users now take advantage of hardware acceleration as long as a native library is installed (like OpenBLAS, Intel MKL and libarpack2). ### How was this patch tested? Spark test-suite + dev.ludovic.netlib testsuite. #### JDK8: ``` [info] OpenJDK 64-Bit Server VM 1.8.0_292-b10 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java8BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 220 226 6 454.9 2.2 1.0X [info] java 221 228 5 451.9 2.2 1.0X [info] native 209 215 5 478.7 2.1 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 125 3 823.3 1.2 1.0X [info] java 121 125 3 824.3 1.2 1.0X [info] native 101 105 3 988.4 1.0 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 212 219 6 470.9 2.1 1.0X [info] java 208 212 4 481.0 2.1 1.0X [info] native 209 215 5 478.5 2.1 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 119 3 878.9 1.1 1.0X [info] java 99 105 3 1011.4 1.0 1.2X [info] native 97 103 3 1026.7 1.0 1.2X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 111 2 925.9 1.1 1.0X [info] java 71 73 2 1414.9 0.7 1.5X [info] native 54 56 2 1847.0 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 2 1046.8 1.0 1.0X [info] java 47 48 1 2129.8 0.5 2.0X [info] native 29 30 1 3404.7 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 139 143 2 718.2 1.4 1.0X [info] java 46 47 1 2171.2 0.5 3.0X [info] native 44 46 2 2261.8 0.4 3.1X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 154 157 4 651.0 1.5 1.0X [info] java 40 42 1 2469.3 0.4 3.8X [info] native 26 27 1 3787.6 0.3 5.8X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 185 195 8 541.0 1.8 1.0X [info] java 186 196 7 538.5 1.9 1.0X [info] native 177 187 7 564.1 1.8 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 98 102 3 1016.2 1.0 1.0X [info] java 98 102 3 1017.8 1.0 1.0X [info] native 87 91 3 1143.2 0.9 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 68 70 1 1474.7 0.7 1.0X [info] java 51 52 1 1973.0 0.5 1.3X [info] native 30 32 1 3298.8 0.3 2.2X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 99 2 1037.9 1.0 1.0X [info] java 50 51 1 1999.6 0.5 1.9X [info] native 30 31 1 3368.1 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 1688.7 0.6 1.0X [info] java 41 42 1 2461.9 0.4 1.5X [info] native 15 16 1 6593.0 0.2 3.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1116.2 0.9 1.0X [info] java 39 40 1 2565.8 0.4 2.3X [info] native 15 16 1 6594.2 0.2 5.9X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 192 202 7 520.5 1.9 1.0X [info] java 203 214 7 491.9 2.0 0.9X [info] native 176 187 7 568.8 1.8 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 59 61 1 846.1 1.2 1.0X [info] java 38 39 1 1313.5 0.8 1.6X [info] native 24 27 1 2047.8 0.5 2.4X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 515.4 1.9 1.0X [info] java 97 101 2 515.1 1.9 1.0X [info] native 88 91 3 569.1 1.8 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 169 174 3 295.4 3.4 1.0X [info] java 169 174 3 295.4 3.4 1.0X [info] native 160 165 4 312.2 3.2 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 561 577 13 1782.3 0.6 1.0X [info] java 225 231 4 4446.2 0.2 2.5X [info] native 31 32 3 32473.1 0.0 18.2X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 570 584 9 1754.8 0.6 1.0X [info] java 224 230 4 4457.3 0.2 2.5X [info] native 31 32 1 32493.4 0.0 18.5X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 855 866 6 1169.2 0.9 1.0X [info] java 224 228 3 4466.9 0.2 3.8X [info] native 31 32 1 32395.5 0.0 27.7X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1328 1344 8 752.8 1.3 1.0X [info] java 224 230 4 4458.9 0.2 5.9X [info] native 31 32 1 32201.8 0.0 42.8X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 534 541 5 1873.0 0.5 1.0X [info] java 220 224 3 4542.8 0.2 2.4X [info] native 15 16 1 66803.1 0.0 35.7X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 544 551 6 1839.6 0.5 1.0X [info] java 220 224 4 4538.2 0.2 2.5X [info] native 15 16 1 65589.9 0.0 35.7X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 833 845 21 1201.0 0.8 1.0X [info] java 220 224 3 4548.7 0.2 3.8X [info] native 15 16 1 66603.2 0.0 55.5X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 899 907 5 1112.9 0.9 1.0X [info] java 221 224 2 4531.6 0.2 4.1X [info] native 15 16 1 65944.9 0.0 59.3X ``` #### JDK11: ``` [info] OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.Java11BLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 195 200 3 512.2 2.0 1.0X [info] java 197 202 3 507.0 2.0 1.0X [info] native 184 189 4 543.0 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 3 921.8 1.1 1.0X [info] java 101 105 3 989.4 1.0 1.1X [info] native 87 91 3 1147.1 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 187 191 3 535.1 1.9 1.0X [info] java 182 188 3 548.8 1.8 1.0X [info] native 178 182 3 562.2 1.8 1.1X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 110 114 3 909.3 1.1 1.0X [info] java 86 93 4 1159.3 0.9 1.3X [info] native 86 90 3 1162.4 0.9 1.3X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 106 108 2 943.6 1.1 1.0X [info] java 70 71 2 1426.8 0.7 1.5X [info] native 54 56 2 1835.4 0.5 1.9X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1047.1 1.0 1.0X [info] java 43 44 1 2331.9 0.4 2.2X [info] native 29 30 1 3392.1 0.3 3.2X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 114 115 2 880.7 1.1 1.0X [info] java 42 43 1 2398.1 0.4 2.7X [info] native 45 46 1 2233.3 0.4 2.5X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 140 143 2 714.6 1.4 1.0X [info] java 28 29 1 3531.0 0.3 4.9X [info] native 26 27 1 3820.0 0.3 5.3X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 156 166 7 641.3 1.6 1.0X [info] java 158 167 6 633.2 1.6 1.0X [info] native 150 160 7 664.8 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 88 2 1181.7 0.8 1.0X [info] java 85 88 2 1176.0 0.9 1.0X [info] native 75 78 2 1333.2 0.8 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.1 0.6 1.0X [info] java 41 43 1 2415.5 0.4 1.4X [info] native 30 31 1 3293.9 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 94 96 1 1063.4 0.9 1.0X [info] java 41 42 1 2435.8 0.4 2.3X [info] native 30 30 1 3379.8 0.3 3.2X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2278.9 0.4 1.0X [info] java 37 38 0 2686.8 0.4 1.2X [info] native 15 16 1 6555.4 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 88 89 1 1142.1 0.9 1.0X [info] java 33 34 1 3010.7 0.3 2.6X [info] native 15 16 1 6553.9 0.2 5.7X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 172 4 609.4 1.6 1.0X [info] java 163 172 5 612.6 1.6 1.0X [info] native 150 159 4 667.0 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 49 50 1 1029.4 1.0 1.0X [info] java 41 42 1 1209.4 0.8 1.2X [info] native 25 27 1 2029.2 0.5 2.0X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 80 85 3 622.2 1.6 1.0X [info] java 80 85 3 622.4 1.6 1.0X [info] native 75 79 3 668.7 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 137 142 3 364.1 2.7 1.0X [info] java 139 142 2 360.4 2.8 1.0X [info] native 131 135 3 380.4 2.6 1.0X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 517 525 5 1935.5 0.5 1.0X [info] java 213 216 3 4704.8 0.2 2.4X [info] native 31 31 1 32705.6 0.0 16.9X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 589 601 6 1698.6 0.6 1.0X [info] java 213 217 3 4693.3 0.2 2.8X [info] native 31 32 1 32498.9 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 851 865 6 1175.3 0.9 1.0X [info] java 212 216 3 4717.0 0.2 4.0X [info] native 30 32 1 32903.0 0.0 28.0X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1301 1316 6 768.4 1.3 1.0X [info] java 212 216 2 4717.4 0.2 6.1X [info] native 31 32 1 32606.0 0.0 42.4X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 454 460 2 2203.0 0.5 1.0X [info] java 208 212 3 4803.8 0.2 2.2X [info] native 15 16 0 66586.0 0.0 30.2X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 529 536 4 1889.7 0.5 1.0X [info] java 208 212 3 4798.6 0.2 2.5X [info] native 15 16 1 66751.4 0.0 35.3X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 830 840 5 1205.1 0.8 1.0X [info] java 208 211 2 4814.1 0.2 4.0X [info] native 15 15 1 67676.4 0.0 56.2X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 894 907 7 1118.7 0.9 1.0X [info] java 208 211 3 4809.6 0.2 4.3X [info] native 15 16 1 66675.2 0.0 59.6X ``` #### JDK16: ``` [info] OpenJDK 64-Bit Server VM 16+36 on Linux 5.8.0-50-generic [info] Intel(R) Xeon(R) E-2276G CPU 3.80GHz [info] [info] f2jBLAS = dev.ludovic.netlib.blas.F2jBLAS [info] javaBLAS = dev.ludovic.netlib.blas.VectorBLAS [info] nativeBLAS = dev.ludovic.netlib.blas.JNIBLAS [info] [info] daxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 193 199 3 517.5 1.9 1.0X [info] java 181 186 4 553.2 1.8 1.1X [info] native 181 185 5 553.6 1.8 1.1X [info] [info] saxpy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 108 112 2 925.1 1.1 1.0X [info] java 88 91 3 1138.6 0.9 1.2X [info] native 87 91 3 1144.2 0.9 1.2X [info] [info] dcopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 184 189 3 542.5 1.8 1.0X [info] java 181 185 3 552.8 1.8 1.0X [info] native 179 183 2 558.0 1.8 1.0X [info] [info] scopy: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 97 101 3 1031.6 1.0 1.0X [info] java 86 90 2 1163.7 0.9 1.1X [info] native 85 88 2 1182.9 0.8 1.1X [info] [info] ddot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 107 109 2 932.4 1.1 1.0X [info] java 54 56 2 1846.7 0.5 2.0X [info] native 54 56 2 1846.7 0.5 2.0X [info] [info] sdot: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 96 97 1 1043.6 1.0 1.0X [info] java 29 30 1 3439.3 0.3 3.3X [info] native 29 30 1 3423.9 0.3 3.3X [info] [info] dnrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 121 123 2 829.8 1.2 1.0X [info] java 32 32 1 3171.3 0.3 3.8X [info] native 45 46 1 2246.2 0.4 2.7X [info] [info] snrm2: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 142 144 2 705.9 1.4 1.0X [info] java 15 16 1 6585.8 0.2 9.3X [info] native 26 27 1 3839.5 0.3 5.4X [info] [info] dscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 157 165 5 635.6 1.6 1.0X [info] java 151 159 5 664.0 1.5 1.0X [info] native 151 160 5 663.6 1.5 1.0X [info] [info] sscal: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 89 2 1172.3 0.9 1.0X [info] java 75 79 3 1337.3 0.7 1.1X [info] native 75 79 2 1335.5 0.7 1.1X [info] [info] dgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 58 59 1 1731.5 0.6 1.0X [info] java 28 29 1 3544.2 0.3 2.0X [info] native 30 31 1 3306.2 0.3 1.9X [info] [info] dgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 90 92 1 1108.3 0.9 1.0X [info] java 28 28 1 3622.5 0.3 3.3X [info] native 30 31 1 3381.3 0.3 3.1X [info] [info] sgemv[N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 44 45 1 2284.7 0.4 1.0X [info] java 14 15 1 7034.0 0.1 3.1X [info] native 15 16 1 6643.7 0.2 2.9X [info] [info] sgemv[T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 85 86 1 1177.4 0.8 1.0X [info] java 15 15 1 6886.1 0.1 5.8X [info] native 15 16 1 6560.1 0.2 5.6X [info] [info] dger: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 164 173 6 608.1 1.6 1.0X [info] java 148 157 5 675.2 1.5 1.1X [info] native 152 160 5 659.9 1.5 1.1X [info] [info] dspmv[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 61 63 1 815.4 1.2 1.0X [info] java 16 17 1 3104.3 0.3 3.8X [info] native 24 27 1 2071.9 0.5 2.5X [info] [info] dspr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 81 85 2 616.4 1.6 1.0X [info] java 81 85 2 614.7 1.6 1.0X [info] native 75 78 2 669.5 1.5 1.1X [info] [info] dsyr[U]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 138 141 3 362.7 2.8 1.0X [info] java 137 140 2 365.3 2.7 1.0X [info] native 131 134 2 382.9 2.6 1.1X [info] [info] dgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 525 544 8 1906.2 0.5 1.0X [info] java 61 68 3 16358.1 0.1 8.6X [info] native 31 32 1 32623.7 0.0 17.1X [info] [info] dgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 580 598 12 1724.5 0.6 1.0X [info] java 61 68 4 16302.5 0.1 9.5X [info] native 30 32 1 32962.8 0.0 19.1X [info] [info] dgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 829 838 4 1206.2 0.8 1.0X [info] java 61 69 3 16339.7 0.1 13.5X [info] native 30 31 1 33231.9 0.0 27.6X [info] [info] dgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 1352 1363 5 739.6 1.4 1.0X [info] java 61 69 3 16347.0 0.1 22.1X [info] native 31 32 1 32740.3 0.0 44.3X [info] [info] sgemm[N,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 482 493 7 2073.1 0.5 1.0X [info] java 35 38 2 28315.3 0.0 13.7X [info] native 15 15 1 67579.7 0.0 32.6X [info] [info] sgemm[N,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 472 482 4 2119.0 0.5 1.0X [info] java 36 38 2 28138.1 0.0 13.3X [info] native 15 16 1 66616.5 0.0 31.4X [info] [info] sgemm[T,N]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 823 830 5 1215.2 0.8 1.0X [info] java 35 38 2 28681.4 0.0 23.6X [info] native 15 15 1 67908.4 0.0 55.9X [info] [info] sgemm[T,T]: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative [info] ----------------------------------------------------------------------------------------------- [info] f2j 896 908 7 1115.8 0.9 1.0X [info] java 35 38 2 28402.0 0.0 25.5X [info] native 15 16 0 66691.2 0.0 59.8X ``` TODO: - [x] update documentation in `docs/` and `docs/ml-linalg-guide.md` refering `com.github.fommil.netlib` - [ ] merge https://github.com/luhenry/netlib/pull/1 with all feedback from this PR + remove references to snapshot repositories in `pom.xml` and `project/SparkBuild.scala`. Closes #32415 from luhenry/master. Authored-by: Ludovic Henry <git@ludovic.dev> Signed-off-by: Sean Owen <srowen@gmail.com>
2021-05-12 09:59:36 -04:00
WARN BLAS: Failed to load implementation from:dev.ludovic.netlib.blas.JNIBLAS
```
2016-07-15 16:38:23 -04:00
To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
2016-07-15 16:38:23 -04:00
[^1]: To learn more about the benefits and background of system optimised natives, you may wish to
watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
# Highlights in 3.0
The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
release of Spark:
* Multiple columns support was added to `Binarizer` ([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)), `StringIndexer` ([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)), `StopWordsRemover` ([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)) and PySpark `QuantileDiscretizer` ([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
* Tree-Based Feature Transformation was added
([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
* Two new evaluators `MultilabelClassificationEvaluator` ([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)) and `RankingEvaluator` ([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)) were added.
* Sample weights support was added in `DecisionTreeClassifier/Regressor` ([SPARK-19591](https://issues.apache.org/jira/browse/SPARK-19591)), `RandomForestClassifier/Regressor` ([SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478)), `GBTClassifier/Regressor` ([SPARK-9612](https://issues.apache.org/jira/browse/SPARK-9612)), `MulticlassClassificationEvaluator` ([SPARK-24101](https://issues.apache.org/jira/browse/SPARK-24101)), `RegressionEvaluator` ([SPARK-24102](https://issues.apache.org/jira/browse/SPARK-24102)), `BinaryClassificationEvaluator` ([SPARK-24103](https://issues.apache.org/jira/browse/SPARK-24103)), `BisectingKMeans` ([SPARK-30351](https://issues.apache.org/jira/browse/SPARK-30351)), `KMeans` ([SPARK-29967](https://issues.apache.org/jira/browse/SPARK-29967)) and `GaussianMixture` ([SPARK-30102](https://issues.apache.org/jira/browse/SPARK-30102)).
* R API for `PowerIterationClustering` was added
([SPARK-19827](https://issues.apache.org/jira/browse/SPARK-19827)).
* Added Spark ML listener for tracking ML pipeline status
([SPARK-23674](https://issues.apache.org/jira/browse/SPARK-23674)).
* Fit with validation set was added to Gradient Boosted Trees in Python
([SPARK-24333](https://issues.apache.org/jira/browse/SPARK-24333)).
* [`RobustScaler`](ml-features.html#robustscaler) transformer was added
([SPARK-28399](https://issues.apache.org/jira/browse/SPARK-28399)).
* [`Factorization Machines`](ml-classification-regression.html#factorization-machines) classifier and regressor were added
([SPARK-29224](https://issues.apache.org/jira/browse/SPARK-29224)).
* Gaussian Naive Bayes Classifier ([SPARK-16872](https://issues.apache.org/jira/browse/SPARK-16872)) and Complement Naive Bayes Classifier ([SPARK-29942](https://issues.apache.org/jira/browse/SPARK-29942)) were added.
* ML function parity between Scala and Python
([SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958)).
* `predictRaw` is made public in all the Classification models. `predictProbability` is made public in all the Classification models except `LinearSVCModel`
([SPARK-30358](https://issues.apache.org/jira/browse/SPARK-30358)).
[SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration Guide tap in Spark documentation ### What changes were proposed in this pull request? Currently, there is no migration section for PySpark, SparkCore and Structured Streaming. It is difficult for users to know what to do when they upgrade. This PR proposes to create create a "Migration Guide" tap at Spark documentation. ![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png) ![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png) This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring. There are some new information added, which I will leave a comment inlined for easier review. 1. **MLlib** Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html) ``` 'docs/ml-guide.md' ↓ Merge new/old migration guides 'docs/ml-migration-guide.md' ``` 2. **PySpark** Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html ``` 'docs/sql-migration-guide-upgrade.md' ↓ Extract PySpark specific items 'docs/pyspark-migration-guide.md' ``` 3. **SparkR** Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) ``` 'docs/sparkr.md' 'docs/sql-migration-guide-upgrade.md' Move migration guide section ↘ ↙ Extract SparkR specific items docs/sparkr-migration-guide.md ``` 4. **Core** Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note. 5. **Structured Streaming** Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note. 6. **SQL** Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html) ``` 'docs/sql-migration-guide-hive-compatibility.md' 'docs/sql-migration-guide-upgrade.md' Move Hive compatibility section ↘ ↙ Left over after filtering PySpark and SparkR items 'docs/sql-migration-guide.md' ``` ### Why are the changes needed? In order for users in production to effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating. ### Does this PR introduce any user-facing change? Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html. ### How was this patch tested? Manually build the doc. This can be verified as below: ```bash cd docs SKIP_API=1 jekyll build open _site/index.html ``` Closes #25757 from HyukjinKwon/migration-doc. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-15 14:17:30 -04:00
# Migration Guide
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00
[SPARK-29052][DOCS][ML][PYTHON][CORE][R][SQL][SS] Create a Migration Guide tap in Spark documentation ### What changes were proposed in this pull request? Currently, there is no migration section for PySpark, SparkCore and Structured Streaming. It is difficult for users to know what to do when they upgrade. This PR proposes to create create a "Migration Guide" tap at Spark documentation. ![Screen Shot 2019-09-11 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/64688126-ad712f80-d4c6-11e9-8672-9a2c56c05bf8.png) ![Screen Shot 2019-09-11 at 7 27 15 PM](https://user-images.githubusercontent.com/6477701/64689915-389ff480-d4ca-11e9-8c54-7f46095d0d23.png) This page will contain migration guides for Spark SQL, PySpark, SparkR, MLlib, Structured Streaming and Core. Basically it is a refactoring. There are some new information added, which I will leave a comment inlined for easier review. 1. **MLlib** Merge [ml-guide.html#migration-guide](https://spark.apache.org/docs/latest/ml-guide.html#migration-guide) and [ml-migration-guides.html](https://spark.apache.org/docs/latest/ml-migration-guides.html) ``` 'docs/ml-guide.md' ↓ Merge new/old migration guides 'docs/ml-migration-guide.md' ``` 2. **PySpark** Extract PySpark specific items from https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html ``` 'docs/sql-migration-guide-upgrade.md' ↓ Extract PySpark specific items 'docs/pyspark-migration-guide.md' ``` 3. **SparkR** Move [sparkr.html#migration-guide](https://spark.apache.org/docs/latest/sparkr.html#migration-guide) into a separate file, and extract from [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) ``` 'docs/sparkr.md' 'docs/sql-migration-guide-upgrade.md' Move migration guide section ↘ ↙ Extract SparkR specific items docs/sparkr-migration-guide.md ``` 4. **Core** Newly created at `'docs/core-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note. 5. **Structured Streaming** Newly created at `'docs/ss-migration-guide.md'`. I skimmed resolved JIRAs at 3.0.0 and found some items to note. 6. **SQL** Merged [sql-migration-guide-upgrade.html](https://spark.apache.org/docs/latest/sql-migration-guide-upgrade.html) and [sql-migration-guide-hive-compatibility.html](https://spark.apache.org/docs/latest/sql-migration-guide-hive-compatibility.html) ``` 'docs/sql-migration-guide-hive-compatibility.md' 'docs/sql-migration-guide-upgrade.md' Move Hive compatibility section ↘ ↙ Left over after filtering PySpark and SparkR items 'docs/sql-migration-guide.md' ``` ### Why are the changes needed? In order for users in production to effectively migrate to higher versions, and detect behaviour or breaking changes before upgrading and/or migrating. ### Does this PR introduce any user-facing change? Yes, this changes Spark's documentation at https://spark.apache.org/docs/latest/index.html. ### How was this patch tested? Manually build the doc. This can be verified as below: ```bash cd docs SKIP_API=1 jekyll build open _site/index.html ``` Closes #25757 from HyukjinKwon/migration-doc. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2019-09-15 14:17:30 -04:00
The migration guide is now archived [on this page](ml-migration-guide.html).
[SPARK-4575] [mllib] [docs] spark.ml pipelines doc + bug fixes Documentation: * Added ml-guide.md, linked from mllib-guide.md * Updated mllib-guide.md with small section pointing to ml-guide.md Examples: * CrossValidatorExample * SimpleParamsExample * (I copied these + the SimpleTextClassificationPipeline example into the ml-guide.md) Bug fixes: * PipelineModel: did not use ParamMaps correctly * UnaryTransformer: issues with TypeTag serialization (Thanks to mengxr for that fix!) CC: mengxr shivaram etrain Documentation for Pipelines: I know the docs are not complete, but the goal is to have enough to let interested people get started using spark.ml and to add more docs once the package is more established/complete. Author: Joseph K. Bradley <joseph@databricks.com> Author: jkbradley <joseph.kurata.bradley@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #3588 from jkbradley/ml-package-docs and squashes the following commits: d393b5c [Joseph K. Bradley] fixed bug in Pipeline (typo from last commit). updated examples for CV and Params for spark.ml c38469c [Joseph K. Bradley] Updated ml-guide with CV examples 99f88c2 [Joseph K. Bradley] Fixed bug in PipelineModel.transform* with usage of params. Updated CrossValidatorExample to use more training examples so it is less likely to get a 0-size fold. ea34dc6 [jkbradley] Merge pull request #4 from mengxr/ml-package-docs 3b83ec0 [Xiangrui Meng] replace TypeTag with explicit datatype 41ad9b1 [Joseph K. Bradley] Added examples for spark.ml: SimpleParamsExample + Java version, CrossValidatorExample + Java version. CrossValidatorExample not working yet. Added programming guide for spark.ml, but need to add CrossValidatorExample to it once CrossValidatorExample works.
2014-12-04 04:00:06 -05:00