diff --git a/conf/spark-env.sh.template b/conf/spark-env.sh.template
index b9aab5a371..1663019ee5 100755
--- a/conf/spark-env.sh.template
+++ b/conf/spark-env.sh.template
@@ -61,3 +61,7 @@
 # - SPARK_IDENT_STRING A string representing this instance of spark. (Default: $USER)
 # - SPARK_NICENESS The scheduling priority for daemons. (Default: 0)
 # - SPARK_NO_DAEMONIZE Run the proposed command in the foreground. It will not output a PID file.
+# Options for native BLAS implementations, such as Intel MKL and OpenBLAS.
+# Enabling these options may improve performance when using native BLAS (see SPARK-21305).
+# - MKL_NUM_THREADS=1 Disable multi-threading of Intel MKL
+# - OPENBLAS_NUM_THREADS=1 Disable multi-threading of OpenBLAS
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index fb4621389a..adb1c9aaef 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -61,6 +61,12 @@ To configure `netlib-java` / Breeze to use system optimised binaries, include
 project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your platform's additional installation instructions.
 
+The most popular native BLAS implementations, such as [Intel MKL](https://software.intel.com/en-us/mkl) and [OpenBLAS](http://www.openblas.net), can use multiple threads in a single operation, which can conflict with Spark's execution model.
+
+Configuring these BLAS implementations to use a single thread for operations may actually improve performance (see [SPARK-21305](https://issues.apache.org/jira/browse/SPARK-21305)). It is usually optimal to match the number of BLAS threads to the number of cores each Spark task is configured to use (`spark.task.cpus`), which is 1 by default and typically left at 1.
+
+See the following resources to learn how to configure the number of threads these BLAS implementations use: [Intel MKL](https://software.intel.com/en-us/articles/recommended-settings-for-calling-intel-mkl-routines-from-multi-threaded-applications) and [OpenBLAS](https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded).
+
 To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
 
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
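For reference, the options added to `spark-env.sh.template` above would be enabled roughly as in the sketch below. It assumes a native BLAS (Intel MKL or OpenBLAS) is already installed and picked up by `netlib-java`, and that `spark-env.sh` is sourced on the machines where executors run; depending on the deployment this may not hold, so the commented `spark.executorEnv.*` form is shown as an alternative way to pass the same variables to executors. The application name `my_app.py` is hypothetical.

```shell
# conf/spark-env.sh (sketch; assumes a native BLAS such as Intel MKL or
# OpenBLAS is installed and used by netlib-java)

# Limit Intel MKL to a single thread per BLAS call
export MKL_NUM_THREADS=1

# Limit OpenBLAS to a single thread per BLAS call
export OPENBLAS_NUM_THREADS=1

# Alternative (hypothetical application my_app.py): set the variables on the
# executor processes through Spark configuration instead of spark-env.sh
# spark-submit \
#   --conf spark.executorEnv.MKL_NUM_THREADS=1 \
#   --conf spark.executorEnv.OPENBLAS_NUM_THREADS=1 \
#   my_app.py
```

Keeping the thread count at 1 matches the default `spark.task.cpus=1`, so each task's BLAS calls use only the single core that Spark has scheduled for that task.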