spark-instrumented-optimizer/docs/mllib-guide.md
Matei Zaharia 0b85516781 SPARK-1421. Make MLlib work on Python 2.6
The reason it wasn't working was passing a bytearray to stream.write(), which is not supported in Python 2.6 but is in 2.7. (This array came from NumPy when we converted data to send it over to Java). Now we just convert those bytearrays to strings of bytes, which preserves nonprintable characters as well.

Author: Matei Zaharia <matei@databricks.com>

Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits:

f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7
a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6
2014-04-05 20:52:05 -07:00

1.7 KiB

layout title
global Machine Learning Library (MLlib)

MLlib is a Spark implementation of some common machine learning (ML) functionality, as well associated tests and data generators. MLlib currently supports four common types of machine learning problem settings, namely, binary classification, regression, clustering and collaborative filtering, as well as an underlying gradient descent optimization primitive.

Available Methods

The following links provide a detailed explanation of the methods and usage examples for each of them:

Dependencies

MLlib uses the jblas linear algebra library, which itself depends on native Fortran routines. You may need to install the gfortran runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.

To use MLlib in Python, you will need NumPy version 1.7 or newer.