The reason it wasn't working was passing a bytearray to stream.write(), which is not supported in Python 2.6 but is in 2.7. (This array came from NumPy when we converted data to send it over to Java.) Now we just convert those bytearrays to strings of bytes, which preserves nonprintable characters as well.

Author: Matei Zaharia <matei@databricks.com>

Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits:

f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7
a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6
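
The conversion described in this commit message can be illustrated with a minimal sketch. It is not the actual patch: the helper name `to_byte_string`, the sample payload, and the use of `io.BytesIO` as a stand-in for the real stream are all assumptions made for illustration. The idea is only that a `bytearray` is turned into an immutable byte string before being written, and that nonprintable bytes survive the conversion.

```python
import io


def to_byte_string(data):
    """Return an immutable byte string with the same contents as `data`.

    On Python 2, `bytes` is an alias for `str`, so this yields a plain
    string of bytes; on Python 3 it yields a `bytes` object.
    (Hypothetical helper, not part of MLlib.)
    """
    return bytes(data)


# Hypothetical payload standing in for the serialized NumPy data;
# it deliberately includes nonprintable bytes.
payload = bytearray([0x00, 0x01, 0xFF, 0x41])

stream = io.BytesIO()  # stand-in for the real output stream
stream.write(to_byte_string(payload))
written = stream.getvalue()
```

After this runs, `written` holds the same four bytes as `payload`, including the `0x00` and `0xFF` values that have no printable representation.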
layout | title |
---|---|
global | Machine Learning Library (MLlib) |
MLlib is a Spark implementation of some common machine learning (ML) functionality, as well as associated tests and data generators. MLlib currently supports four common types of machine learning problem settings, namely binary classification, regression, clustering, and collaborative filtering, as well as an underlying gradient descent optimization primitive.
Available Methods
The following links provide a detailed explanation of the methods and usage examples for each of them:
- Classification and Regression
  - Binary Classification
    - SVM (L1 and L2 regularized)
    - Logistic Regression (L1 and L2 regularized)
  - Linear Regression
    - Least Squares
    - Lasso
    - Ridge Regression
- Clustering
  - k-Means
- Collaborative Filtering
  - Matrix Factorization using Alternating Least Squares
- Optimization
  - Gradient Descent and Stochastic Gradient Descent
- Linear Algebra
  - Singular Value Decomposition
  - Principal Component Analysis
Dependencies
MLlib uses the jblas linear algebra library, which itself depends on native Fortran routines. You may need to install the gfortran runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.
To use MLlib in Python, you will need NumPy version 1.7 or newer.
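
One way to confirm the NumPy requirement before running a Python job is a small version check such as the sketch below. The helper names (`version_tuple`, `meets_minimum`, `check_numpy`) are hypothetical and not part of MLlib or NumPy; the only NumPy feature relied on is the `numpy.__version__` string.

```python
import re


def version_tuple(version):
    """Parse the leading 'major.minor' of a version string into ints."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(1, 7)):
    """Return True if `version` is at least `minimum` (major, minor)."""
    return version_tuple(version) >= minimum


def check_numpy():
    """Return a short status message about the installed NumPy, if any."""
    try:
        import numpy
    except ImportError:
        return "NumPy is not installed"
    if meets_minimum(numpy.__version__):
        return "NumPy %s is new enough" % numpy.__version__
    return "NumPy %s is older than 1.7" % numpy.__version__


status = check_numpy()
```

Comparing `(major, minor)` tuples rather than raw strings avoids the pitfall where `"1.10"` sorts before `"1.7"` lexicographically.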