Matei Zaharia 0b85516781 SPARK-1421. Make MLlib work on Python 2.6

It wasn't working because we passed a bytearray to stream.write(), which Python 2.6 does not support but 2.7 does. (This array came from NumPy when we converted data to send it over to Java.) Now we just convert those bytearrays to strings of bytes, which also preserves nonprintable characters.
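
The fix described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the patch. It assumes the payload is a bytearray such as one produced from NumPy data, and the helper name `write_payload` is hypothetical. The helper copies the buffer into an immutable byte string before calling write(). The round trip shows that non-printable bytes (0x00, 0x7f, 0xff) come through unchanged.

```python
import io


def write_payload(stream, payload):
    """Hypothetical helper: write a bytearray to a binary stream as a byte string.

    Converting to bytes first avoids passing a bytearray to stream.write(),
    which the commit message above says Python 2.6 did not accept.
    """
    data = bytes(payload)  # copies every byte, including non-printable ones
    stream.write(data)
    return len(data)


# A buffer containing non-printable bytes, similar to raw numeric data.
raw = bytearray([0x00, 0x01, 0x7F, 0xFF, ord("A")])
buf = io.BytesIO()
written = write_payload(buf, raw)
print(written, buf.getvalue())
```

Because the conversion copies bytes rather than decoding text, no character encoding is involved. That is why nonprintable values survive.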

Author: Matei Zaharia <matei@databricks.com>

Closes #335 from mateiz/mllib-python-2.6 and squashes the following commits:

f26c59f [Matei Zaharia] Update docs to no longer say we need Python 2.7
a84d6af [Matei Zaharia] SPARK-1421. Make MLlib work on Python 2.6

2014-04-05 20:52:05 -07:00

layout: global
title: Machine Learning Library (MLlib)

MLlib is a Spark implementation of some common machine learning (ML) functionality, as well as associated tests and data generators. MLlib currently supports four common types of machine learning problem settings, namely, binary classification, regression, clustering and collaborative filtering, as well as an underlying gradient descent optimization primitive.

Available Methods

The following links provide a detailed explanation of the methods and usage examples for each of them:

Classification and Regression
- Binary Classification
  - SVM (L1 and L2 regularized)
  - Logistic Regression (L1 and L2 regularized)
- Linear Regression
  - Least Squares
  - Lasso
  - Ridge Regression
Clustering
- k-Means
Collaborative Filtering
- Matrix Factorization using Alternating Least Squares
Optimization
- Gradient Descent and Stochastic Gradient Descent
Linear Algebra
- Singular Value Decomposition
- Principal Component Analysis
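
The Optimization entry above lists gradient descent and its stochastic variant. The plain-Python sketch below illustrates the stochastic idea only; it is not MLlib's implementation or API. It assumes a simple least-squares problem, fitting y ≈ w*x + b. The function name, step size, epoch count and seed are all illustrative choices. Each epoch shuffles the data and updates the parameters after every single example, using the gradient of that example's squared error.

```python
import random


def sgd_least_squares(points, step=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on the squared error.

    Illustrative sketch (hypothetical helper, not MLlib code): one shuffled
    pass over the data per epoch, with an update after every example.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(points)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y  # residual for this single example
            # Gradient of 0.5 * err**2 with respect to w is err * x, and
            # with respect to b is err.
            w -= step * err * x
            b -= step * err
    return w, b


# Noise-free samples on the line y = 2x + 1, with x in [0, 1].
samples = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(11)]
w, b = sgd_least_squares(samples)
print(round(w, 3), round(b, 3))
```

Full-batch gradient descent would instead average the gradient over all samples before each update. The per-example update trades noisier steps for cheaper iterations.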

Dependencies

MLlib uses the jblas linear algebra library, which itself depends on native Fortran routines. You may need to install the gfortran runtime library if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically.

To use MLlib in Python, you will need NumPy version 1.7 or newer.
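
One way to check the NumPy requirement before using MLlib from Python is sketched below. `meets_minimum_version` is a hypothetical helper, not part of NumPy or Spark. It compares only the leading numeric components of the version string, so pre-release suffixes such as `rc1` are ignored, which is an assumption.

```python
import re

MIN_NUMPY = (1, 7)  # minimum NumPy version stated in this guide


def meets_minimum_version(version_string, minimum=MIN_NUMPY):
    """Hypothetical helper: True if e.g. '1.7.1' or '1.10.0rc1' is >= minimum.

    Only leading numeric components are compared; suffixes like 'rc1'
    are dropped.
    """
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '2' up to the length of the minimum tuple.
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts[: len(minimum)]) >= minimum


try:
    import numpy
    status = "OK" if meets_minimum_version(numpy.__version__) else "too old"
    print("NumPy", numpy.__version__, status)
except ImportError:
    print("NumPy is not installed")
```

The helper compares integer tuples rather than raw strings. As strings, "1.10" sorts before "1.7", which would wrongly reject newer releases.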
