spark-instrumented-optimizer/python
Matei Zaharia c344ed04c7 Merge pull request #283 from tmyklebu/master
Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib.

For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py.  The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub.  The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
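
As a rough illustration of the kind of "mutually-understood format" described above, here is a minimal sketch of packing a dense vector of doubles into bytes and reading it back. The layout (an 8-byte length followed by the doubles) and the names `serialize_vector` and `deserialize_vector` are assumptions made for this example; they are not taken from mllib.py or PythonMLLibAPI.scala.

```python
import struct


def serialize_vector(values):
    """Pack floats as an 8-byte signed length followed by that many 8-byte doubles.

    Hypothetical layout for illustration only. "=" selects the machine's native
    byte order with standard sizes and no padding.
    """
    n = len(values)
    return struct.pack("=q%dd" % n, n, *values)


def deserialize_vector(buf):
    """Inverse of serialize_vector: validate the length header, then unpack."""
    if len(buf) < 8:
        raise ValueError("buffer too short to hold a length header")
    (n,) = struct.unpack_from("=q", buf, 0)
    expected = 8 + 8 * n
    if n < 0 or len(buf) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(buf)))
    return list(struct.unpack_from("=%dd" % n, buf, 8))
```

In a scheme like this, the receiving side would perform the same two steps in reverse: read the length, then read that many doubles in the agreed byte order.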

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside.  The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
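
The handle-based design can be sketched as a thin Python wrapper that keeps a reference to the remote model and forwards prediction calls to it. Everything in this sketch is hypothetical: `ModelHandleWrapper` and `FakeRemoteModel` are illustrative names, and the fake model is only a local stand-in for the Scala object so that the example runs without a JVM.

```python
class ModelHandleWrapper(object):
    """Hypothetical Python-side wrapper around a model that lives elsewhere.

    It stores only a handle and delegates predict() to it, so the model's
    internal data never has to be serialised back to Python.
    """

    def __init__(self, remote_model):
        self._remote_model = remote_model

    def predict(self, user, product):
        # Forward the call; the actual computation happens on the remote side.
        return self._remote_model.predict(user, product)


class FakeRemoteModel(object):
    """Local stand-in for the remote model, used only to make the sketch runnable."""

    def __init__(self, ratings):
        self._ratings = ratings

    def predict(self, user, product):
        # Return the stored rating, or 0.0 for an unknown (user, product) pair.
        return self._ratings.get((user, product), 0.0)
```

The trade-off of this approach is that every prediction crosses the Python/JVM boundary, in exchange for not having to copy the RDDs held inside the model.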

I have tested these bindings on an x86_64 machine running Linux.  They may fail on a bi-endian platform if the byte order Python uses differs from what java.nio.ByteBuffer reports as the native byte order.
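
One way to sidestep the byte-order risk, shown here only as a sketch and not as what these bindings do, is to fix the byte order explicitly on both sides instead of relying on each runtime's notion of "native". The helper names below are hypothetical; the JVM side would then need to set the same order on its buffer, for example with `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)`.

```python
import struct
import sys


def pack_doubles_le(values):
    """Pack floats as 8-byte IEEE 754 doubles, always little-endian ("<")."""
    return struct.pack("<%dd" % len(values), *values)


def unpack_doubles_le(buf):
    """Unpack a little-endian buffer of 8-byte doubles into a list of floats."""
    if len(buf) % 8 != 0:
        raise ValueError("buffer length must be a multiple of 8")
    return list(struct.unpack("<%dd" % (len(buf) // 8), buf))


def native_matches_little_endian():
    """Report whether native-order packing yields the same bytes as little-endian."""
    return struct.pack("=d", 1.0) == struct.pack("<d", 1.0)
```

On a little-endian machine such as x86_64, `native_matches_little_endian()` returns `True`, which is why a mismatch would not show up in testing there.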
2013-12-26 01:31:06 -05:00
examples Add banner to PySpark and make wordcount output nicer 2013-09-01 14:13:16 -07:00
lib Fix PySpark for assembly run and include it in dist 2013-08-29 21:19:06 -07:00
pyspark Merge pull request #283 from tmyklebu/master 2013-12-26 01:31:06 -05:00
test_support License headers 2013-12-09 16:41:01 -08:00
.gitignore Rename top-level 'pyspark' directory to 'python' 2013-01-01 15:05:00 -08:00
epydoc.conf Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00
run-tests Add custom serializer support to PySpark. 2013-11-10 16:45:38 -08:00