c344ed04c7
Python bindings for mllib This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib. For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model. ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method. I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order. |
||
---|---|---|
.. | ||
mllib | ||
__init__.py | ||
accumulators.py | ||
broadcast.py | ||
cloudpickle.py | ||
context.py | ||
daemon.py | ||
files.py | ||
java_gateway.py | ||
join.py | ||
rdd.py | ||
rddsampler.py | ||
serializers.py | ||
shell.py | ||
statcounter.py | ||
storagelevel.py | ||
tests.py | ||
worker.py |