spark-instrumented-optimizer

Matei Zaharia c344ed04c7 Merge pull request #283 from tmyklebu/master

Python bindings for mllib

This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib. For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.

ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.

I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order.

2013-12-26 01:31:06 -05:00
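The byte-order risk mentioned in the commit message can be illustrated with a small sketch. This is not the wire format used by mllib.py or PythonMLLibAPI.scala; the layout (an 8-byte length header followed by 8-byte doubles) and the helper names `serialize_doubles` and `deserialize_doubles` are hypothetical, chosen only to show why both sides of such a binding need to agree on endianness.

```python
import struct


def serialize_doubles(values, byte_order="="):
    """Pack a vector as an 8-byte length header followed by 8-byte doubles.

    byte_order is a struct prefix: "=" native, "<" little-endian, ">" big-endian.
    The layout is a hypothetical one for illustration only.
    """
    return struct.pack(f"{byte_order}q{len(values)}d", len(values), *values)


def deserialize_doubles(data, byte_order="="):
    """Inverse of serialize_doubles; must be called with the same byte_order."""
    (count,) = struct.unpack_from(f"{byte_order}q", data, 0)
    return list(struct.unpack_from(f"{byte_order}{count}d", data, 8))


vector = [1.0, 2.5, -3.0]

# Round trip with an explicit byte order works on any platform.
little = serialize_doubles(vector, "<")
assert deserialize_doubles(little, "<") == vector

# The same values packed big-endian produce different bytes, so a reader
# that assumes the other order would misread both the header and the payload.
big = serialize_doubles(vector, ">")
assert little != big
```

Fixing the byte order explicitly on both sides (for example, always little-endian) would remove the dependence on the platform's native order, at the cost of a byte swap on machines where the native order differs.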
examples	Add banner to PySpark and make wordcount output nicer	2013-09-01 14:13:16 -07:00
lib	Fix PySpark for assembly run and include it in dist	2013-08-29 21:19:06 -07:00
pyspark	Merge pull request #283 from tmyklebu/master	2013-12-26 01:31:06 -05:00
test_support	License headers	2013-12-09 16:41:01 -08:00
.gitignore	Rename top-level 'pyspark' directory to 'python'	2013-01-01 15:05:00 -08:00
epydoc.conf	Add custom serializer support to PySpark.	2013-11-10 16:45:38 -08:00
run-tests	Add custom serializer support to PySpark.	2013-11-10 16:45:38 -08:00