[SPARK-7667] [MLLIB] MLlib Python API consistency check

MLlib Python API consistency check

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #6856 from yanboliang/spark-7667 and squashes the following commits:

21bae35 [Yanbo Liang] remove duplicate code
eb12f95 [Yanbo Liang] fix doc inherit problem
9e7ec3c [Yanbo Liang] address comments
e763d32 [Yanbo Liang] MLlib Python API consistency check
Yanbo Liang 2015-06-29 18:50:23 -07:00 committed by Joseph K. Bradley
parent 4915e9e3bf
commit f9b6bf2f83


@@ -111,6 +111,15 @@ class JavaVectorTransformer(JavaModelWrapper, VectorTransformer):
"""
def transform(self, vector):
"""
Applies transformation on a vector or an RDD[Vector].
Note: In Python, transform cannot currently be used within
an RDD transformation or action.
Call transform directly on the RDD instead.
:param vector: Vector or RDD of Vector to be transformed.
"""
if isinstance(vector, RDD):
vector = vector.map(_convert_to_vector)
else:
@@ -191,7 +200,7 @@ class StandardScaler(object):
Computes the mean and variance and stores as a model to be used
for later scaling.
-:param data: The data used to compute the mean and variance
+:param dataset: The data used to compute the mean and variance
to build the transformation model.
:return: a StandardScalarModel
"""
@@ -346,10 +355,6 @@ class IDFModel(JavaVectorTransformer):
vector
:return: an RDD of TF-IDF vectors or a TF-IDF vector
"""
-if isinstance(x, RDD):
-return JavaVectorTransformer.transform(self, x)
-x = _convert_to_vector(x)
-return JavaVectorTransformer.transform(self, x)
def idf(self):
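
For readers skimming the hunks: the lines dropped from IDFModel.transform repeated the "RDD or single vector" dispatch that JavaVectorTransformer.transform already performs, so the subclass can inherit it. The sketch below illustrates that pattern in plain Python, without Spark. FakeRDD, convert_to_vector, BaseTransformer, Doubler and _apply are hypothetical stand-ins invented for this illustration and are not PySpark names; the real base class forwards the call to the JVM model rather than mapping a Python function.

```python
class FakeRDD:
    """Hypothetical stand-in for an RDD: wraps a list and supports map()."""

    def __init__(self, items):
        self.items = list(items)

    def map(self, fn):
        # Apply fn to every element and wrap the results again.
        return FakeRDD(fn(x) for x in self.items)

    def collect(self):
        return list(self.items)


def convert_to_vector(x):
    """Hypothetical analogue of _convert_to_vector: normalise to a tuple of floats."""
    return tuple(float(v) for v in x)


class BaseTransformer:
    """Owns the 'collection or single vector' dispatch in one place."""

    def _apply(self, vector):
        raise NotImplementedError

    def transform(self, vector):
        if isinstance(vector, FakeRDD):
            # Collection input: convert each element, then transform each one.
            return vector.map(convert_to_vector).map(self._apply)
        # Single-vector input: convert once, transform once.
        return self._apply(convert_to_vector(vector))


class Doubler(BaseTransformer):
    """Subclass supplies only the per-vector logic and inherits transform()."""

    def _apply(self, vector):
        return tuple(2.0 * v for v in vector)


single = Doubler().transform([1, 2])
many = Doubler().transform(FakeRDD([[1, 2], [3]])).collect()
```

In this sketch, transform is called on the collection itself rather than inside a per-element function, which mirrors the usage the new docstring note recommends for RDDs.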