spark-instrumented-optimizer/python/pyspark/mllib
freeman 6c6f325740 [SPARK-5089][PYSPARK][MLLIB] Fix vector convert
This is a small change addressing a potentially significant bug in how PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark should upcast to float64, but this was not actually happening. As a result, non-float64 arrays were silently misparsed during SerDe, yielding erroneous results when running, for example, KMeans.

The PR includes the fix, as well as a new test for the correct conversion behavior.
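
The failure mode described above can be illustrated with a minimal sketch. It assumes the SerDe step writes the array's raw bytes and reads them back as float64, and that the bug was a discarded `astype` result (as the commit title "Return array after changing type" suggests). The function names here are hypothetical and are not taken from `linalg.py`.

```python
import numpy as np


def to_float64_buggy(ar):
    # Hypothetical reproduction: astype() returns a NEW array, so
    # discarding its result leaves `ar` with its original dtype.
    if ar.dtype != np.float64:
        ar.astype(np.float64)
    return ar


def to_float64_fixed(ar):
    # Keep the array returned by astype().
    if ar.dtype != np.float64:
        ar = ar.astype(np.float64)
    return ar


def roundtrip(ar):
    # Assumed SerDe model: dump raw bytes, then reinterpret them as float64.
    return np.frombuffer(ar.tobytes(), dtype=np.float64)


ints = np.array([1, 2, 3, 4], dtype=np.int32)

bad = roundtrip(to_float64_buggy(ints))   # 16 bytes of int32 read as 2 float64s
good = roundtrip(to_float64_fixed(ints))  # 32 bytes of float64 read as 4 float64s

print(bad.shape, good.tolist())
```

In this sketch the buggy path yields an array of the wrong length with meaningless values and raises no error, which is consistent with the silent misparse the commit message describes.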

davies

Author: freeman <the.freeman.lab@gmail.com>

Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits:

764db47 [freeman] Add a test for proper conversion behavior
704f97e [freeman] Return array after changing type
2015-01-05 13:10:59 -08:00
..
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [SPARK-4821] [mllib] [python] [docs] Fix for pyspark.mllib.rand doc | 2014-12-17 14:12:46 -08:00 |
| classification.py | [SPARK-4822] Use sphinx tags for Python doc annotations | 2014-12-17 17:31:24 -08:00 |
| clustering.py | [SPARK-4531] [MLlib] cache serialized java object | 2014-11-21 15:02:31 -08:00 |
| common.py | [SPARK-4531] [MLlib] cache serialized java object | 2014-11-21 15:02:31 -08:00 |
| feature.py | [SPARK-4822] Use sphinx tags for Python doc annotations | 2014-12-17 17:31:24 -08:00 |
| linalg.py | [SPARK-5089][PYSPARK][MLLIB] Fix vector convert | 2015-01-05 13:10:59 -08:00 |
| rand.py | [SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py | 2014-11-13 10:24:54 -08:00 |
| recommendation.py | [SPARK-4531] [MLlib] cache serialized java object | 2014-11-21 15:02:31 -08:00 |
| regression.py | [SPARK-4531] [MLlib] cache serialized java object | 2014-11-21 15:02:31 -08:00 |
| stat.py | [SPARK-4822] Use sphinx tags for Python doc annotations | 2014-12-17 17:31:24 -08:00 |
| tests.py | [SPARK-5089][PYSPARK][MLLIB] Fix vector convert | 2015-01-05 13:10:59 -08:00 |
| tree.py | [SPARK-4580] [SPARK-4610] [mllib] [docs] Documentation for tree ensembles + DecisionTree API fix | 2014-12-04 09:57:50 +08:00 |
| util.py | [SPARK-4324] [PySpark] [MLlib] support numpy.array for all MLlib API | 2014-11-10 22:26:16 -08:00 |