spark-instrumented-optimizer/python/pyspark/mllib
Xiangrui Meng 1a9c6cddad [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.
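
As a rough illustration of what registering a vector as a UDT involves, the sketch below flattens a vector into a row of SQL-friendly primitives and rebuilds it. It is plain Python and is not the code from this PR: `DenseVec`, `SparseVec`, `serialize`, `deserialize`, and the `(type tag, size, indices, values)` row layout are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


# Hypothetical stand-ins for illustration only; these are not the
# pyspark.mllib.linalg classes.
@dataclass
class DenseVec:
    values: List[float]


@dataclass
class SparseVec:
    size: int
    indices: List[int]
    values: List[float]


Vec = Union[DenseVec, SparseVec]
# Assumed row layout: (type tag, size, indices, values).
Row = Tuple[int, Optional[int], Optional[List[int]], List[float]]


def serialize(vec: Vec) -> Row:
    """Flatten a vector into a tuple of SQL-friendly primitives."""
    if isinstance(vec, SparseVec):
        # Tag 0 marks a sparse vector; all four fields are populated.
        return (0, vec.size, list(vec.indices), list(vec.values))
    if isinstance(vec, DenseVec):
        # Tag 1 marks a dense vector; size and indices are left empty.
        return (1, None, None, list(vec.values))
    raise TypeError(f"cannot serialize {type(vec).__name__}")


def deserialize(row: Row) -> Vec:
    """Rebuild a vector from the tuple produced by serialize()."""
    tag, size, indices, values = row
    if tag == 0:
        return SparseVec(size, list(indices), list(values))
    if tag == 1:
        return DenseVec(list(values))
    raise ValueError(f"unknown type tag {tag}")
```

With a mapping like this in place, a column of vectors can be stored next to ordinary columns such as a label, which is what makes selecting columns or writing to a columnar file format possible.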

~~This PR contains the changes from #3068 . I will rebase after #3068 is merged.~~

marmbrus jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:

3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
2014-11-03 22:29:48 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | SPARK-1426: Make MLlib work with NumPy versions older than 1.7 | 2014-04-15 00:19:43 -07:00 |
| classification.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| clustering.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| common.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| feature.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| linalg.py | [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD | 2014-11-03 22:29:48 -08:00 |
| random.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| recommendation.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| regression.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| stat.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| tests.py | [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD | 2014-11-03 22:29:48 -08:00 |
| tree.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |
| util.py | [SPARK-4124] [MLlib] [PySpark] simplify serialization in MLlib Python API | 2014-10-30 22:25:18 -07:00 |