1a9c6cddad
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map a RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.

~~This PR contains the changes from #3068. I will rebase after #3068 is merged.~~

marmbrus jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:

3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples

| Name |
|---|
| audit-release |
| create-release |
| check-license |
| github_jira_sync.py |
| lint-python |
| lint-scala |
| merge_spark_pr.py |
| mima |
| README.md |
| run-tests |
| run-tests-codes.sh |
| run-tests-jenkins |
| scalastyle |

Spark Developer Scripts
This directory contains scripts useful to developers when packaging, testing, or committing to Spark.
Many of these scripts require Apache credentials to work correctly.