spark-instrumented-optimizer/mllib
Joseph K. Bradley e281b87398 [SPARK-5565][ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API.  It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:
* I eliminated doc IDs.  These are not necessary with DataFrames since the user can add an ID column as needed.

Note: This will conflict with [https://github.com/apache/spark/pull/9484], but I'll try to merge [https://github.com/apache/spark/pull/9484] first and then rebase this PR.

CC: hhbyyh feynmanliang  If you have a chance to make a pass, that'd be really helpful--thanks!  Now that I'm done traveling & this PR is almost ready, I'll see about reviewing other PRs critical for 1.6.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #9513 from jkbradley/lda-pipelines.
2015-11-10 16:20:10 -08:00
..
src [SPARK-5565][ML] LDA wrapper for Pipelines API 2015-11-10 16:20:10 -08:00
pom.xml [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. 2015-10-07 14:11:21 -07:00