(Just made a PR for this; mengxr reported the following:)
MLlib has sample data under several folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean up outdated files.
Author: Sean Owen <sowen@cloudera.com>
Closes #1394 from srowen/SPARK-2363 and squashes the following commits:
54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
Added user documentation for the decision tree algorithms for classification and regression in the Spark 1.0 release.
Apart from a general review, I need specific input on the following:
* I had to move a lot of the existing documentation under the *linear methods* umbrella to accommodate decision trees. I wonder if there is a better way to organize the programming guide given we are so close to the release.
* I have not looked closely at pyspark, but I am wondering whether new mllib algorithms are automatically plugged in or whether we need to do some extra work to call mllib functions from pyspark. I will add to the pyspark examples based on the advice I get.
cc: @mengxr, @hirakendu, @etrain, @atalwalkar
Author: Manish Amde <manish9ue@gmail.com>
Closes #402 from manishamde/tree_doc and squashes the following commits:
022485a [Manish Amde] more documentation
865826e [Manish Amde] minor: grammar
dbb0e5e [Manish Amde] minor improvements to text
b9ef6c4 [Manish Amde] basic decision tree code examples
6e297d7 [Manish Amde] added subsections
f427e84 [Manish Amde] renaming sections
9c0c4be [Manish Amde] split candidate
6925275 [Manish Amde] impurity and information gain
94fd2f9 [Manish Amde] more reorg
b93125c [Manish Amde] more subsection reorg
3ecb2ad [Manish Amde] minor text addition
1537dd3 [Manish Amde] added placeholders and some doc
d06511d [Manish Amde] basic skeleton