Commit graph

777 commits

Author SHA1 Message Date
Xiangrui Meng b9bdf12a1c [SPARK-7198] [MLLIB] VectorAssembler should output ML attributes
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases:

1. DoubleType with ML attribute: carry over
2. DoubleType without ML attribute: numeric value
3. Scalar type: numeric value
4. VectorType with all ML attributes: carry over and update names
5. VectorType with number of ML attributes: assume all numeric
6. VectorType without ML attributes: check the first row and get the number of attributes

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:

a9d2469 [Xiangrui Meng] add space
facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes

(cherry picked from commit 7859ab659e)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-28 16:32:59 -07:00
Xiangrui Meng 7b5dffb802 [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times
~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work. So I added a workaround for this. If a Python UDF is applied to a Python UDT. This will put the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:

c257d2a [Xiangrui Meng] add a workaround for VectorUDT

(cherry picked from commit 530efe3e80)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 12:03:55 -07:00
Patrick Wendell 7c342bdd93 Preparing development version 1.4.0-SNAPSHOT 2015-05-27 22:36:30 -07:00
Patrick Wendell 4983dfc878 Preparing Spark release v1.4.0-rc3 2015-05-27 22:36:23 -07:00
Xiangrui Meng 34e233f9ce [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API
1. removed `Params.validateParams(extra)`
2. added `Evaluate.evaluate(dataset, paramPairs*)`
3. updated `RegressionEvaluator` doc

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:

5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps
607445d [Xiangrui Meng] merge master
8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1
fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)

(cherry picked from commit a9f1c0c57b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 23:51:39 -07:00
Xiangrui Meng b5ee7eefdb [SPARK-7748] [MLLIB] Graduate spark.ml from alpha
With descent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components.) `UnaryTransformer` becomes a `DeveloperApi` in this PR.

jkbradley harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:

effbccd [Xiangrui Meng] organize imports
c15028e [Xiangrui Meng] added missing docs
1b2e5f8 [Xiangrui Meng] update package doc
73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)

(cherry picked from commit 836a75898f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 15:51:40 -07:00
MechCoder 51d98b0e97 [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity
The densities in KernelDensity are scaled down by
(number of parallel processes X number of points). It should be just no.of samples. This results in broken tests in KernelDensitySuite which haven't been tested properly.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6383 from MechCoder/spark-7844 and squashes the following commits:

ab81302 [MechCoder] Math->math
9b8ed50 [MechCoder] Make one pass to update count
a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity

(cherry picked from commit 61664732b2)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 13:22:42 -07:00
Ram Sriharsha 16a6da52f8 [SPARK-7833] [ML] Add python wrapper for RegressionEvaluator
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6365 from harsha2010/SPARK-7833 and squashes the following commits:

923f288 [Ram Sriharsha] cleanup
7623b7d [Ram Sriharsha] python style fix
9743f83 [Ram Sriharsha] [SPARK-7833][ml] Add python wrapper for RegressionEvaluator

(cherry picked from commit 65c696ecc0)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-24 10:36:08 -07:00
Patrick Wendell 947d700ec8 Preparing development version 1.4.0-SNAPSHOT 2015-05-23 20:13:05 -07:00
Patrick Wendell 03fb26a3e5 Preparing Spark release v1.4.0-rc2 2015-05-23 20:13:00 -07:00
Patrick Wendell f2f74b9b1a Preparing development version 1.4.1-SNAPSHOT 2015-05-23 14:59:37 -07:00
Patrick Wendell 0da7396990 Preparing Spark release v1.4.0-rc2-test 2015-05-23 14:59:31 -07:00
Patrick Wendell 8da8caab17 Preparing development version 1.4.1-SNAPSHOT 2015-05-23 14:46:27 -07:00
Patrick Wendell 8f50218f38 Preparing Spark release 1.4.0-rc2-test 2015-05-23 14:46:23 -07:00
Ram Sriharsha d709d7cebd [SPARK-7404] [ML] Add RegressionEvaluator to spark.ml
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6344 from harsha2010/SPARK-7404 and squashes the following commits:

16b9d77 [Ram Sriharsha] consistent naming
7f100b6 [Ram Sriharsha] cleanup
c46044d [Ram Sriharsha] Merge with Master + Code Review Fixes
188fa0a [Ram Sriharsha] Merge branch 'master' into SPARK-7404
f5b6a4c [Ram Sriharsha] cleanup doc
97beca5 [Ram Sriharsha] update test to use R packages
32dd310 [Ram Sriharsha] fix indentation
f93b812 [Ram Sriharsha] fix test
1b6ebb3 [Ram Sriharsha] [SPARK-7404][ml] Add RegressionEvaluator to spark.ml

(cherry picked from commit f490b3b4c7)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-22 09:59:51 -07:00
Joseph K. Bradley 19e579c552 [SPARK-7578] [ML] [DOC] User guide for spark.ml Normalizer, IDF, StandardScaler
Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF

(cherry picked from commit 2728c3df66)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 22:59:53 -07:00
Xiangrui Meng 11a5f32116 [SPARK-7535] [.0] [MLLIB] Audit the pipeline APIs for 1.4
Some changes to the pipeilne APIs:

1. Estimator/Transformer/ doesn’t need to extend Params since PipelineStage already does.
1. Move Evaluator to ml.evaluation.
1. Mention larger metric values are better.
1. PipelineModel doc. “compiled” -> “fitted”
1. Hide object PolynomialExpansion.
1. Hide object VectorAssembler.
1. Word2Vec.minCount (and other) -> group param
1. ParamValidators -> DeveloperApi
1. Hide MetadataUtils/SchemaUtils.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6322 from mengxr/SPARK-7535.0 and squashes the following commits:

9e9c7da [Xiangrui Meng] move JavaEvaluator to ml.evaluation as well
e179480 [Xiangrui Meng] move Evaluation to ml.evaluation in PySpark
08ef61f [Xiangrui Meng] update pipieline APIs

(cherry picked from commit 8f11c6116b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 22:57:43 -07:00
Xiangrui Meng df55a0d767 [SPARK-7219] [MLLIB] Output feature attributes in HashingTF
This PR updates `HashingTF` to output ML attributes that tell the number of features in the output column. We need to expand `UnaryTransformer` to support output metadata. A `df outputMetadata: Metadata` is not sufficient because the metadata may also depends on the input data. Though this is not true for `HashingTF`, I think it is reasonable to update `UnaryTransformer` in a separate PR. `checkParams` is added to verify common requirements for params. I will send a separate PR to use it in other test suites. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6308 from mengxr/SPARK-7219 and squashes the following commits:

9bd2922 [Xiangrui Meng] address comments
e82a68a [Xiangrui Meng] remove sqlContext from test suite
995535b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7219
2194703 [Xiangrui Meng] add test for attributes
178ae23 [Xiangrui Meng] update HashingTF with tests
91a6106 [Xiangrui Meng] WIP

(cherry picked from commit 85b96372cf)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-21 18:04:55 -07:00
Xiangrui Meng ef9336335f [SPARK-7794] [MLLIB] update RegexTokenizer default settings
The previous default is `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`. The default pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:

5ee7cde [Xiangrui Meng] update RegexTokenizer default settings

(cherry picked from commit f5db4b416c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 17:59:13 -07:00
Xiangrui Meng 7e0912b1d1 [SPARK-7498] [MLLIB] add varargs back to setDefault
We removed `varargs` due to Java compilation issues. That was a false alarm because I didn't run `build/sbt clean`. So this PR reverts the changes. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6320 from mengxr/SPARK-7498 and squashes the following commits:

74a7259 [Xiangrui Meng] add varargs back to setDefault

(cherry picked from commit cdc7c055c9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 13:06:59 -07:00
Joseph K. Bradley e29b811ed3 [SPARK-7585] [ML] [DOC] VectorIndexer user guide section
Added VectorIndexer section to ML user guide.  Also added javaCategoryMaps() method and Java unit test for it.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6255 from jkbradley/vector-indexer-guide and squashes the following commits:

dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps
f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide.  Also added javaCategoryMaps() method and Java unit test for it.

(cherry picked from commit 6d75ed7e5c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 13:06:17 -07:00
Shuo Xiang f6a29c72c6 [SPARK-7793] [MLLIB] Use getOrElse for getting the threshold of SVM model
same issue and fix as in Spark-7694.

Author: Shuo Xiang <shuoxiangpub@gmail.com>

Closes #6321 from coderxiang/nb and squashes the following commits:

a5e6de4 [Shuo Xiang] use getOrElse for svmmodel.tostring
2cb0177 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into nb
5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
98804c9 [Shuo Xiang] fix bug in topBykey and update test

(cherry picked from commit 4f572008f8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 12:11:17 -07:00
Xiangrui Meng b97a8053a0 [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType
to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire

Author: Xiangrui Meng <meng@databricks.com>

Closes #6277 from mengxr/SPARK-7752 and squashes the following commits:

f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings

(cherry picked from commit 13348e21b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-21 10:30:27 -07:00
Xiangrui Meng 64762444e7 [SPARK-7753] [MLLIB] Update KernelDensity API
Update `KernelDensity` API to make it extensible to different kernels in the future. `bandwidth` is used instead of `standardDeviation`. The static `kernelDensity` method is removed from `Statistics`. The implementation is updated using BLAS, while the algorithm remains the same. sryza srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #6279 from mengxr/SPARK-7753 and squashes the following commits:

4cdfadc [Xiangrui Meng] add example code in the doc
767fd5a [Xiangrui Meng] update KernelDensity API

(cherry picked from commit 947ea1cf5f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 23:39:06 -07:00
Xiangrui Meng 9711e9bf1d [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext
to simplify test suites that require a SQLContext.

Author: Xiangrui Meng <meng@databricks.com>

Closes #6303 from mengxr/SPARK-7774 and squashes the following commits:

0622b5a [Xiangrui Meng] update some other test suites
e1f9b8d [Xiangrui Meng] add sqlContext to MLlibTestSparkContext

(cherry picked from commit ddec173cba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 20:30:46 -07:00
Patrick Wendell 9b37e32c55 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:29:00 -07:00
Patrick Wendell 1e458e3553 Preparing Spark release rc-test 2015-05-20 17:28:55 -07:00
Xiangrui Meng 5f64269c52 [SPARK-7762] [MLLIB] set default value for outputCol
Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6289 from mengxr/SPARK-7762 and squashes the following commits:

54edebc [Xiangrui Meng] merge master
bff8667 [Xiangrui Meng] update unit test
171246b [Xiangrui Meng] add unit test for outputCol
a4321bd [Xiangrui Meng] set default value for outputCol

(cherry picked from commit c330e52dae)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-20 17:26:44 -07:00
pwendell 8d66849862 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:26:15 -07:00
pwendell ae29aeaf8e Preparing Spark release rc-test 2015-05-20 17:26:10 -07:00
jenkins 534c787b9f Preparing development version 1.4.0-SNAPSHOT 2015-05-20 16:49:59 -07:00
jenkins 5f4d87f608 Preparing Spark release rc-test 2015-05-20 16:49:54 -07:00
Patrick Wendell 205ed15f29 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 16:30:01 -07:00
Patrick Wendell 09a1c6231e Preparing Spark release rc-test 2015-05-20 16:29:52 -07:00
Xiangrui Meng b40c5ed7a7 [SPARK-7537] [MLLIB] spark.mllib API updates
Minor updates to the spark.mllib APIs:

1. Add `DeveloperApi` to `PMMLExportable` and add `Experimental` to `toPMML` methods.
2. Mention `RankingMetrics.of` in the `RankingMetrics` constructor.

Author: Xiangrui Meng <meng@databricks.com>

Closes #6280 from mengxr/SPARK-7537 and squashes the following commits:

1bd2583 [Xiangrui Meng] organize imports
94afa7a [Xiangrui Meng] mark all toPMML methods experimental
4c40da1 [Xiangrui Meng] mention the factory method for RankingMetrics for Java users
88c62d0 [Xiangrui Meng] add DeveloperApi to PMMLExportable

(cherry picked from commit 2ad4837cfa)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 12:50:30 -07:00
Yanbo Liang 606ae3e10e [SPARK-6094] [MLLIB] Add MultilabelMetrics in PySpark/MLlib
Add MultilabelMetrics in PySpark/MLlib

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #6276 from yanboliang/spark-6094 and squashes the following commits:

b8e3343 [Yanbo Liang] Add MultilabelMetrics in PySpark/MLlib

(cherry picked from commit 98a46f9dff)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 07:56:00 -07:00
Xiangrui Meng 996e2d4b38 [SPARK-7654] [MLLIB] Migrate MLlib to the DataFrame reader/writer API
parquetFile -> read.parquet rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #6281 from mengxr/SPARK-7654 and squashes the following commits:

a79b612 [Xiangrui Meng] parquetFile -> read.parquet

(cherry picked from commit 589b12f8e6)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 07:47:26 -07:00
Liang-Chi Hsieh 5643499d22 [SPARK-7652] [MLLIB] Update the implementation of naive Bayes prediction with BLAS
JIRA: https://issues.apache.org/jira/browse/SPARK-7652

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #6189 from viirya/naive_bayes_blas_prediction and squashes the following commits:

ab611fd [Liang-Chi Hsieh] Remove unnecessary space.
ddc48b9 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into naive_bayes_blas_prediction
b5772b4 [Liang-Chi Hsieh] Fix binary compatibility.
2f65186 [Liang-Chi Hsieh] Remove toDense.
1b6cdfe [Liang-Chi Hsieh] Update the implementation of naive Bayes prediction with BLAS.

(cherry picked from commit c12dff9b82)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-19 13:53:16 -07:00
Xusen Yin c3871eeb25 [SPARK-7586] [ML] [DOC] Add docs of Word2Vec in ml package
CC jkbradley.

JIRA [issue](https://issues.apache.org/jira/browse/SPARK-7586).

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6181 from yinxusen/SPARK-7586 and squashes the following commits:

77014c5 [Xusen Yin] comment fix
57a4c07 [Xusen Yin] small fix for docs
1178c8f [Xusen Yin] remove the correctness check in java suite
1c3f389 [Xusen Yin] delete sbt commit
1af152b [Xusen Yin] check python example code
1b5369e [Xusen Yin] add docs of word2vec

(cherry picked from commit 68fb2a46ed)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-19 13:44:06 -07:00
Joseph K. Bradley cd3093e705 [SPARK-7678] [ML] Fix default random seed in HasSeed
Changed shared param HasSeed to have default based on hashCode of class name, instead of random number.
Also, removed fixed random seeds from Word2Vec and ALS.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6251 from jkbradley/scala-fixed-seed and squashes the following commits:

0e37184 [Joseph K. Bradley] Fixed Word2VecSuite, ALSSuite in spark.ml to use original fixed random seeds
678ec3a [Joseph K. Bradley] Removed fixed random seeds from Word2Vec and ALS. Changed shared param HasSeed to have default based on hashCode of class name, instead of random number.

(cherry picked from commit 7b16e9f211)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-19 10:57:54 -07:00
Joseph K. Bradley 24cb323e76 [SPARK-7047] [ML] ml.Model optional parent support
Made Model.parent transient.  Added Model.hasParent to test for null parent

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #5914 from jkbradley/parent-optional and squashes the following commits:

d501774 [Joseph K. Bradley] Made Model.parent transient.  Added Model.hasParent to test for null parent

(cherry picked from commit fb90273212)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-19 10:55:32 -07:00
Patrick Wendell ac3197e1b9 Preparing development version 1.4.1-SNAPSHOT 2015-05-19 09:35:12 +00:00
Patrick Wendell 777a08166f Preparing Spark release v1.4.0-rc1 2015-05-19 09:35:12 +00:00
Patrick Wendell 586ede6b32 Revert "Preparing Spark release v1.4.0-rc1"
This reverts commit 79fb01a3be.
2015-05-19 02:27:14 -07:00
Patrick Wendell e7309ec729 Revert "Preparing development version 1.4.1-SNAPSHOT"
This reverts commit a1d896b85b.
2015-05-19 02:27:07 -07:00
Patrick Wendell a1d896b85b Preparing development version 1.4.1-SNAPSHOT 2015-05-19 07:13:24 +00:00
Patrick Wendell 79fb01a3be Preparing Spark release v1.4.0-rc1 2015-05-19 07:13:24 +00:00
Patrick Wendell b0c63d2413 Revert "Preparing Spark release v1.4.0-rc1"
This reverts commit 38ccef36c1.
2015-05-19 00:10:39 -07:00
Patrick Wendell 198a186ad3 Revert "Preparing development version 1.4.1-SNAPSHOT"
This reverts commit 40190ce226.
2015-05-19 00:10:37 -07:00
Xusen Yin 38a3fc8485 [SPARK-7581] [ML] [DOC] User guide for spark.ml PolynomialExpansion
JIRA [here](https://issues.apache.org/jira/browse/SPARK-7581).

CC jkbradley

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6113 from yinxusen/SPARK-7581 and squashes the following commits:

1a7d80d [Xusen Yin] merge with master
892a8e9 [Xusen Yin] fix python 3 compatibility
ec935bf [Xusen Yin] small fix
3e9fa1d [Xusen Yin] delete note
69fcf85 [Xusen Yin] simplify and add python example
81d21dc [Xusen Yin] add programming guide for Polynomial Expansion
40babfb [Xusen Yin] add java test suite for PolynomialExpansion

(cherry picked from commit 6008ec14ed)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-19 00:06:43 -07:00