Commit graph

795 commits

Author SHA1 Message Date
Xiangrui Meng 4cafc63524 [SPARK-7584] [MLLIB] User guide for VectorAssembler
This PR adds a section in the user guide for `VectorAssembler` with code examples in Python/Java/Scala. It also adds a unit test in Java.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6556 from mengxr/SPARK-7584 and squashes the following commits:

11313f6 [Xiangrui Meng] simplify Java example
0cd47f3 [Xiangrui Meng] update user guide
fd36292 [Xiangrui Meng] update Java unit test
ce61ca0 [Xiangrui Meng] add Java unit test for VectorAssembler
e399942 [Xiangrui Meng] scala/python example code

(cherry picked from commit 90c606925e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-06-01 15:05:21 -07:00
Reynold Xin 70cf9c3495 [SPARK-3850] Trim trailing spaces for MLlib.
Author: Reynold Xin <rxin@databricks.com>

Closes #6534 from rxin/whitespace-mllib and squashes the following commits:

38926e3 [Reynold Xin] [SPARK-3850] Trim trailing spaces for MLlib.

(cherry picked from commit e1067d0ad1)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 11:35:46 -07:00
Reynold Xin 01f38f75d9 [SPARK-7979] Enforce structural type checker.
Author: Reynold Xin <rxin@databricks.com>

Closes #6536 from rxin/structural-type-checker and squashes the following commits:

f833151 [Reynold Xin] Fixed compilation.
633f9a1 [Reynold Xin] Fixed typo.
d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.

(cherry picked from commit 4b5f12bac9)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-31 01:40:57 -07:00
Mike Dusenberry df56309b04 [SPARK-7920] [MLLIB] Make MLlib ChiSqSelector Serializable (& Fix Related Documentation Example).
The MLlib ChiSqSelector class is not serializable, and so the example in the ChiSqSelector documentation fails. Also, that example is missing the import of ChiSqSelector.

This PR makes ChiSqSelector extend Serializable in MLlib, and adds the ChiSqSelector import statement to the associated example in the documentation.

Author: Mike Dusenberry <dusenberrymw@gmail.com>

Closes #6462 from dusenberrymw/Make_ChiSqSelector_Serializable_and_Fix_Related_Docs_Example and squashes the following commits:

9cb2f94 [Mike Dusenberry] Make MLlib ChiSqSelector Serializable.
d9003bf [Mike Dusenberry] Add missing import in MLlib ChiSqSelector Docs Scala example.

(cherry picked from commit 1281a35188)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-30 16:51:08 -07:00
Reynold Xin f40605f064 [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker.
…

Author: Reynold Xin <rxin@databricks.com>

Closes #6491 from rxin/more-whitespace and squashes the following commits:

f6e63dc [Reynold Xin] [SPARK-7940] Enforce whitespace checking for DO, TRY, CATCH, FINALLY, MATCH, LARROW, RARROW in style checker.

(cherry picked from commit 94f62a4979)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-29 13:39:02 -07:00
Patrick Wendell e549874c33 Preparing development version 1.4.0-SNAPSHOT 2015-05-29 13:07:07 -07:00
Patrick Wendell dd109a8746 Preparing Spark release v1.4.0-rc3 2015-05-29 13:06:59 -07:00
Patrick Wendell c68abaa34e Preparing development version 1.4.0-SNAPSHOT 2015-05-29 12:15:18 -07:00
Patrick Wendell fb60503ff2 Preparing Spark release v1.4.0-rc3 2015-05-29 12:15:13 -07:00
MechCoder 4be701aa50 [SPARK-7946] [MLLIB] DecayFactor wrongly set in StreamingKMeans
Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6497 from MechCoder/spark-7946 and squashes the following commits:

2fdd0a3 [MechCoder] Add non-regression test
8c988c6 [MechCoder] [SPARK-7946] DecayFactor wrongly set in StreamingKMeans

(cherry picked from commit 6181937f31)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-29 11:36:48 -07:00
Xiangrui Meng 509a7cafcc [SPARK-7912] [SPARK-7921] [MLLIB] Update OneHotEncoder to handle ML attributes and change includeFirst to dropLast
This PR contains two major changes to `OneHotEncoder`:

1. more robust handling of ML attributes. If the input attribute is unknown, we look at the values to get the max category index
2. change `includeFirst` to `dropLast` and leave the default to `true`. There are couple benefits:

    a. consistent with other tutorials of one-hot encoding (or dummy coding) (e.g., http://www.ats.ucla.edu/stat/mult_pkg/faq/general/dummy.htm)
    b. keep the indices unmodified in the output vector. If we drop the first, all indices will be shifted by 1.
    c. If users use `StringIndex`, the last element is the least frequent one.

Sorry for including two changes in one PR! I'll update the user guide in another PR.

jkbradley sryza

Author: Xiangrui Meng <meng@databricks.com>

Closes #6466 from mengxr/SPARK-7912 and squashes the following commits:

a280dca [Xiangrui Meng] fix tests
d8f234d [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7912
171b276 [Xiangrui Meng] mention the difference between our impl vs sklearn's
00dfd96 [Xiangrui Meng] update OneHotEncoder in Python
208ddad [Xiangrui Meng] update OneHotEncoder to handle ML attributes and change includeFirst to dropLast

(cherry picked from commit 23452be944)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-29 00:51:24 -07:00
Patrick Wendell 6bf5a42084 Preparing development version 1.4.0-SNAPSHOT 2015-05-28 23:40:27 -07:00
Patrick Wendell f2796816be Preparing Spark release v1.4.0-rc3 2015-05-28 23:40:22 -07:00
Patrick Wendell 119c93af9c Preparing development version 1.4.0-SNAPSHOT 2015-05-28 22:57:31 -07:00
Patrick Wendell 2d97d7a0aa Preparing Spark release v1.4.0-rc3 2015-05-28 22:57:26 -07:00
Xiangrui Meng 68559423ac [SPARK-7922] [MLLIB] use DataFrames for user/item factors in ALSModel
Expose user/item factors in DataFrames. This is to be more consistent with the pipeline API. It also helps maintain consistent APIs across languages. This PR also removed fitting params from `ALSModel`.

coderxiang

Author: Xiangrui Meng <meng@databricks.com>

Closes #6468 from mengxr/SPARK-7922 and squashes the following commits:

7bfb1d5 [Xiangrui Meng] update ALSModel in PySpark
1ba5607 [Xiangrui Meng] use DataFrames for user/item factors in ALS

(cherry picked from commit db95137897)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 22:38:46 -07:00
Xiangrui Meng 0c05115063 [SPARK-7927] [MLLIB] Enforce whitespace for more tokens in style checker
rxin

Author: Xiangrui Meng <meng@databricks.com>

Closes #6481 from mengxr/mllib-scalastyle and squashes the following commits:

3ca4d61 [Xiangrui Meng] revert scalastyle config
30961ba [Xiangrui Meng] adjust spaces in mllib/test
571b5c5 [Xiangrui Meng] fix spaces in mllib

(cherry picked from commit 04616b1a2f)
Signed-off-by: Reynold Xin <rxin@databricks.com>
2015-05-28 20:09:21 -07:00
Xusen Yin 7bb445a38c [SPARK-7577] [ML] [DOC] add bucketizer doc
CC jkbradley

Author: Xusen Yin <yinxusen@gmail.com>

Closes #6451 from yinxusen/SPARK-7577 and squashes the following commits:

e2dc32e [Xusen Yin] rename colums
e350e49 [Xusen Yin] add all demos
006ddf1 [Xusen Yin] add java test
3238481 [Xusen Yin] add bucketizer

(cherry picked from commit 1bd63e82fd)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-28 17:30:33 -07:00
Xiangrui Meng b9bdf12a1c [SPARK-7198] [MLLIB] VectorAssembler should output ML attributes
`VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases:

1. DoubleType with ML attribute: carry over
2. DoubleType without ML attribute: numeric value
3. Scalar type: numeric value
4. VectorType with all ML attributes: carry over and update names
5. VectorType with number of ML attributes: assume all numeric
6. VectorType without ML attributes: check the first row and get the number of attributes

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:

a9d2469 [Xiangrui Meng] add space
facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes

(cherry picked from commit 7859ab659e)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-28 16:32:59 -07:00
Xiangrui Meng 7b5dffb802 [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times
~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~

The fix above didn't work. So I added a workaround for this. If a Python UDF is applied to a Python UDT. This will put the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:

c257d2a [Xiangrui Meng] add a workaround for VectorUDT

(cherry picked from commit 530efe3e80)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-28 12:03:55 -07:00
Patrick Wendell 7c342bdd93 Preparing development version 1.4.0-SNAPSHOT 2015-05-27 22:36:30 -07:00
Patrick Wendell 4983dfc878 Preparing Spark release v1.4.0-rc3 2015-05-27 22:36:23 -07:00
Xiangrui Meng 34e233f9ce [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API
1. removed `Params.validateParams(extra)`
2. added `Evaluate.evaluate(dataset, paramPairs*)`
3. updated `RegressionEvaluator` doc

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:

5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps
607445d [Xiangrui Meng] merge master
8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1
fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)

(cherry picked from commit a9f1c0c57b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 23:51:39 -07:00
Xiangrui Meng b5ee7eefdb [SPARK-7748] [MLLIB] Graduate spark.ml from alpha
With descent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but not to create new pipeline components.) `UnaryTransformer` becomes a `DeveloperApi` in this PR.

jkbradley harsha2010

Author: Xiangrui Meng <meng@databricks.com>

Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:

effbccd [Xiangrui Meng] organize imports
c15028e [Xiangrui Meng] added missing docs
1b2e5f8 [Xiangrui Meng] update package doc
73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)

(cherry picked from commit 836a75898f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 15:51:40 -07:00
MechCoder 51d98b0e97 [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity
The densities in KernelDensity are scaled down by
(number of parallel processes X number of points). It should be just no.of samples. This results in broken tests in KernelDensitySuite which haven't been tested properly.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #6383 from MechCoder/spark-7844 and squashes the following commits:

ab81302 [MechCoder] Math->math
9b8ed50 [MechCoder] Make one pass to update count
a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity

(cherry picked from commit 61664732b2)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-26 13:22:42 -07:00
Ram Sriharsha 16a6da52f8 [SPARK-7833] [ML] Add python wrapper for RegressionEvaluator
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6365 from harsha2010/SPARK-7833 and squashes the following commits:

923f288 [Ram Sriharsha] cleanup
7623b7d [Ram Sriharsha] python style fix
9743f83 [Ram Sriharsha] [SPARK-7833][ml] Add python wrapper for RegressionEvaluator

(cherry picked from commit 65c696ecc0)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-24 10:36:08 -07:00
Patrick Wendell 947d700ec8 Preparing development version 1.4.0-SNAPSHOT 2015-05-23 20:13:05 -07:00
Patrick Wendell 03fb26a3e5 Preparing Spark release v1.4.0-rc2 2015-05-23 20:13:00 -07:00
Patrick Wendell f2f74b9b1a Preparing development version 1.4.1-SNAPSHOT 2015-05-23 14:59:37 -07:00
Patrick Wendell 0da7396990 Preparing Spark release v1.4.0-rc2-test 2015-05-23 14:59:31 -07:00
Patrick Wendell 8da8caab17 Preparing development version 1.4.1-SNAPSHOT 2015-05-23 14:46:27 -07:00
Patrick Wendell 8f50218f38 Preparing Spark release 1.4.0-rc2-test 2015-05-23 14:46:23 -07:00
Ram Sriharsha d709d7cebd [SPARK-7404] [ML] Add RegressionEvaluator to spark.ml
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6344 from harsha2010/SPARK-7404 and squashes the following commits:

16b9d77 [Ram Sriharsha] consistent naming
7f100b6 [Ram Sriharsha] cleanup
c46044d [Ram Sriharsha] Merge with Master + Code Review Fixes
188fa0a [Ram Sriharsha] Merge branch 'master' into SPARK-7404
f5b6a4c [Ram Sriharsha] cleanup doc
97beca5 [Ram Sriharsha] update test to use R packages
32dd310 [Ram Sriharsha] fix indentation
f93b812 [Ram Sriharsha] fix test
1b6ebb3 [Ram Sriharsha] [SPARK-7404][ml] Add RegressionEvaluator to spark.ml

(cherry picked from commit f490b3b4c7)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-22 09:59:51 -07:00
Joseph K. Bradley 19e579c552 [SPARK-7578] [ML] [DOC] User guide for spark.ml Normalizer, IDF, StandardScaler
Added user guide sections with code examples.
Also added small Java unit tests to test Java example in guide.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6127 from jkbradley/feature-guide-2 and squashes the following commits:

cd47f4b [Joseph K. Bradley] Updated based on code review
f16bcec [Joseph K. Bradley] Fixed merge issues and update Python examples print calls for Python 3
0a862f9 [Joseph K. Bradley] Added Normalizer, StandardScaler to ml-features doc, plus small Java unit tests
a21c2d6 [Joseph K. Bradley] Updated ml-features.md with IDF

(cherry picked from commit 2728c3df66)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 22:59:53 -07:00
Xiangrui Meng 11a5f32116 [SPARK-7535] [.0] [MLLIB] Audit the pipeline APIs for 1.4
Some changes to the pipeilne APIs:

1. Estimator/Transformer/ doesn’t need to extend Params since PipelineStage already does.
1. Move Evaluator to ml.evaluation.
1. Mention larger metric values are better.
1. PipelineModel doc. “compiled” -> “fitted”
1. Hide object PolynomialExpansion.
1. Hide object VectorAssembler.
1. Word2Vec.minCount (and other) -> group param
1. ParamValidators -> DeveloperApi
1. Hide MetadataUtils/SchemaUtils.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6322 from mengxr/SPARK-7535.0 and squashes the following commits:

9e9c7da [Xiangrui Meng] move JavaEvaluator to ml.evaluation as well
e179480 [Xiangrui Meng] move Evaluation to ml.evaluation in PySpark
08ef61f [Xiangrui Meng] update pipieline APIs

(cherry picked from commit 8f11c6116b)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 22:57:43 -07:00
Xiangrui Meng df55a0d767 [SPARK-7219] [MLLIB] Output feature attributes in HashingTF
This PR updates `HashingTF` to output ML attributes that tell the number of features in the output column. We need to expand `UnaryTransformer` to support output metadata. A `df outputMetadata: Metadata` is not sufficient because the metadata may also depends on the input data. Though this is not true for `HashingTF`, I think it is reasonable to update `UnaryTransformer` in a separate PR. `checkParams` is added to verify common requirements for params. I will send a separate PR to use it in other test suites. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6308 from mengxr/SPARK-7219 and squashes the following commits:

9bd2922 [Xiangrui Meng] address comments
e82a68a [Xiangrui Meng] remove sqlContext from test suite
995535b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7219
2194703 [Xiangrui Meng] add test for attributes
178ae23 [Xiangrui Meng] update HashingTF with tests
91a6106 [Xiangrui Meng] WIP

(cherry picked from commit 85b96372cf)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-21 18:04:55 -07:00
Xiangrui Meng ef9336335f [SPARK-7794] [MLLIB] update RegexTokenizer default settings
The previous default is `{gaps: false, pattern: "\\p{L}+|[^\\p{L}\\s]+"}`. The default pattern is hard to understand. This PR changes the default to `{gaps: true, pattern: "\\s+"}`. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6330 from mengxr/SPARK-7794 and squashes the following commits:

5ee7cde [Xiangrui Meng] update RegexTokenizer default settings

(cherry picked from commit f5db4b416c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 17:59:13 -07:00
Xiangrui Meng 7e0912b1d1 [SPARK-7498] [MLLIB] add varargs back to setDefault
We removed `varargs` due to Java compilation issues. That was a false alarm because I didn't run `build/sbt clean`. So this PR reverts the changes. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6320 from mengxr/SPARK-7498 and squashes the following commits:

74a7259 [Xiangrui Meng] add varargs back to setDefault

(cherry picked from commit cdc7c055c9)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 13:06:59 -07:00
Joseph K. Bradley e29b811ed3 [SPARK-7585] [ML] [DOC] VectorIndexer user guide section
Added VectorIndexer section to ML user guide.  Also added javaCategoryMaps() method and Java unit test for it.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #6255 from jkbradley/vector-indexer-guide and squashes the following commits:

dbb8c4c [Joseph K. Bradley] simplified VectorIndexerModel.javaCategoryMaps
f692084 [Joseph K. Bradley] Added VectorIndexer section to ML user guide.  Also added javaCategoryMaps() method and Java unit test for it.

(cherry picked from commit 6d75ed7e5c)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 13:06:17 -07:00
Shuo Xiang f6a29c72c6 [SPARK-7793] [MLLIB] Use getOrElse for getting the threshold of SVM model
same issue and fix as in Spark-7694.

Author: Shuo Xiang <shuoxiangpub@gmail.com>

Closes #6321 from coderxiang/nb and squashes the following commits:

a5e6de4 [Shuo Xiang] use getOrElse for svmmodel.tostring
2cb0177 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into nb
5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
98804c9 [Shuo Xiang] fix bug in topBykey and update test

(cherry picked from commit 4f572008f8)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-21 12:11:17 -07:00
Xiangrui Meng b97a8053a0 [SPARK-7752] [MLLIB] Use lowercase letters for NaiveBayes.modelType
to be consistent with other string names in MLlib. This PR also updates the implementation to use vals instead of hardcoded strings. jkbradley leahmcguire

Author: Xiangrui Meng <meng@databricks.com>

Closes #6277 from mengxr/SPARK-7752 and squashes the following commits:

f38b662 [Xiangrui Meng] add another case _ back in test
ae5c66a [Xiangrui Meng] model type -> modelType
711d1c6 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7752
40ae53e [Xiangrui Meng] fix Java test suite
264a814 [Xiangrui Meng] add case _ back
3c456a8 [Xiangrui Meng] update NB user guide
17bba53 [Xiangrui Meng] update naive Bayes to use lowercase model type strings

(cherry picked from commit 13348e21b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-21 10:30:27 -07:00
Xiangrui Meng 64762444e7 [SPARK-7753] [MLLIB] Update KernelDensity API
Update `KernelDensity` API to make it extensible to different kernels in the future. `bandwidth` is used instead of `standardDeviation`. The static `kernelDensity` method is removed from `Statistics`. The implementation is updated using BLAS, while the algorithm remains the same. sryza srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #6279 from mengxr/SPARK-7753 and squashes the following commits:

4cdfadc [Xiangrui Meng] add example code in the doc
767fd5a [Xiangrui Meng] update KernelDensity API

(cherry picked from commit 947ea1cf5f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 23:39:06 -07:00
Xiangrui Meng 9711e9bf1d [SPARK-7774] [MLLIB] add sqlContext to MLlibTestSparkContext
to simplify test suites that require a SQLContext.

Author: Xiangrui Meng <meng@databricks.com>

Closes #6303 from mengxr/SPARK-7774 and squashes the following commits:

0622b5a [Xiangrui Meng] update some other test suites
e1f9b8d [Xiangrui Meng] add sqlContext to MLlibTestSparkContext

(cherry picked from commit ddec173cba)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
2015-05-20 20:30:46 -07:00
Patrick Wendell 9b37e32c55 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:29:00 -07:00
Patrick Wendell 1e458e3553 Preparing Spark release rc-test 2015-05-20 17:28:55 -07:00
Xiangrui Meng 5f64269c52 [SPARK-7762] [MLLIB] set default value for outputCol
Set a default value for `outputCol` instead of forcing users to name it. This is useful for intermediate transformers in the pipeline. jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6289 from mengxr/SPARK-7762 and squashes the following commits:

54edebc [Xiangrui Meng] merge master
bff8667 [Xiangrui Meng] update unit test
171246b [Xiangrui Meng] add unit test for outputCol
a4321bd [Xiangrui Meng] set default value for outputCol

(cherry picked from commit c330e52dae)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
2015-05-20 17:26:44 -07:00
pwendell 8d66849862 Preparing development version 1.4.0-SNAPSHOT 2015-05-20 17:26:15 -07:00
pwendell ae29aeaf8e Preparing Spark release rc-test 2015-05-20 17:26:10 -07:00
jenkins 534c787b9f Preparing development version 1.4.0-SNAPSHOT 2015-05-20 16:49:59 -07:00
jenkins 5f4d87f608 Preparing Spark release rc-test 2015-05-20 16:49:54 -07:00