## What changes were proposed in this pull request?
Tag classes in ml.tuning as experimental, add docs for kfolds avg metric, and copy TrainValidationSplit scaladoc for more detailed explanation.
## How was this patch tested?
built docs locally
Author: Holden Karau <holden@us.ibm.com>
Closes #12967 from holdenk/SPARK-15195-pydoc-ml-tuning.
## What changes were proposed in this pull request?
PyDoc links in ml are in a non-standard format. Switch to the standard Sphinx link format for better formatted documentation. Also add a note about the default value in one place, and copy some extended docs from Scala for GBT.
## How was this patch tested?
Built docs locally.
Author: Holden Karau <holden@us.ibm.com>
Closes #12918 from holdenk/SPARK-15137-linkify-pyspark-ml-classification.
## What changes were proposed in this pull request?
This PR continues the work from #11871 with the following changes:
* load English stopwords as default
* convert stopwords to list in Python
* update some tests and doc
## How was this patch tested?
Unit tests.
Closes #11871
cc: burakkose srowen
Author: Burak Köse <burakks41@gmail.com>
Author: Xiangrui Meng <meng@databricks.com>
Author: Burak KOSE <burakks41@gmail.com>
Closes #12843 from mengxr/SPARK-14050.
## What changes were proposed in this pull request?
Copy the package documentation from Scala/Java to Python for ML package and remove beta tags. Not super sure if we want to keep the BETA tag but since we are making it the default it seems like probably the time to remove it (happy to put it back in if we want to keep it BETA).
## How was this patch tested?
Python documentation built locally as HTML and text and verified output.
Author: Holden Karau <holden@us.ibm.com>
Closes #12883 from holdenk/SPARK-15106-add-pyspark-package-doc-for-ml.
## What changes were proposed in this pull request?
PySpark ML Params setter code clean up.
For example,
```setInputCol``` can be simplified from
```
self._set(inputCol=value)
return self
```
to:
```
return self._set(inputCol=value)
```
This is a pretty big sweep, and we cleaned up wherever possible.
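A minimal sketch of why the shorter form works, assuming `_set` itself returns `self`. The `ToyParams` class is hypothetical and only mimics the shape of a PySpark setter; it is not the actual `pyspark.ml` code.

```python
class ToyParams(object):
    """Hypothetical stand-in for a PySpark Params class."""

    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        # Store every keyword argument and return self, so a setter
        # can simply `return self._set(...)`.
        for name, value in kwargs.items():
            self._paramMap[name] = value
        return self

    def setInputCol(self, value):
        return self._set(inputCol=value)

    def setOutputCol(self, value):
        return self._set(outputCol=value)


# Setters still chain because each one returns the same instance.
stage = ToyParams().setInputCol("text").setOutputCol("words")
```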
## How was this patch tested?
Existing unit tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12749 from yanboliang/spark-14971.
## What changes were proposed in this pull request?
This PR is an update for [https://github.com/apache/spark/pull/12738] which:
* Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
* Various fixes for bugs found
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
Defaults changed:
* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
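As an illustration of treating an unset and an empty-string `weightCol` the same way, here is a small sketch. The helper name `uses_weights` and the dict-based param map are assumptions made for this example, not Spark code.

```python
def uses_weights(param_map):
    """Return True only if weightCol is set to a non-empty column name."""
    weight_col = param_map.get("weightCol")  # None when the Param is unset
    # Both None (unset) and "" (empty string) are falsy, so they are
    # handled identically.
    return bool(weight_col)


unset_case = uses_weights({})
empty_case = uses_weights({"weightCol": ""})
named_case = uses_weights({"weightCol": "w"})
```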
## How was this patch tested?
Generic unit test. Manually tested that unit test by changing defaults and verifying that broke the test.
Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>
Closes #12816 from jkbradley/yinxusen-SPARK-14931.
#### What changes were proposed in this pull request?
This PR removes three methods that were deprecated in 1.6.0:
- `PortableDataStream.close()`
- `LinearRegression.weights`
- `LogisticRegression.weights`
The rationale for doing this is that the impact is small and that Spark 2.0 is a major release.
#### How was this patch tested?
Compilation succeeded.
Author: Herman van Hovell <hvanhovell@questtec.nl>
Closes #12732 from hvanhovell/SPARK-14952.
## What changes were proposed in this pull request?
This PR fixes a bug that generates infinite distances between word vectors. For example, before this PR,
```
val synonyms = model.findSynonyms("who", 40)
```
will give the following results:
```
to Infinity
and Infinity
that Infinity
with Infinity
```
With this PR, the distance between words is a value between 0 and 1, as follows:
```
scala> model.findSynonyms("who", 10)
res0: Array[(String, Double)] = Array((Harvard-educated,0.5253688097000122), (ex-SAS,0.5213794708251953), (McMutrie,0.5187736749649048), (fellow,0.5166833400726318), (businessman,0.5145374536514282), (American-born,0.5127736330032349), (British-born,0.5062344074249268), (gray-bearded,0.5047978162765503), (American-educated,0.5035858750343323), (mentored,0.49849334359169006))
scala> model.findSynonyms("king", 10)
res1: Array[(String, Double)] = Array((queen,0.6787897944450378), (prince,0.6786158084869385), (monarch,0.659771203994751), (emperor,0.6490438580513), (goddess,0.643266499042511), (dynasty,0.635733425617218), (sultan,0.6166239380836487), (pharaoh,0.6150713562965393), (birthplace,0.6143025159835815), (empress,0.6109727025032043))
scala> model.findSynonyms("queen", 10)
res2: Array[(String, Double)] = Array((princess,0.7670737504959106), (godmother,0.6982434988021851), (raven-haired,0.6877717971801758), (swan,0.684934139251709), (hunky,0.6816608309745789), (Titania,0.6808111071586609), (heroine,0.6794036030769348), (king,0.6787897944450378), (diva,0.67848801612854), (lip-synching,0.6731793284416199))
```
### There are two places changed in this PR:
- Normalize the word vector to avoid overflow when calculating inner product between word vectors. This also simplifies the distance calculation, since the word vectors only need to be normalized once.
- Scale the learning rate by the number of iterations, to be consistent with the Google Word2Vec implementation.
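The normalize-once idea from the first point can be sketched in plain Python. This is only an illustration, not the Scala change itself; the function names and the toy vectors are assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def find_synonyms(word, vectors, num):
    """Rank other words by cosine similarity to `word`.

    `vectors` maps each word to an already normalized vector, so the
    similarity is just the inner product.
    """
    query = vectors[word]
    scored = [
        (other, sum(a * b for a, b in zip(query, vec)))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num]


# Toy vectors with large magnitudes; normalize once up front.
raw = {"king": [3e20, 4e20], "queen": [4e20, 3e20], "apple": [1e20, 0.0]}
unit = {w: normalize(v) for w, v in raw.items()}
top = find_synonyms("king", unit, 2)
```

Because every stored vector has unit length, each inner product stays bounded no matter how large the raw vectors were.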
## How was this patch tested?
Use word2vec to train text corpus, and run model.findSynonyms() to get the distances between word vectors.
Author: Junyang <fly.shenjy@gmail.com>
Author: flyskyfly <fly.shenjy@gmail.com>
Closes #11812 from flyjy/TVec.
## What changes were proposed in this pull request?
As discussed in #12660, this PR renames
* intermediateRDDStorageLevel -> intermediateStorageLevel
* finalRDDStorageLevel -> finalStorageLevel
The argument name in `ALS.train` will be addressed in SPARK-15027.
## How was this patch tested?
Existing unit tests.
Author: Xiangrui Meng <meng@databricks.com>
Closes #12803 from mengxr/SPARK-14412.
`mllib` `ALS` supports `setIntermediateRDDStorageLevel` and `setFinalRDDStorageLevel`. This PR adds these as Params in `ml` `ALS`. They are put in group **expertParam** since few users will need them.
## How was this patch tested?
New test cases in `ALSSuite` and `tests.py`.
cc yanboliang jkbradley sethah rishabhbhardwaj
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #12660 from MLnick/SPARK-14412-als-storage-params.
## What changes were proposed in this pull request?
Per discussion on [https://github.com/apache/spark/pull/12604], this removes ML persistence for Python tuning (TrainValidationSplit, CrossValidator, and their Models) since they do not handle nesting easily. This support should be re-designed and added in the next release.
## How was this patch tested?
Removed unit test elements saving and loading the tuning algorithms, but kept tests to save and load their bestModel fields.
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12782 from jkbradley/remove-python-tuning-saveload.
## What changes were proposed in this pull request?
pyspark.ml API for LDA
* LDA, LDAModel, LocalLDAModel, DistributedLDAModel
* includes persistence
This replaces [https://github.com/apache/spark/pull/10242]
## How was this patch tested?
* doc test for LDA, including Param setters
* unit test for persistence
Author: Joseph K. Bradley <joseph@databricks.com>
Author: Jeff Zhang <zjffdu@apache.org>
Closes #12723 from jkbradley/zjffdu-SPARK-11940.
## What changes were proposed in this pull request?
support avgMetrics in CrossValidatorModel with Python
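For context, an average cross-validation metric can be computed roughly as follows. This is a sketch under assumptions: the function name, the nested-list input layout, and the "larger is better" selection are all chosen for illustration and are not taken from the patch.

```python
def average_metrics(fold_metrics):
    """Average per-fold metrics for each parameter setting.

    `fold_metrics[i][j]` is the metric of parameter setting j on fold i.
    """
    num_folds = len(fold_metrics)
    num_settings = len(fold_metrics[0])
    avg = [0.0] * num_settings
    for fold in fold_metrics:
        for j, metric in enumerate(fold):
            avg[j] += metric / num_folds
    return avg


# Three folds, two candidate parameter settings.
avg_metrics = average_metrics([[0.5, 0.25], [1.0, 0.5], [0.75, 0.75]])
# Assumes a larger metric is better.
best_index = max(range(len(avg_metrics)), key=lambda j: avg_metrics[j])
```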
## How was this patch tested?
Doctest and `test_save_load` in `pyspark/ml/test.py`
[JIRA](https://issues.apache.org/jira/browse/SPARK-12810)
Author: Kai Jiang <jiangkai@gmail.com>
Closes #12464 from vectorijk/spark-12810.
## What changes were proposed in this pull request?
Since [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574) breaks behavior of ```HashingTF```, we should try to enforce good practice by removing the "native" hashAlgorithm option in spark.ml and pyspark.ml. We can leave spark.mllib and pyspark.mllib alone.
## How was this patch tested?
Unit tests.
cc jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12702 from yanboliang/spark-14899.
## What changes were proposed in this pull request?
Before, spark.ml GaussianMixtureModel used the spark.mllib MultivariateGaussian in its public API. This was added after 1.6, so we can modify this API without breaking APIs.
This PR copies MultivariateGaussian to mllib-local in spark.ml, with a few changes:
* Renamed fields to match numpy, scipy: mu => mean, sigma => cov
This PR then uses the spark.ml MultivariateGaussian in the spark.ml GaussianMixtureModel, which involves:
* Modifying the constructor
* Adding a computeProbabilities method
Also:
* Added EPSILON to mllib-local for use in MultivariateGaussian
## How was this patch tested?
Existing unit tests
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12593 from jkbradley/sparkml-gmm-fix.
## What changes were proposed in this pull request?
SPARK-14071 changed MLWritable.write to be a property. This reverts that change since there was not a good way to make MLReadable.read appear to be a property.
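One likely reason for the asymmetry, stated here as an assumption rather than taken from the PR: `write` is called on an instance, while `read` is called on the class, and a plain Python `property` only takes effect on instance attribute access. The classes below are hypothetical and only illustrate that behavior.

```python
class ToyWriter(object):
    def __init__(self, instance):
        self.instance = instance


class ToyReader(object):
    def __init__(self, cls):
        self.cls = cls


class ToyModel(object):
    """Hypothetical model using the method-style API."""

    def write(self):
        # Instance method: called as model.write()
        return ToyWriter(self)

    @classmethod
    def read(cls):
        # Class method: called as ToyModel.read(); no instance exists yet.
        return ToyReader(cls)


class PropertyAttempt(object):
    @property
    def read(self):
        return ToyReader(type(self))


writer = ToyModel().write()
reader = ToyModel.read()
# Accessing a property on the class returns the property object itself,
# not a reader.
class_level = PropertyAttempt.read
```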
## How was this patch tested?
existing unit tests
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12671 from jkbradley/revert-MLWritable-write-py.
## What changes were proposed in this pull request?
We deprecated ```runs``` of mllib.KMeans in Spark 1.6 (SPARK-11358). In 2.0, we will make it have no effect (with warning messages). We did not remove ```setRuns/getRuns``` for better binary compatibility.
This PR changes the uses of `runs` that appear in the public API. Usage inside ```KMeans.runAlgorithm()``` will be resolved in #10806.
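The "keep the setter, ignore the value, warn" pattern can be sketched like this. `ToyKMeans` and the warning text are hypothetical; the real change lives in mllib's KMeans.

```python
import warnings


class ToyKMeans(object):
    """Hypothetical estimator; not the real mllib KMeans."""

    def __init__(self):
        self._runs = 1

    def setRuns(self, runs):
        # Keep the setter for compatibility, but ignore the value.
        warnings.warn("Setting runs has no effect.", DeprecationWarning)
        return self

    def getRuns(self):
        return self._runs


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    km = ToyKMeans().setRuns(10)
```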
## How was this patch tested?
Existing unit tests.
cc jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12608 from yanboliang/spark-11559.
## What changes were proposed in this pull request?
As discussed in [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574), ```HashingTF``` should support MurmurHash3 and use it as the default hash algorithm. We should also expose a set/get API for ```hashAlgorithm``` so that users can choose the hash method.
Note: The problem that ```mllib.feature.HashingTF``` behaves differently between Scala/Java and Python will be resolved in the followup work.
## How was this patch tested?
unit tests.
cc jkbradley MLnick
Author: Yanbo Liang <ybliang8@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12498 from yanboliang/spark-10574.
## What changes were proposed in this pull request?
Removed instances of JavaMLWriter, JavaMLReader appearing in public Python API docs
## How was this patch tested?
n/a
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12542 from jkbradley/javamlwriter-doc.
## What changes were proposed in this pull request?
Add Python API in ML for GaussianMixture
## How was this patch tested?
Added doctests; the test cases are the same as the mllib Python tests.
./dev/lint-python
PEP8 checks passed.
rm -rf _build/*
pydoc checks passed.
./python/run-tests --python-executables=python2.7 --modules=pyspark-ml
Running PySpark tests. Output is in /Users/mwang/spark_ws_0904/python/unit-tests.log
Will test against the following Python executables: ['python2.7']
Will test the following Python modules: ['pyspark-ml']
Finished test(python2.7): pyspark.ml.evaluation (18s)
Finished test(python2.7): pyspark.ml.clustering (40s)
Finished test(python2.7): pyspark.ml.classification (49s)
Finished test(python2.7): pyspark.ml.recommendation (44s)
Finished test(python2.7): pyspark.ml.feature (64s)
Finished test(python2.7): pyspark.ml.regression (45s)
Finished test(python2.7): pyspark.ml.tuning (30s)
Finished test(python2.7): pyspark.ml.tests (56s)
Tests passed in 106 seconds
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #12402 from wangmiao1981/gmm.
## What changes were proposed in this pull request?
Removed expectedType arg from PySpark Param __init__, as suggested by the JIRA.
## How was this patch tested?
Manually looked through all places that use Param. Compiled and ran all ML PySpark test cases before and after the fix.
Author: Jason Lee <cjlee@us.ibm.com>
Closes #12581 from jasoncl/SPARK-14768.
## What changes were proposed in this pull request?
#11663 adds type conversion functionality for parameters in PySpark. This PR finds the remaining ```Param``` definitions that did not pass the corresponding ```TypeConverter``` argument and fixes them. After this PR, all params in pyspark/ml/ use ```TypeConverter```.
## How was this patch tested?
Existing tests.
cc jkbradley sethah
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12529 from yanboliang/typeConverter.
## What changes were proposed in this pull request?
#11939 made Python param setters use the `_set` method. This PR fixes the ones that were missed.
## How was this patch tested?
Existing tests.
cc jkbradley sethah
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12531 from yanboliang/setters-omissive.
## What changes were proposed in this pull request?
This patch provides a first cut of python APIs for structured streaming. This PR provides the new classes:
- ContinuousQuery
- Trigger
- ProcessingTime
in pyspark under `pyspark.sql.streaming`.
In addition, it contains the new methods added under:
- `DataFrameWriter`
  a) `startStream`
  b) `trigger`
  c) `queryName`
- `DataFrameReader`
  a) `stream`
- `DataFrame`
  a) `isStreaming`
This PR doesn't contain all methods exposed for `ContinuousQuery`, for example:
- `exception`
- `sourceStatuses`
- `sinkStatus`
They may be added in a follow up.
This PR also contains some very minor doc fixes in the Scala side.
## How was this patch tested?
Python doc tests
TODO:
- [ ] verify Python docs look good
Author: Burak Yavuz <brkyvz@gmail.com>
Author: Burak Yavuz <burak@databricks.com>
Closes #12320 from brkyvz/stream-python.
## What changes were proposed in this pull request?
PySpark Param constructors need to pass the TypeConverter argument by name, partly to make sure it is not mistaken for the expectedType arg and partly because we will remove the expectedType arg in 2.1. In several places, this is not being done correctly.
This PR changes all usages in pyspark/ml/ to keyword args.
## How was this patch tested?
Existing unit tests. I will not test type conversion for every Param unless we really think it necessary.
Also, if you start the PySpark shell and import classes (e.g., pyspark.ml.feature.StandardScaler), then you no longer get this warning:
```
/Users/josephkb/spark/python/pyspark/ml/param/__init__.py:58: UserWarning: expectedType is deprecated and will be removed in 2.1. Use typeConverter instead, as a keyword argument.
"Use typeConverter instead, as a keyword argument.")
```
That warning came from the typeConverter argument being passed as the expectedType arg by mistake.
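A sketch of how the positional mistake arises. `ToyParam` is hypothetical: its argument order (with `expectedType` ahead of `typeConverter`) and its warning text are assumptions based on the description above, not a copy of `pyspark.ml.param.Param`.

```python
import warnings


class ToyParam(object):
    """Hypothetical Param with expectedType ahead of typeConverter."""

    def __init__(self, parent, name, doc, expectedType=None, typeConverter=None):
        if expectedType is not None:
            warnings.warn("expectedType is deprecated; use typeConverter "
                          "as a keyword argument.")
        self.parent = parent
        self.name = name
        self.doc = doc
        # Fall back to the identity function when no converter is given.
        self.typeConverter = typeConverter if typeConverter else (lambda v: v)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Positional: the converter lands in the expectedType slot by mistake.
    wrong = ToyParam("parent", "maxIter", "max iterations", int)
    # Keyword: the converter is bound to the intended argument.
    right = ToyParam("parent", "maxIter", "max iterations", typeConverter=int)
```

In this sketch only the positional call triggers the warning, and its value is never converted.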
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12480 from jkbradley/typeconverter-fix.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-14440
Remove
* PipelineMLWriter
* PipelineMLReader
* PipelineModelMLWriter
* PipelineModelMLReader
and modify comments.
## How was this patch tested?
Tested with unit tests.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #12216 from yinxusen/SPARK-14440.
## What changes were proposed in this pull request?
Added windowSize getter/setter to ML/MLlib
## How was this patch tested?
Added test cases in tests.py under both ML and MLlib
Author: Jason Lee <cjlee@us.ibm.com>
Closes #12428 from jasoncl/SPARK-14564.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-14306
Add PySpark OneVsRest save/load support.
## How was this patch tested?
Test with Python unit test.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #12439 from yinxusen/SPARK-14306-0415.
## What changes were proposed in this pull request?
Python spark.ml Identifiable classes use UIDs of type str, but they should use unicode (in Python 2.x) to match Java. This could be a problem if someone created a class in Java with odd unicode characters, saved it, and loaded it in Python.
This PR: Use unicode everywhere in Python.
## How was this patch tested?
Updated persistence unit test to check uid type
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12368 from jkbradley/python-uid-unicode.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-7861
Add PySpark OneVsRest. I implement it with Python since it's a meta-pipeline.
## How was this patch tested?
Test with doctest.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #12124 from yinxusen/SPARK-14306-7861.
## What changes were proposed in this pull request?
Param setters in python previously accessed the _paramMap directly to update values. The `_set` method now implements type checking, so it should be used to update all parameters. This PR eliminates all direct accesses to `_paramMap` besides the one in the `_set` method to ensure type checking happens.
Additional changes:
* [SPARK-13068](https://github.com/apache/spark/pull/11663) missed adding type converters in evaluation.py so those are done here
* An incorrect `toBoolean` type converter was used for StringIndexer `handleInvalid` param in previous PR. This is fixed here.
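The idea of routing every update through `_set` can be sketched as follows. `CheckedParams` and its `_converters` table are hypothetical and stand in for the real Param and type-converter machinery.

```python
class CheckedParams(object):
    """Hypothetical Params holder; `_converters` is an assumption."""

    _converters = {"maxIter": int, "handleInvalid": str}

    def __init__(self):
        self._paramMap = {}

    def _set(self, **kwargs):
        # The only place that writes to _paramMap, so every value
        # passes through its converter first.
        for name, value in kwargs.items():
            if value is not None:
                value = self._converters[name](value)
            self._paramMap[name] = value
        return self

    def setMaxIter(self, value):
        return self._set(maxIter=value)

    def setHandleInvalid(self, value):
        return self._set(handleInvalid=value)


params = CheckedParams().setMaxIter(10.0).setHandleInvalid("skip")
```

A setter that wrote to `_paramMap` directly would skip the conversion step, which is the case this PR removes.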
## How was this patch tested?
Existing unit tests verify that parameters are still set properly. No new functionality is actually added in this PR.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #11939 from sethah/SPARK-14104.
## What changes were proposed in this pull request?
The default stopwords were a Java object. They are no longer.
## How was this patch tested?
Unit test which failed before the fix
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12422 from jkbradley/pyspark-stopwords.
## What changes were proposed in this pull request?
PySpark ml GBTClassifier, Regressor support export/import.
## How was this patch tested?
Doc test.
cc jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12383 from yanboliang/spark-14374.
## What changes were proposed in this pull request?
This fix tries to add binary toggle Param to PySpark HashingTF in ML & MLlib. If this toggle is set, then all non-zero counts will be set to 1.
Note: This fix (SPARK-14238) is extended from SPARK-13963 where Scala implementation was done.
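The effect of the binary toggle can be illustrated with a small hashing-trick sketch. This is not the Spark implementation: the function name is made up, and `zlib.crc32` is a stand-in hash chosen only to keep the example self-contained.

```python
import zlib


def hashing_tf(terms, num_features, binary=False):
    """Map terms to a sparse {index: count} dict with the hashing trick.

    zlib.crc32 is a stand-in hash chosen for this sketch; it is not the
    hash function Spark uses.
    """
    counts = {}
    for term in terms:
        index = zlib.crc32(term.encode("utf-8")) % num_features
        # With binary=True, every non-zero count is recorded as 1.
        counts[index] = 1.0 if binary else counts.get(index, 0.0) + 1.0
    return counts


doc = ["a", "b", "a", "a"]
tf = hashing_tf(doc, 1 << 10)
tf_binary = hashing_tf(doc, 1 << 10, binary=True)
```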
## How was this patch tested?
This fix adds two tests to cover the code changes. One for HashingTF in PySpark's ML and one for HashingTF in PySpark's MLLib.
Author: Yong Tang <yong.tang.github@outlook.com>
Closes #12079 from yongtang/SPARK-14238.
Added binary toggle param to CountVectorizer feature transformer in PySpark.
Created a unit test for using CountVectorizer with the binary toggle on.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #12308 from BryanCutler/binary-param-python-CountVectorizer-SPARK-13967.
## What changes were proposed in this pull request?
The PyDoc Makefile used "=" rather than "?=" for setting env variables so it overwrote the user values. This ignored the environment variables we set for linting allowing warnings through. This PR also fixes the warnings that had been introduced.
## How was this patch tested?
manual local export & make
Author: Holden Karau <holden@us.ibm.com>
Closes #12336 from holdenk/SPARK-14573-fix-pydoc-makefile.
Currently, JavaWrapper is only a wrapper class for pipeline classes that have Params, and JavaCallable is a separate mixin that provides methods to make Java calls. This change simplifies the class structure by defining the Java wrapper in a plain base class along with methods to make Java calls. It also renames Java wrapper classes to better reflect their purpose.
Ran existing Python ml tests and generated documentation to test this change.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #12304 from BryanCutler/pyspark-cleanup-JavaWrapper-SPARK-14472.
## What changes were proposed in this pull request?
Python API for GeneralizedLinearRegression
JIRA: https://issues.apache.org/jira/browse/SPARK-13597
## How was this patch tested?
The patch is tested with Python doctest.
Author: Kai Jiang <jiangkai@gmail.com>
Closes #11468 from vectorijk/spark-13597.
## What changes were proposed in this pull request?
Cleanups to documentation. No changes to code.
* GBT docs: Move Scala doc for private object GradientBoostedTrees to public docs for GBTClassifier,Regressor
* GLM regParam: needs doc saying it is for L2 only
* TrainValidationSplitModel: add .. versionadded:: 2.0.0
* Rename “_transformer_params_from_java” to “_transfer_params_from_java”
* LogReg Summary classes: “probability” col should not say “calibrated”
* LR summaries: coefficientStandardErrors —> document that intercept stderr comes last. Same for t,p-values
* approxCountDistinct: Document meaning of “rsd” argument.
* LDA: note which params are for online LDA only
## How was this patch tested?
Doc build
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #12266 from jkbradley/ml-doc-cleanups.
## What changes were proposed in this pull request?
A new column VarianceCol has been added to DecisionTreeRegressor in the ML Scala code.
This patch adds the corresponding Python API, HasVarianceCol, to class DecisionTreeRegressor.
## How was this patch tested?
./dev/lint-python
PEP8 checks passed.
rm -rf _build/*
pydoc checks passed.
./python/run-tests --python-executables=python2.7 --modules=pyspark-ml
Running PySpark tests. Output is in /Users/mwang/spark_ws_0904/python/unit-tests.log
Will test against the following Python executables: ['python2.7']
Will test the following Python modules: ['pyspark-ml']
Finished test(python2.7): pyspark.ml.evaluation (12s)
Finished test(python2.7): pyspark.ml.clustering (18s)
Finished test(python2.7): pyspark.ml.classification (30s)
Finished test(python2.7): pyspark.ml.recommendation (28s)
Finished test(python2.7): pyspark.ml.feature (43s)
Finished test(python2.7): pyspark.ml.regression (31s)
Finished test(python2.7): pyspark.ml.tuning (19s)
Finished test(python2.7): pyspark.ml.tests (34s)
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #12116 from wangmiao1981/fix_api.
## What changes were proposed in this pull request?
supporting `RandomForest{Classifier, Regressor}` save/load for Python API.
[JIRA](https://issues.apache.org/jira/browse/SPARK-14373)
## How was this patch tested?
doctest
Author: Kai Jiang <jiangkai@gmail.com>
Closes #12238 from vectorijk/spark-14373.
## What changes were proposed in this pull request?
Adding Python API for training summaries of LogisticRegression and LinearRegression in PySpark ML.
## How was this patch tested?
Added unit tests to exercise the api calls for the summary classes. Also, manually verified values are expected and match those from Scala directly.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #11621 from BryanCutler/pyspark-ml-summary-SPARK-13430.
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-13786
Add save/load for Python CrossValidator/Model and TrainValidationSplit/Model.
## How was this patch tested?
Test with Python doctest.
Author: Xusen Yin <yinxusen@gmail.com>
Closes #12020 from yinxusen/SPARK-13786.
## What changes were proposed in this pull request?
PySpark ml.clustering BisectingKMeans support export/import
## How was this patch tested?
doc test.
cc jkbradley
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #12112 from yanboliang/spark-14305.
1. Implement LossFunction trait and implement squared error and cross entropy loss with it
2. Implement unit test for gradient and loss
3. Implement InPlace trait and in-place layer evaluation
4. Refactor interface for ActivationFunction
5. Update of Layer and LayerModel interfaces
6. Fix random weights assignment
7. Implement memory allocation by MLP model instead of individual layers
These features decreased the memory usage and increased flexibility of internal API.
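A gradient-and-loss test of the kind mentioned in item 2 could look roughly like the sketch below. It is written in Python rather than Scala, and the function names and the exact loss definition (0.5 times the sum of squared differences) are assumptions for illustration, not the PR's code.

```python
def squared_error(output, target):
    """Return (loss, gradient) for 0.5 * sum((output - target)^2)."""
    delta = [o - t for o, t in zip(output, target)]
    loss = 0.5 * sum(d * d for d in delta)
    return loss, delta


def numeric_gradient(loss_fn, output, target, eps=1e-6):
    """Central-difference estimate of d(loss)/d(output)."""
    grad = []
    for i in range(len(output)):
        plus = list(output)
        minus = list(output)
        plus[i] += eps
        minus[i] -= eps
        grad.append((loss_fn(plus, target)[0] - loss_fn(minus, target)[0])
                    / (2 * eps))
    return grad


output, target = [0.2, 0.7, 0.1], [0.0, 1.0, 0.0]
loss, analytic = squared_error(output, target)
numeric = numeric_gradient(squared_error, output, target)
```

Comparing `analytic` against `numeric` within a small tolerance checks that the gradient matches the loss it claims to differentiate.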
Author: Alexander Ulanov <nashb@yandex.ru>
Author: avulanov <avulanov@gmail.com>
Closes #9229 from avulanov/mlp-refactoring.
## What changes were proposed in this pull request?
Feature importances are exposed in the python API for GBTs.
Other changes:
* Update the random forest feature importance documentation to not repeat decision tree docstring and instead place a reference to it.
## How was this patch tested?
Python doc tests were updated to validate GBT feature importance.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #12056 from sethah/Pyspark_GBT_feature_importance.
## What changes were proposed in this pull request?
```MultilayerPerceptronClassifier``` supports save/load for Python API.
## How was this patch tested?
doctest.
cc mengxr jkbradley yinxusen
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #11952 from yanboliang/spark-14152.
Add property to MLWritable.write method, so we can use .write instead of .write()
Add a new test to ml/test.py to check whether the write is a property.
./python/run-tests --python-executables=python2.7 --modules=pyspark-ml
Will test against the following Python executables: ['python2.7']
Will test the following Python modules: ['pyspark-ml']
Finished test(python2.7): pyspark.ml.evaluation (11s)
Finished test(python2.7): pyspark.ml.clustering (16s)
Finished test(python2.7): pyspark.ml.classification (24s)
Finished test(python2.7): pyspark.ml.recommendation (24s)
Finished test(python2.7): pyspark.ml.feature (39s)
Finished test(python2.7): pyspark.ml.regression (26s)
Finished test(python2.7): pyspark.ml.tuning (15s)
Finished test(python2.7): pyspark.ml.tests (30s)
Tests passed in 55 seconds
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #11945 from wangmiao1981/fix_property.
## What changes were proposed in this pull request?
Added MLReadable and MLWritable to Decision Tree Classifier and Regressor. Added doctests.
## How was this patch tested?
Python Unit tests. Tests added to check persistence in DecisionTreeClassifier and DecisionTreeRegressor.
Author: GayathriMurali <gayathri.m.softie@gmail.com>
Closes #11892 from GayathriMurali/SPARK-13949.
## What changes were proposed in this pull request?
GBTs in PySpark previously had seed parameters, but they could not be passed as keyword arguments through the class constructor. This patch adds seed as a keyword argument and also sets a default value.
## How was this patch tested?
Doc tests were updated to pass a random seed through the GBTClassifier and GBTRegressor constructors.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #11944 from sethah/SPARK-14107.