## What changes were proposed in this pull request?
Remove ML methods we deprecated in 2.1.
## How was this patch tested?
Existing tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17867 from yanboliang/spark-20606.
## What changes were proposed in this pull request?
Added a check for the number of defined values. Previously, the argmax function assumed that at least one value was defined if the vector size was greater than zero.
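This is the edge case the check covers: a sparse vector with a non-zero size but no defined (active) values. Every entry of such a vector is an implicit 0.0, so index 0 is the argmax. The following is a minimal pure-Python sketch of sparse argmax semantics. It assumes a (size, indices, values) layout with strictly increasing indices. `sparse_argmax` is a hypothetical helper, not Spark's code.

```python
def sparse_argmax(size, indices, values):
    """Index of the first maximal entry of a sparse vector (hypothetical helper).

    Entries not listed in `indices` are implicit zeros; `indices` must be
    strictly increasing. Returns -1 for an empty vector.
    """
    if size == 0:
        return -1
    if not indices:
        # No defined values: every entry is an implicit 0.0, so index 0 wins.
        return 0
    # Position of the largest active value (the first one on ties).
    best_pos = max(range(len(values)), key=lambda j: (values[j], -j))
    best_idx, best_val = indices[best_pos], values[best_pos]
    if len(indices) < size and best_val <= 0.0:
        # At least one implicit zero exists; locate the first one.
        first_implicit = len(indices)
        for pos, idx in enumerate(indices):
            if idx != pos:
                first_implicit = pos
                break
        # An implicit zero beats a negative maximum, and wins a 0.0 tie if it comes first.
        if best_val < 0.0 or first_implicit < best_idx:
            return first_implicit
    return best_idx
```

The early return for an empty `indices` list corresponds to the check described above. The later branch handles active maxima that are negative or zero when implicit zeros exist.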
## How was this patch tested?
Tests were added to the existing VectorsSuite to cover this case.
Author: Jon McLean <jon.mclean@atsid.com>
Closes #17877 from jonmclean/vectorArgmaxIndexBug.
## What changes were proposed in this pull request?
This PR is a `DataFrame` version of #17742 for [SPARK-11968](https://issues.apache.org/jira/browse/SPARK-11968), for improving the performance of `recommendAll` methods.
## How was this patch tested?
Existing unit tests.
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17845 from MLnick/ml-als-perf.
## What changes were proposed in this pull request?
The `recommendForAll` method of MLlib ALS is very slow, and GC is a key problem of the current method.
The task uses the following code to keep temporary results:
`val output = new Array[(Int, (Int, Double))](m * n)`
where m = n = 4096 (the default value, with no method to set it).
So `output` takes about 4k * 4k * (4 + 4 + 8) bytes = 256 MB. This is a large amount of memory; it causes serious GC problems and frequently leads to OOM.
Actually, we don't need to save all the temporary results. Suppose we recommend topK (topK is about 10 or 20) products for each user; then we only need 4k * topK * (4 + 4 + 8) bytes to save the temporary results.
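The idea of keeping only the top k candidates per user can be sketched in pure Python with a bounded heap. This illustration rests on simplifying assumptions: factors are held in plain dicts and scores are dot products. It is not the PR's Scala code, and `recommend_top_k` is a hypothetical name.

```python
import heapq


def recommend_top_k(user_factors, item_factors, k):
    """For each user, return the k highest-scoring (item_id, score) pairs.

    user_factors / item_factors map an id to a list of latent factors.
    heapq.nlargest streams over the generator while holding at most k
    candidates, so per-user memory is O(k) rather than O(num_items).
    """
    result = {}
    for uid, u in user_factors.items():
        scores = ((iid, sum(a * b for a, b in zip(u, v)))
                  for iid, v in item_factors.items())
        result[uid] = heapq.nlargest(k, scores, key=lambda pair: pair[1])
    return result
```

Asking for more items than exist simply returns all of them, sorted by score.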
The test environment:
- 3 workers, each with 10 cores, 30G of memory, and 1 executor.
- Data: 480,000 users and 17,000 items.

| BlockSize | 1024 | 2048 | 4096 | 8192 |
| --- | --- | --- | --- | --- |
| Old method | 245s | 332s | 488s | OOM |
| This solution | 121s | 118s | 117s | 120s |

## How was this patch tested?
Existing unit tests.
Author: Peng <peng.meng@intel.com>
Author: Peng Meng <peng.meng@intel.com>
Closes #17742 from mpjlu/OptimizeAls.
## What changes were proposed in this pull request?
Existing test cases for `recommendForAllX` methods (added in [SPARK-19535](https://issues.apache.org/jira/browse/SPARK-19535)) test `k < num items` and `k = num items`. Technically we should also test that `k > num items` returns the same results as `k = num items`.
## How was this patch tested?
Updated existing unit tests.
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17860 from MLnick/SPARK-20596-als-rec-tests.
## What changes were proposed in this pull request?
This PR adds documentation to the ALS code.
## How was this patch tested?
Existing tests were used.
cc mengxr srowen
This contribution is my original work. I have the license to work on this project under the Spark project’s open source license.
Author: Daniel Li <dan@danielyli.com>
Closes #17793 from danielyli/spark-20484.
## What changes were proposed in this pull request?
Bucketizer currently requires the input column to be Double, but the logic should work on any numeric data type. Many practical problems have integer/float data, and it can be very tedious to manually cast them to Double before calling Bucketizer. This PR extends Bucketizer to handle all numeric types.
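Bucketing logic is indeed type-agnostic once values are widened to a floating-point type. The sketch below is minimal pure Python and assumes the usual Bucketizer convention: a bucket [x, y) is half-open, except the last one, which also includes y. `bucketize` is a hypothetical helper, not the Spark API.

```python
import bisect


def bucketize(value, splits):
    """Map a numeric value to a bucket index given strictly increasing splits.

    Buckets are [splits[i], splits[i+1]) except the last, which also includes
    its upper bound. Ints and floats alike are widened to float first, so the
    caller never has to cast.
    """
    x = float(value)
    if x < splits[0] or x > splits[-1]:
        raise ValueError(f"value {value} is outside the splits range")
    if x == splits[-1]:
        # The upper bound belongs to the last bucket.
        return len(splits) - 2
    return bisect.bisect_right(splits, x) - 1
```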
## How was this patch tested?
New test.
Author: Wayne Zhang <actuaryzhang@uber.com>
Closes #17840 from actuaryzhang/bucketizer.
## What changes were proposed in this pull request?
Address some minor comments for #17715:
* Put bound-constrained optimization params under expertParams.
* Update some docs.
## How was this patch tested?
Existing tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17829 from yanboliang/spark-20047-followup.
## What changes were proposed in this pull request?
Use midpoints for split values for now; they may be made weighted later.
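The following illustrates midpoint split candidates for a continuous feature. It is a simplification: it assumes the feature's distinct values are already known, whereas Spark's trees work from binned, possibly sampled values. `midpoint_splits` is a hypothetical name.

```python
def midpoint_splits(values):
    """Candidate thresholds halfway between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
```

A threshold at the midpoint separates the two neighboring values symmetrically, rather than sitting exactly on one of them.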
## How was this patch tested?
+ [x] add unit test.
+ [x] revise Split's unit test.
Author: Yan Facai (颜发才) <facai.yan@gmail.com>
Author: 颜发才(Yan Facai) <facai.yan@gmail.com>
Closes #17556 from facaiy/ENH/decision_tree_overflow_and_precision_in_aggregation.
## What changes were proposed in this pull request?
Fix build warnings primarily related to Breeze 0.13 operator changes, as well as Java style problems.
## How was this patch tested?
Existing tests
Author: Sean Owen <sowen@cloudera.com>
Closes #17803 from srowen/SPARK-20523.
## What changes were proposed in this pull request?
The `MultilayerPerceptronClassifierWrapper` model should be private.
In `LogisticRegressionWrapper.scala`, `rFeatures` and `rCoefficients` should be lazy.
## How was this patch tested?
Unit tests.
Author: wangmiao1981 <wm624@hotmail.com>
Closes #17808 from wangmiao1981/lazy.
## What changes were proposed in this pull request?
- Add a new section for FPM.
- Add FPGrowth examples in Scala and Java.
- Update: rewrite `transform` to be more compact.
## How was this patch tested?
local doc generation.
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #17130 from hhbyyh/fpmdoc.
## What changes were proposed in this pull request?
MLlib `LogisticRegression` should support bound-constrained optimization (only for L2 regularization). Users can add bound constraints on the coefficients so that the solver produces a solution within the specified range.
Under the hood, we call Breeze [`L-BFGS-B`](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound-constrained optimization. However, the current Breeze implementation of L-BFGS-B has some bugs, which https://github.com/scalanlp/breeze/pull/633 fixed. We need to upgrade the Breeze dependency later; for now, this PR temporarily uses a workaround L-BFGS-B for review.
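The PR itself relies on Breeze's L-BFGS-B. To illustrate only what a bound constraint does to the solution, here is a swapped-in and much simpler technique, projected gradient descent, in pure Python. The function name, step size, and iteration count are all illustrative assumptions.

```python
import math


def fit_bounded_logreg(X, y, lower, upper, reg=0.1, lr=0.5, iters=500):
    """Binary logistic regression with an L2 penalty and box constraints on w.

    Projected gradient descent: a plain gradient step, then clipping every
    coefficient back into [lower[j], upper[j]]. (Not L-BFGS-B.)
    """
    n, d = len(X), len(X[0])
    # Start from the point of the box closest to zero.
    w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(iters):
        # Gradient of (reg / 2) * ||w||^2 ...
        grad = [reg * wj for wj in w]
        # ... plus the gradient of the mean log-loss.
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step, then projection onto the box.
        w = [min(max(wj - lr * gj, lo), hi)
             for wj, gj, lo, hi in zip(w, grad, lower, upper)]
    return w
```

On nearly separable data the unconstrained coefficient would grow well past a small upper bound. Here it stays pinned at that bound.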
## How was this patch tested?
Unit tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17715 from yanboliang/spark-20047.
## What changes were proposed in this pull request?
Pregel-based iterative algorithms with more than ~50 iterations begin to slow down and eventually fail with a StackOverflowError due to Spark's lack of support for long lineage chains.
This PR causes Pregel to checkpoint the graph periodically if the checkpoint directory is set.
This PR also moves PeriodicGraphCheckpointer.scala from mllib to graphx, and moves PeriodicRDDCheckpointer.scala and PeriodicCheckpointer.scala from mllib to core.
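Periodic checkpointing bounds the problem for a simple reason. Without it, lineage depth grows linearly with the number of iterations. With a checkpoint every N iterations, it never exceeds N. The following is a toy model in pure Python; it uses no Spark APIs. `ToyLineage`, `max_lineage_depth`, and the convention that an interval <= 0 disables checkpointing are all hypothetical.

```python
class ToyLineage:
    """Toy stand-in for a graph whose lineage grows by one step per iteration."""

    def __init__(self):
        self.depth = 0

    def step(self):
        self.depth += 1

    def checkpoint(self):
        # Persisting the data lets the lineage be truncated.
        self.depth = 0


def max_lineage_depth(num_iters, checkpoint_interval):
    """Deepest lineage reached; an interval <= 0 disables checkpointing (assumed)."""
    g = ToyLineage()
    deepest = 0
    for i in range(1, num_iters + 1):
        g.step()
        deepest = max(deepest, g.depth)
        if checkpoint_interval > 0 and i % checkpoint_interval == 0:
            g.checkpoint()
    return deepest
```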
## How was this patch tested?
unit tests, manual tests
Author: ding <ding@localhost.localdomain>
Author: dding3 <ding.ding@intel.com>
Author: Michael Allman <michael@videoamp.com>
Closes #15125 from dding3/cp2_pregel.
## What changes were proposed in this pull request?
Upgrade breeze version to 0.13.1, which fixed some critical bugs of L-BFGS-B.
## How was this patch tested?
Existing unit tests.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17746 from yanboliang/spark-20449.
## What changes were proposed in this pull request?
This is a follow-up PR of #17478.
## How was this patch tested?
Existing tests
Author: wangmiao1981 <wm624@hotmail.com>
Closes #17754 from wangmiao1981/followup.
## What changes were proposed in this pull request?
In MultivariateOnlineSummarizer, `add` and `merge` already check weights and feature sizes, so the corresponding checks in LR are redundant; this PR removes them.
## How was this patch tested?
Existing tests.
Author: wm624@hotmail.com <wm624@hotmail.com>
Closes #17478 from wangmiao1981/logit.
## What changes were proposed in this pull request?
When reg == 0, MLOR has multiple solutions, and we need to center the coefficients to get an identical result.
However, the current implementation centers the `coefficientMatrix` by the global mean of all coefficients.
In fact, the `coefficientMatrix` should be centered on each feature index separately.
This is because, according to the MLOR probability distribution function, it is easy to prove that
if `{ w0, w1, ..., w(K-1) }` make up the `coefficientMatrix`,
then `{ w0 + c, w1 + c, ..., w(K-1) + c }` is also an equivalent solution,
where `c` is an arbitrary vector of dimension `numFeatures`.
Reference: https://core.ac.uk/download/pdf/6287975.pdf
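The invariance claim can be checked directly. Write $w_k$ for the coefficient vector of class $k$, and ignore intercepts for brevity (an assumption of this sketch). Shifting every $w_k$ by $c$ multiplies numerator and denominator by the same factor $\exp(c^\top x)$, which cancels:

$$
P(y = k \mid x)
= \frac{\exp(w_k^\top x)}{\sum_{j=0}^{K-1} \exp(w_j^\top x)}
= \frac{\exp(w_k^\top x)\,\exp(c^\top x)}{\sum_{j=0}^{K-1} \exp(w_j^\top x)\,\exp(c^\top x)}
= \frac{\exp\big((w_k + c)^\top x\big)}{\sum_{j=0}^{K-1} \exp\big((w_j + c)^\top x\big)}
$$

Choosing $c = -\frac{1}{K}\sum_{k=0}^{K-1} w_k$ yields the solution whose coefficients sum to zero along every feature dimension, which is a per-feature centering. Subtracting a single global mean is also a valid shift. However, it does not in general make each feature column sum to zero, so it does not single out the same canonical solution.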
So we need to center the `coefficientMatrix` on each feature dimension separately.
**This can also be confirmed with the R library `glmnet`: when reg == 0, MLOR in `glmnet` always generates coefficients whose sum along each feature dimension is zero.**
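The following pure-Python sketch shows per-feature centering. It assumes the coefficient matrix is a list of K rows (one per class) with numFeatures columns, and it ignores intercepts. It also checks numerically that softmax probabilities are unchanged. The helper names are hypothetical, not Spark's implementation.

```python
import math


def center_per_feature(coef):
    """Subtract each feature column's mean across the K class rows."""
    k = len(coef)
    col_means = [sum(row[j] for row in coef) / k for j in range(len(coef[0]))]
    return [[w - m for w, m in zip(row, col_means)] for row in coef]


def softmax_probs(coef, x):
    """Class probabilities for feature vector x (no intercepts)."""
    margins = [sum(w * xi for w, xi in zip(row, x)) for row in coef]
    top = max(margins)  # subtract the max margin for numerical stability
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]
```

For `[[1.0, 5.0], [3.0, 1.0]]`, per-feature centering gives `[[-1.0, 2.0], [1.0, -2.0]]`, where each column sums to zero. Subtracting the global mean (2.5) instead leaves column sums of -1.0 and 1.0.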
## How was this patch tested?
Tests added.
Author: WeichenXu <WeichenXu123@outlook.com>
Closes #17706 from WeichenXu123/mlor_center.
## What changes were proposed in this pull request?
Improve PrefixSpan pre-processing efficiency by preventing sequences of zeros in the cleaned database.
The efficiency gain is shown in the following graph: https://postimg.org/image/9x6ireuvn/
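Here is a simplified sketch of a cleaning step in this spirit. It assumes sequences are lists of itemsets, and that cleaned sequences are encoded as flat integer lists with 0 separating itemsets; this encoding is an assumption for illustration. Itemsets and whole sequences left empty after removing infrequent items are dropped rather than kept as runs of zeros. `clean_sequences` is hypothetical, not MLlib's code.

```python
def clean_sequences(sequences, frequent_items):
    """Drop infrequent items, then drop empty itemsets and empty sequences.

    Output sequences are flat lists with 0 as the itemset delimiter and
    frequent items re-encoded as 1, 2, ... (assumed encoding).
    """
    item_ids = {item: i + 1 for i, item in enumerate(sorted(frequent_items))}
    cleaned = []
    for seq in sequences:
        flat = []
        for itemset in seq:
            ids = sorted(item_ids[x] for x in itemset if x in item_ids)
            if ids:  # skip itemsets with no frequent items
                flat.append(0)
                flat.extend(ids)
        if flat:  # skip sequences with no frequent items at all
            flat.append(0)
            cleaned.append(flat)
    return cleaned
```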
## How was this patch tested?
Using MLlib's existing PrefixSpan tests and my own tests on the 8 datasets shown in the graph. All results obtained were strictly identical to those of the original implementation (without this change).
`dev/run-tests` was also run, and no errors were found.
Author: Cyril de Vogelaere <cyril.devogelaeregmail.com>
Author: Syrux <pokcyril@hotmail.com>
Closes #17575 from Syrux/SPARK-20265.
## What changes were proposed in this pull request?
This PR proposes to run Spark unidoc to test the Javadoc 8 build, as Javadoc 8 is easily broken again.
There are several problems with it:
- It adds a little extra time to the test run. In my case, it took about 1.5 minutes more (`Elapsed :[94.8746569157]`). How this was measured is described in "How was this patch tested?".
- > One problem that I noticed was that Unidoc appeared to be processing test sources: if we can find a way to exclude those from being processed in the first place then that might significantly speed things up.
(see joshrosen's [comment](https://issues.apache.org/jira/browse/SPARK-18692?focusedCommentId=15947627&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15947627))
To complete this automated build, it also fixes existing Javadoc breaks, including ones introduced by test code as described above.
These fixes are similar to instances fixed previously. Please refer to https://github.com/apache/spark/pull/15999 and https://github.com/apache/spark/pull/16013.
Note that this only fixes **errors**, not **warnings**. Please see my observation at https://github.com/apache/spark/pull/17389#issuecomment-288438704 regarding spurious errors caused by warnings.
## How was this patch tested?
Tested the build manually via `jekyll build`, and also by running `./dev/run-tests`.
The elapsed time was measured by manually adding `time.time()` calls as below:
```diff
     profiles_and_goals = build_profiles + sbt_goals
     print("[info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments: ",
           " ".join(profiles_and_goals))
+    import time
+    st = time.time()
     exec_sbt(profiles_and_goals)
+    print("Elapsed :[%s]" % str(time.time() - st))
```
This produces:
```
...
========================================================================
Building Unidoc API Documentation
========================================================================
...
[info] Main Java API documentation successful.
...
Elapsed :[94.8746569157]
...
```
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17477 from HyukjinKwon/SPARK-18692.
## What changes were proposed in this pull request?
- made `numInstances` public in GLR
- made `degreesOfFreedom` public in LR
## How was this patch tested?
Reran the affected test suites.
Author: Benjamin Fradet <benjamin.fradet@gmail.com>
Closes #17431 from BenFradet/SPARK-20097.
## What changes were proposed in this pull request?
Add Locale.ROOT to internal calls to String `toLowerCase`, `toUpperCase`, to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
The change looks large but it is just adding `Locale.ROOT` (the locale with no country or language specified) to every call to these methods.
## How was this patch tested?
Existing tests.
Author: Sean Owen <sowen@cloudera.com>
Closes #17527 from srowen/SPARK-20156.
## What changes were proposed in this pull request?
This error message isn't properly formatted because of a missing `s` string-interpolator prefix. Currently, the error looks like:
```
Caused by: java.lang.IllegalArgumentException: requirement failed: indices should be one-based and in ascending order; found current=$current, previous=$previous; line="$line"
```
(note the literal `$current` instead of the interpolated value)
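The same pitfall exists with Python f-strings. The following small analogy is not Spark code, and its helper names are hypothetical. It shows how a missing prefix leaves placeholders in the output literally, just like `$current` above.

```python
def broken_message(current, previous, line):
    # Missing f prefix: the placeholders are emitted literally.
    return ("found current={current}, previous={previous}; "
            "line=\"{line}\"")


def fixed_message(current, previous, line):
    # With the f prefix on every piece, the values are interpolated.
    return (f"found current={current}, previous={previous}; "
            f"line=\"{line}\"")
```

When a message is split across several literals, each piece needs its own prefix.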
Author: Vijay Ramesh <vramesh@demandbase.com>
Closes #17572 from vijaykramesh/master.
## What changes were proposed in this pull request?
DataFrame-based support for correlation statistics was added in #17108. This patch adds the Python interface for it.
## How was this patch tested?
Python unit test.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #17494 from viirya/correlation-python-api.
## What changes were proposed in this pull request?
The ML `RandomForestClassificationModel` and `RandomForestRegressionModel` were not using the estimator parent UID when being fit. This change fixes that so the models can properly be identified with their parents.
## How was this patch tested?
Existing tests.
Added check to verify that model uid matches that of the parent, then renamed `checkCopy` to `checkCopyAndUids` and verified that it was called by one test for each ML algorithm.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #17296 from BryanCutler/rfmodels-use-parent-uid-SPARK-19953.
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-20003
I found this issue while doing some testing. In ml.fpm.FPGrowthModel, `setMinConfidence` should always affect rule generation and transform.
Currently, `associationRules` in FPGrowthModel is a lazy val, so `setMinConfidence` has no effect once `associationRules` has been computed.
I cache `associationRules` to avoid recomputation when `minConfidence` has not changed, but this makes FPGrowthModel somewhat stateful. Let me know if there are any concerns.
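Below is a minimal sketch of the caching scheme just described. It recomputes only when `minConfidence` differs from the value the cache was built for. The class and attribute names are hypothetical, and the rule computation is injected as a plain function.

```python
class RulesModel:
    """Hypothetical model that caches association rules per minConfidence."""

    def __init__(self, compute_rules, min_confidence=0.8):
        self._compute_rules = compute_rules
        self.min_confidence = min_confidence
        self._cached_for = None
        self._cached_rules = None
        self.recomputations = 0  # exposed only to make the caching observable

    def association_rules(self):
        # Recompute only when minConfidence differs from the cached value.
        if self._cached_rules is None or self._cached_for != self.min_confidence:
            self._cached_rules = self._compute_rules(self.min_confidence)
            self._cached_for = self.min_confidence
            self.recomputations += 1
        return self._cached_rules
```

The cache key is exactly the parameter that changes the result. This is the statefulness the description mentions.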
## How was this patch tested?
New unit test; I also strengthened the model save/load unit test to verify the caching mechanism.
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #17336 from hhbyyh/fpmodelminconf.
## What changes were proposed in this pull request?
This is a small piece from https://github.com/apache/spark/pull/16722 which ultimately will add sample weights to decision trees. This is to allow more flexibility in testing outliers since linear models and trees behave differently.
Note: The primary author when this is committed should be sethah since this is taken from his code.
## How was this patch tested?
Existing tests
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #17501 from jkbradley/SPARK-20183.
## What changes were proposed in this pull request?
Adds SparkR API for FPGrowth: [SPARK-19825](https://issues.apache.org/jira/browse/SPARK-19825):
- `spark.fpGrowth`: model training.
- `freqItemsets` and `associationRules` methods with new corresponding generics.
- Scala helper: `org.apache.spark.ml.r.FPGrowthWrapper`.
- Unit tests.
## How was this patch tested?
Feature specific unit tests.
Author: zero323 <zero323@users.noreply.github.com>
Closes #17170 from zero323/SPARK-19825.
## What changes were proposed in this pull request?
Add docs and examples for spark.ml.feature.Imputer. Scala and Java examples are currently included; a Python example will be added after https://github.com/apache/spark/pull/17316.
## How was this patch tested?
local doc generation and example execution
Author: Yuhao Yang <yuhao.yang@intel.com>
Closes #17324 from hhbyyh/imputerdoc.
## What changes were proposed in this pull request?
Some ML Models were using `defaultCopy` which expects a default constructor, and others were not setting the parent estimator. This change fixes these by creating a new instance of the model and explicitly setting values and parent.
## How was this patch tested?
Added `MLTestingUtils.checkCopy` to the tests of the offending models to verify that a copy is made and the parent is set.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #17326 from BryanCutler/ml-model-copy-error-SPARK-19985.
## What changes were proposed in this pull request?
Use recommended values for row boundaries in Window's scaladoc, i.e. `Window.unboundedPreceding`, `Window.unboundedFollowing`, and `Window.currentRow` (that were introduced in 2.1.0).
## How was this patch tested?
Local build
Author: Jacek Laskowski <jacek@japila.pl>
Closes #17417 from jaceklaskowski/window-expression-scaladoc.
## What changes were proposed in this pull request?
A pyspark wrapper for spark.ml.stat.ChiSquareTest
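For reference, a chi-squared independence test of one categorical feature against a categorical label is built on Pearson's χ² statistic over their contingency table. The following pure-Python sketch is not the Spark implementation, and it omits p-value computation. `chi_square_independence` is a hypothetical name.

```python
from collections import Counter


def chi_square_independence(feature, label):
    """Pearson's chi-squared statistic and degrees of freedom for one
    categorical feature against a categorical label."""
    n = len(feature)
    f_levels = sorted(set(feature))
    l_levels = sorted(set(label))
    counts = Counter(zip(feature, label))  # observed contingency table
    f_tot = Counter(feature)
    l_tot = Counter(label)
    stat = 0.0
    for f in f_levels:
        for lab in l_levels:
            expected = f_tot[f] * l_tot[lab] / n
            stat += (counts[(f, lab)] - expected) ** 2 / expected
    dof = (len(f_levels) - 1) * (len(l_levels) - 1)
    return stat, dof
```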
## How was this patch tested?
unit tests
doctests
Author: Bago Amirbekian <bago@databricks.com>
Closes #17421 from MrBago/chiSquareTestWrapper.
## What changes were proposed in this pull request?
Use the new `compressed` method on matrices to store the logistic regression coefficients as sparse or dense, whichever requires less memory.
Marked as WIP so we can add some performance test results. Basically, we should see if prediction is slower because of using a sparse matrix over a dense one. This can happen since sparse matrices do not use native BLAS operations when computing the margins.
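The dense-versus-sparse decision comes down to comparing estimated storage sizes. The following pure-Python sketch uses an assumed, simplified cost model: 8-byte values, 4-byte row indices, and CSC-style column pointers. The exact accounting in Spark's `compressed` may differ, and `choose_storage` is a hypothetical name.

```python
def choose_storage(matrix):
    """Pick 'dense' or 'sparse' for a row-major matrix by estimated bytes.

    Assumed cost model: dense = 8 bytes per entry; sparse (CSC-like) =
    8 bytes per non-zero value + 4 bytes per row index + 4 bytes per
    column pointer (num_cols + 1 of them).
    """
    num_rows, num_cols = len(matrix), len(matrix[0])
    nnz = sum(1 for row in matrix for v in row if v != 0.0)
    dense_bytes = 8 * num_rows * num_cols
    sparse_bytes = 12 * nnz + 4 * (num_cols + 1)
    return "sparse" if sparse_bytes < dense_bytes else "dense"
```

Storage size is only half of the trade-off described above. Prediction speed with a sparse matrix is the other half, which is why benchmarks are needed.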
## How was this patch tested?
Unit tests added.
Author: sethah <seth.hendrickson16@gmail.com>
Closes #17426 from sethah/SPARK-17137.
## What changes were proposed in this pull request?
Add Python wrapper for `Imputer` feature transformer.
## How was this patch tested?
New doc tests and tweak to PySpark ML `tests.py`
Author: Nick Pentreath <nickp@za.ibm.com>
Closes #17316 from MLnick/SPARK-15040-pyspark-imputer.
## What changes were proposed in this pull request?
This patch adds DataFrame-based support for the correlation statistics found in `org.apache.spark.mllib.stat.correlation.Statistics`, following the design doc discussed in the JIRA ticket.
The current implementation is a simple wrapper around the `spark.mllib` implementation. Future optimizations can be implemented at a later stage.
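The Pearson statistic itself is simple. The following pure-Python sketch computes a correlation matrix over plain columns to show what the Pearson option computes. It is not the Spark API, which operates on a DataFrame column of feature vectors, and `pearson_matrix` is a hypothetical name.

```python
import math


def pearson_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]
```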
## How was this patch tested?
```
build/sbt "testOnly org.apache.spark.ml.stat.StatisticsSuite"
```
Author: Timothy Hunter <timhunter@databricks.com>
Closes #17108 from thunterdb/19636.
## What changes were proposed in this pull request?
Several Javadoc 8 breaks have been introduced. This PR proposes to fix those instances so that we can build the Scala/Java API docs.
```
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:6: error: reference not found
[error] * <code>flatMapGroupsWithState</code> operations on {link KeyValueGroupedDataset}.
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:10: error: reference not found
[error] * Both, <code>mapGroupsWithState</code> and <code>flatMapGroupsWithState</code> in {link KeyValueGroupedDataset}
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:51: error: reference not found
[error] * {link GroupStateTimeout.ProcessingTimeTimeout}) or event time (i.e.
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:52: error: reference not found
[error] * {link GroupStateTimeout.EventTimeTimeout}).
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/streaming/GroupState.java:158: error: reference not found
[error] * Spark SQL types (see {link Encoder} for more details).
[error] ^
[error] .../spark/mllib/target/java/org/apache/spark/ml/fpm/FPGrowthParams.java:26: error: bad use of '>'
[error] * Number of partitions (>=1) used by parallel FP-growth. By default the param is not set, and
[error] ^
[error] .../spark/sql/core/src/main/java/org/apache/spark/api/java/function/FlatMapGroupsWithStateFunction.java:30: error: reference not found
[error] * {link org.apache.spark.sql.KeyValueGroupedDataset#flatMapGroupsWithState(
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:211: error: reference not found
[error] * See {link GroupState} for more details.
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:232: error: reference not found
[error] * See {link GroupState} for more details.
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:254: error: reference not found
[error] * See {link GroupState} for more details.
[error] ^
[error] .../spark/sql/core/target/java/org/apache/spark/sql/KeyValueGroupedDataset.java:277: error: reference not found
[error] * See {link GroupState} for more details.
[error] ^
[error] .../spark/core/target/java/org/apache/spark/TaskContextImpl.java:10: error: reference not found
[error] * {link TaskMetrics} & {link MetricsSystem} objects are not thread safe.
[error] ^
[error] .../spark/core/target/java/org/apache/spark/TaskContextImpl.java:10: error: reference not found
[error] * {link TaskMetrics} & {link MetricsSystem} objects are not thread safe.
[error] ^
[info] 13 errors
```
```
jekyll 3.3.1 | Error: Unidoc generation failed
```
## How was this patch tested?
Manually via `jekyll build`
Author: hyukjinkwon <gurwls223@gmail.com>
Closes #17389 from HyukjinKwon/minor-javadoc8-fix.
## What changes were proposed in this pull request?
I realized that since ChiSquare is in the `stat` package, it's unclear whether it is a hypothesis test, a distribution, or something else. This PR renames it to ChiSquareTest to clarify this.
## How was this patch tested?
Existing unit tests
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #17368 from jkbradley/SPARK-20039.
## What changes were proposed in this pull request?
Update docs for NaN handling in approxQuantile.
## How was this patch tested?
existing tests.
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #17369 from zhengruifeng/doc_quantiles_nan.
## What changes were proposed in this pull request?
Changes to the API documentation and the collaborative filtering documentation page to clarify the inconsistent description of the ALS rank parameter.
- [DOCS] was previously: "rank is the number of latent factors in the model."
- [API] was previously: "rank - number of features to use"
This change describes rank in both places consistently as:
- "Number of features to use (also referred to as the number of latent factors)"
Author: Chris Snow <chris.snowuk.ibm.com>
Author: christopher snow <chsnow123@gmail.com>
Closes #17345 from snowch/SPARK-20011.
## What changes were proposed in this pull request?
Replaces `featuresCol` `Param` with `itemsCol`. See [SPARK-19899](https://issues.apache.org/jira/browse/SPARK-19899).
## How was this patch tested?
Manual tests. Existing unit tests.
Author: zero323 <zero323@users.noreply.github.com>
Closes #17321 from zero323/SPARK-19899.
## What changes were proposed in this pull request?
Wrapper taking and returning a DataFrame.
## How was this patch tested?
Copied unit tests from RDD-based API
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #17110 from jkbradley/df-hypotests.
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-13568
It is quite common to encounter missing values in data sets. It would be useful to implement a Transformer that can impute missing data points, similar to e.g. Imputer in scikit-learn.
Initially, options for imputation could include mean, median and most frequent, but we could add various other approaches, where possible existing DataFrame code can be used (e.g. for approximate quantiles etc).
Currently this PR supports imputation for Double and Vector (null and NaN in Vector).
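Here is a minimal pure-Python sketch of mean and median imputation that treats both None and NaN as missing. It uses an exact median for simplicity, whereas the text above suggests reusing DataFrame approximate-quantile code. `impute` is a hypothetical helper.

```python
import math
import statistics


def impute(values, strategy="mean"):
    """Fill None and NaN entries with the mean or median of the observed values."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    elif strategy == "median":
        fill = statistics.median(observed)  # exact, unlike approximate quantiles
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [fill if is_missing(v) else v for v in values]
```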
## How was this patch tested?
new unit tests and manual test
Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>
Author: Yuhao <yuhao.yang@intel.com>
Closes #11601 from hhbyyh/imputer.
## What changes were proposed in this pull request?
This PR enhances StringIndexer with handling of NULL values.
Before this PR, StringIndexer throws an exception when it encounters NULL values.
With this PR:
- handleInvalid=error: Throw an exception as before
- handleInvalid=skip: Skip null values as well as unseen labels
- handleInvalid=keep: Give null values an additional index as well as unseen labels
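The three `handleInvalid` behaviors listed above can be sketched in pure Python. The sketch assumes an already-fitted label list and treats None and unseen labels identically as invalid. `index_labels` is a hypothetical function, not the Spark API.

```python
def index_labels(values, labels, handle_invalid="error"):
    """Map strings to indices using a fitted label list.

    None and unseen labels are invalid: 'error' raises, 'skip' drops them,
    and 'keep' maps them to the extra index len(labels).
    """
    lookup = {label: i for i, label in enumerate(labels)}
    out = []
    for v in values:
        if v in lookup:
            out.append(lookup[v])
        elif handle_invalid == "keep":
            out.append(len(labels))
        elif handle_invalid == "skip":
            continue
        else:
            raise ValueError(f"invalid label: {v!r}")
    return out
```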
BTW, I noticed someone was trying to solve the same problem (#9920), but it seems to have made no progress or received no response for a long time. Would you mind giving me a chance to solve it? I'm eager to help. :-)
## How was this patch tested?
new unit tests
Author: Menglong TAN <tanmenglong@renrenche.com>
Author: Menglong TAN <tanmenglong@gmail.com>
Closes #17233 from crackcell/11569_StringIndexer_NULL.
## What changes were proposed in this pull request?
This commit moves `distinct` to its intended place to avoid duplicated predictions and adds a unit test covering the issue.
## How was this patch tested?
Unit tests.
Author: zero323 <zero323@users.noreply.github.com>
Closes #17283 from zero323/SPARK-19940.
Currently, generating synonyms with a large model (I've tested with 3M words) is very slow. These efficiency changes have sped things up for us by ~17%.
I wasn't sure if such small changes were worthy of a JIRA, but the guidelines seemed to suggest that it is the preferred approach.
## What changes were proposed in this pull request?
Address a few small issues in the findSynonyms logic:
1) remove usage of ``Array.fill`` to zero out the ``cosineVec`` array. The default float value in Scala and Java is 0.0f, so explicitly setting the values to zero is not needed
2) use Floats throughout. The conversion to Doubles before doing the ``priorityQueue`` is totally superfluous, since all the similarity computations are done using Floats anyway. Creating a second large array just serves to put extra strain on the GC
3) convert the slow ``for(i <- cosVec.indices)`` to an ugly, but faster, ``while`` loop
These efficiencies are really only apparent when working with a large model
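The following pure-Python sketch shows the findSynonyms logic: cosine similarity against every vocabulary vector, followed by top-k selection with a bounded heap. The names are hypothetical. Python's performance characteristics differ from the JVM's, so this illustrates the logic, not the speedups.

```python
import heapq
import math


def find_synonyms(word, vocab, vectors, num):
    """Return up to `num` (word, cosine similarity) pairs closest to `word`."""
    target = vocab.index(word)
    q = vectors[target]
    q_norm = math.sqrt(sum(x * x for x in q))
    # [0.0] * n is already zero-filled, so no separate fill pass is needed.
    sims = [0.0] * len(vocab)
    for i, v in enumerate(vectors):
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0 and q_norm > 0.0:
            sims[i] = sum(a * b for a, b in zip(q, v)) / (q_norm * norm)
    # Take one extra candidate so the query word itself can be dropped.
    best = heapq.nlargest(num + 1, range(len(vocab)), key=sims.__getitem__)
    return [(vocab[j], sims[j]) for j in best if j != target][:num]
```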
## How was this patch tested?
Existing unit tests + some in-house tests to time the difference
cc jkbradley MLNick srowen
Author: Asher Krim <krim.asher@gmail.com>
Author: Asher Krim <krim.asher@gmail>
Closes #17263 from Krimit/fasterFindSynonyms.
## What changes were proposed in this pull request?
Port Tweedie GLM #16344 to SparkR
cc felixcheung yanboliang
## How was this patch tested?
new test in SparkR
Author: actuaryzhang <actuaryzhang10@gmail.com>
Closes #16729 from actuaryzhang/sparkRTweedie.
## What changes were proposed in this pull request?
Give proper syntax for Java and Python in addition to Scala.
## How was this patch tested?
Manually.
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #17215 from jkbradley/write-err-msg.
## What changes were proposed in this pull request?
The RandomForest and GBT R wrappers now return the `maxDepth` param to R models.
The following 4 R wrappers are changed:
* `RandomForestClassificationWrapper`
* `RandomForestRegressionWrapper`
* `GBTClassificationWrapper`
* `GBTRegressionWrapper`
## How was this patch tested?
Tested manually on my local machine.
Author: Xin Ren <iamshrek@126.com>
Closes #17207 from keypointt/SPARK-19282.