spark-instrumented-optimizer/project
Yanbo Liang 1e44dd0044 [SPARK-3181][ML] Implement huber loss for LinearRegression.
## What changes were proposed in this pull request?
MLlib ```LinearRegression``` supports _huber_ loss addition to _leastSquares_ loss. The huber loss objective function is:
![image](https://user-images.githubusercontent.com/1962026/29554124-9544d198-8750-11e7-8afa-33579ec419d5.png)
Refer Eq.(6) and Eq.(8) in [A robust hybrid of lasso and ridge regression](http://statweb.stanford.edu/~owen/reports/hhu.pdf). This objective is jointly convex as a function of (w, σ) ∈ R × (0,∞), we can use L-BFGS-B to solve it.

The current implementation is a straight forward porting for Python scikit-learn [```HuberRegressor```](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html). There are some differences:
* We use mean loss (```lossSum/weightSum```), but sklearn uses total loss (```lossSum```).
* We multiply the loss function and L2 regularization by 1/2. It does not affect the result if we multiply the whole formula by a factor, we just keep consistent with _leastSquares_ loss.

So if fitting w/o regularization, MLlib and sklearn produce the same output. If fitting w/ regularization, MLlib should set ```regParam``` divide by the number of instances to match the output of sklearn.

## How was this patch tested?
Unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #19020 from yanboliang/spark-3181.
2017-12-13 21:19:14 -08:00
..
build.properties [SPARK-21709][BUILD] sbt 0.13.16 and some plugin updates 2017-08-12 20:01:20 +01:00
MimaBuild.scala [SPARK-22485][BUILD] Use exclude[Problem] instead excludePackage in MiMa 2017-11-09 16:40:19 -08:00
MimaExcludes.scala [SPARK-3181][ML] Implement huber loss for LinearRegression. 2017-12-13 21:19:14 -08:00
plugins.sbt [SPARK-21708][BUILD] update some sbt plugins 2017-10-31 08:16:54 +00:00
SparkBuild.scala [SPARK-18278][SCHEDULER] Spark on Kubernetes - Basic Scheduler Backend 2017-11-28 23:02:09 -08:00