[SPARK-13545][MLLIB][PYSPARK] Make MLlib LogisticRegressionWithLBFGS's default parameters consistent in Scala and Python

## What changes were proposed in this pull request?

* The default value of ```regParam``` in PySpark MLlib ```LogisticRegressionWithLBFGS``` should be consistent with Scala, where it is ```0.0```. (This is also consistent with ML ```LogisticRegression```.)
* BTW, if we use a known updater (L1 or L2) for binary classification, ```LogisticRegressionWithLBFGS``` will call the ML implementation. We should update the API doc to clarify that ```numCorrections``` has no effect if we fall into that route.
* Made a pass over all parameters of ```LogisticRegressionWithLBFGS```; the others are already set properly.

cc mengxr dbtsai

## How was this patch tested?

No new tests; it should pass all existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #11424 from yanboliang/spark-13545.
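
The routing described in the second bullet can be sketched roughly as follows. This is plain Python for illustration only: `routes_to_ml`, `effective_corrections`, and `KNOWN_UPDATERS` are hypothetical names, not Spark APIs, and the conditions are taken from the doc comments in this change rather than from Spark's internal code.

```python
# Hypothetical sketch: these names are illustrative and are not Spark APIs.
KNOWN_UPDATERS = {"l1", "l2"}


def routes_to_ml(num_classes, updater, feature_scaling=True):
    """Return True when training would be delegated to the ml implementation."""
    # More than two classes, or disabled feature scaling, always stays on mllib.
    if num_classes > 2 or not feature_scaling:
        return False
    # Binary classification with a known (L1 or L2) updater goes to ml.
    return updater in KNOWN_UPDATERS


def effective_corrections(corrections, num_classes, updater, feature_scaling=True):
    """Return the corrections value that is honored, or None if it is ignored."""
    if routes_to_ml(num_classes, updater, feature_scaling):
        return None  # ml route: the corrections setting has no effect
    return corrections
```

Under these assumptions, `effective_corrections(10, 2, "l2")` reports that the setting is ignored, while the same call with `num_classes=3` keeps the value of 10.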
2016-02-29 00:55:51 -08:00 · 2016-02-29 00:55:51 -08:00 · d81a71357e
parent dd3b5455c6
commit d81a71357e
2 changed files with 9 additions and 3 deletions
--- a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
@@ -408,6 +408,10 @@ class LogisticRegressionWithLBFGS
   * defaults to the mllib implementation. If more than two classes
   * or feature scaling is disabled, always uses mllib implementation.
   * Uses user provided weights.
+   *
+   * In the ml LogisticRegression implementation, the number of corrections
+   * used in the LBFGS update can not be configured. So `optimizer.setNumCorrections()`
+   * will have no effect if we fall into that route.
   */
  override def run(input: RDD[LabeledPoint], initialWeights: Vector): LogisticRegressionModel = {
    run(input, initialWeights, userSuppliedWeights = true)
--- a/python/pyspark/mllib/classification.py
+++ b/python/pyspark/mllib/classification.py
@@ -326,7 +326,7 @@ class LogisticRegressionWithLBFGS(object):
    """
    @classmethod
    @since('1.2.0')
-    def train(cls, data, iterations=100, initialWeights=None, regParam=0.01, regType="l2",
+    def train(cls, data, iterations=100, initialWeights=None, regParam=0.0, regType="l2",
              intercept=False, corrections=10, tolerance=1e-6, validateData=True, numClasses=2):
        """
        Train a logistic regression model on the given data.
@@ -341,7 +341,7 @@ class LogisticRegressionWithLBFGS(object):
          (default: None)
        :param regParam:
          The regularizer parameter.
-          (default: 0.01)
+          (default: 0.0)
        :param regType:
          The type of regularizer used for training our model.
          Allowed values:
@@ -356,7 +356,9 @@ class LogisticRegressionWithLBFGS(object):
          (default: False)
        :param corrections:
          The number of corrections used in the LBFGS update.
-          (default: 10)
+          If a known updater is used for binary classification,
+          it calls the ml implementation and this parameter will
+          have no effect. (default: 10)
        :param tolerance:
          The convergence tolerance of iterations for L-BFGS.
          (default: 1e-6)
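
To see what the `regParam` default change from 0.01 to 0.0 means for the objective, here is a small standalone sketch. It assumes the textbook L2 penalty `0.5 * regParam * ||w||^2` added to the mean log-loss; Spark's internal scaling is not shown in this diff and may differ. The helper names `log_loss` and `l2_objective` are hypothetical.

```python
import math


def log_loss(weights, features, label):
    """Binary log-loss of one example (label is 0 or 1)."""
    margin = sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-margin))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def l2_objective(weights, data, reg_param):
    """Mean log-loss plus an assumed penalty of 0.5 * reg_param * ||w||^2."""
    data_loss = sum(log_loss(weights, x, y) for x, y in data) / len(data)
    penalty = 0.5 * reg_param * sum(w * w for w in weights)
    return data_loss + penalty


# Toy data and weights, made up for illustration.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
weights = [1.0, -2.0]

new_default = l2_objective(weights, data, reg_param=0.0)   # unregularized
old_default = l2_objective(weights, data, reg_param=0.01)  # previous Python default
```

With `reg_param=0.0` the penalty term vanishes, so the objective is the plain log-loss. Callers who relied on the old Python default can pass `regParam=0.01` explicitly to `train`.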