spark-instrumented-optimizer/python/pyspark/ml
Xusen Yin a6428292f7 [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update
## What changes were proposed in this pull request?

This PR is an update for [https://github.com/apache/spark/pull/12738] which:
* Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
* Fixes various bugs found in the process
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
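
The idea behind the generic default-value check can be sketched without Spark. The snippet below is a minimal, JVM-free illustration and not the test added to tests.py: `NOT_SET`, `find_default_mismatches`, and the two default tables are hypothetical names introduced here, and the tables only mirror the ChiSqSelector mismatch listed in this PR.

```python
# Hypothetical sentinel meaning "this side defines no default for the Param".
NOT_SET = object()


def find_default_mismatches(python_defaults, scala_defaults):
    """Return {param_name: (python_default, scala_default)} for every disagreement.

    Both arguments are plain dicts mapping a Param name to its default value.
    A Param missing from one dict is treated as "not set" on that side.
    """
    mismatches = {}
    for name in sorted(set(python_defaults) | set(scala_defaults)):
        py_val = python_defaults.get(name, NOT_SET)
        jvm_val = scala_defaults.get(name, NOT_SET)
        if py_val != jvm_val:
            mismatches[name] = (py_val, jvm_val)
    return mismatches


# Illustrative tables (assumed for this sketch): before this PR the Python
# ChiSqSelector left numTopFeatures unset while the Scala side defaulted to 50.
python_side = {"featuresCol": "features"}
scala_side = {"featuresCol": "features", "numTopFeatures": 50}

mismatches = find_default_mismatches(python_side, scala_side)
```

Here `mismatches` contains a single entry for `numTopFeatures`, pairing `NOT_SET` on the Python side with `50` on the Scala side. A real check would read both sides from the wrapper and its underlying Java object instead of from hand-written dicts.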

Defaults changed:
* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
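
For the weightCol entries above, "unset" and "empty string" are meant to behave identically. The following plain-Python sketch illustrates that rule under stated assumptions: `weight_col_is_set` and `effective_weights` are hypothetical helpers written for this note, not functions from Spark or PySpark, and rows are modelled as simple dicts.

```python
def weight_col_is_set(weight_col):
    """Treat None (unset) and "" (empty string) the same: no weight column."""
    return weight_col is not None and weight_col != ""


def effective_weights(rows, weight_col=None):
    """Return one weight per row, using 1.0 when no weight column is given."""
    if not weight_col_is_set(weight_col):
        return [1.0 for _ in rows]
    return [float(row[weight_col]) for row in rows]


# Hypothetical input rows with a weight column named "w".
rows = [{"label": 0.0, "w": 2}, {"label": 1.0, "w": 3}]

unset_weights = effective_weights(rows)        # weightCol not set
empty_weights = effective_weights(rows, "")    # weightCol set to ""
named_weights = effective_weights(rows, "w")   # weightCol names a real column
```

Both `unset_weights` and `empty_weights` come out as uniform weights of 1.0, while `named_weights` reads the values from the `w` column.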

## How was this patch tested?

Generic unit test. The unit test itself was checked manually by changing defaults and verifying that the test then failed.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>

Closes #12816 from jkbradley/yinxusen-SPARK-14931.
2016-05-01 12:29:01 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| param | [SPARK-14768][ML][PYSPARK] removed expectedType from Param __init__() | 2016-04-25 15:32:11 +02:00 |
| __init__.py | [SPARK-13038][PYSPARK] Add load/save to pipeline | 2016-03-16 13:49:40 -07:00 |
| base.py | [SPARK-13038][PYSPARK] Add load/save to pipeline | 2016-03-16 13:49:40 -07:00 |
| classification.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |
| clustering.py | [SPARK-11940][PYSPARK][ML] Python API for ml.clustering.LDA PR2 | 2016-04-29 10:42:52 -07:00 |
| evaluation.py | [SPARK-14555] First cut of Python API for Structured Streaming | 2016-04-20 10:32:01 -07:00 |
| feature.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |
| pipeline.py | [SPARK-14555] First cut of Python API for Structured Streaming | 2016-04-20 10:32:01 -07:00 |
| recommendation.py | [SPARK-14412][.2][ML] rename *RDDStorageLevel to *StorageLevel in ml.ALS | 2016-04-30 00:41:28 -07:00 |
| regression.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |
| tests.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |
| tuning.py | [SPARK-13786][ML][PYTHON] Removed save/load for python tuning | 2016-04-29 20:51:24 -07:00 |
| util.py | [SPARK-14903][SPARK-14071][ML][PYTHON] Revert : MLWritable.write property | 2016-04-26 12:00:57 -07:00 |
| wrapper.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |