spark-instrumented-optimizer


Xusen Yin a6428292f7 [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update

## What changes were proposed in this pull request?

This PR is an update for [https://github.com/apache/spark/pull/12738] which:

* Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
* Various fixes for bugs found
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.

Defaults changed:

* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)

## How was this patch tested?

Generic unit test. Manually tested that unit test by changing defaults and verifying that broke the test.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>

Closes #12816 from jkbradley/yinxusen-SPARK-14931.

2016-05-01 12:29:01 -07:00
param	[SPARK-14768][ML][PYSPARK] removed expectedType from Param __init__()	2016-04-25 15:32:11 +02:00
__init__.py	[SPARK-13038][PYSPARK] Add load/save to pipeline	2016-03-16 13:49:40 -07:00
base.py	[SPARK-13038][PYSPARK] Add load/save to pipeline	2016-03-16 13:49:40 -07:00
classification.py	[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update	2016-05-01 12:29:01 -07:00
clustering.py	[SPARK-11940][PYSPARK][ML] Python API for ml.clustering.LDA PR2	2016-04-29 10:42:52 -07:00
evaluation.py	[SPARK-14555] First cut of Python API for Structured Streaming	2016-04-20 10:32:01 -07:00
feature.py	[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update	2016-05-01 12:29:01 -07:00
pipeline.py	[SPARK-14555] First cut of Python API for Structured Streaming	2016-04-20 10:32:01 -07:00
recommendation.py	[SPARK-14412][.2][ML] rename RDDStorageLevel to StorageLevel in ml.ALS	2016-04-30 00:41:28 -07:00
regression.py	[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update	2016-05-01 12:29:01 -07:00
tests.py	[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update	2016-05-01 12:29:01 -07:00
tuning.py	[SPARK-13786][ML][PYTHON] Removed save/load for python tuning	2016-04-29 20:51:24 -07:00
util.py	[SPARK-14903][SPARK-14071][ML][PYTHON] Revert : MLWritable.write property	2016-04-26 12:00:57 -07:00
wrapper.py	[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update	2016-05-01 12:29:01 -07:00