spark-instrumented-optimizer

History

Peng c8b612decb [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference	2016-10-14 12:48:57 +01:00

## What changes were proposed in this pull request?

The feature selection method ChiSquareSelector selects features based on ChiSquareTestResult.statistic (the chi-square value), choosing the features with the largest chi-square values. However, the degrees of freedom (df) of the chi-square values returned by Statistics.chiSqTest(RDD) can differ between features, and chi-square values with different df cannot be compared directly when selecting features. This change therefore switches SelectKBest and SelectPercentile from statistic to pValue.

## How was this patch tested?

Changed the existing tests.

Author: Peng <peng.meng@intel.com>

Closes #15444 from mpjlu/chisqure-bug.
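The reasoning in the commit message above can be illustrated with a small, Spark-free sketch. It is not the code from the patch: the helper `chi2_sf_even_df`, the feature names, and the statistic/df values are all hypothetical, chosen only to show that a larger chi-square statistic with more degrees of freedom can correspond to a larger (weaker) p-value. The helper uses the closed-form survival function that holds for even df only.

```python
import math


def chi2_sf_even_df(statistic, df):
    """P(X >= statistic) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    Hypothetical helper for illustration; not part of Spark.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Made-up (statistic, df) pairs for two features.
features = {"feature_a": (10.0, 2), "feature_b": (12.0, 8)}

p_values = {name: chi2_sf_even_df(stat, df)
            for name, (stat, df) in features.items()}

# Ranking by raw statistic and ranking by p-value disagree here.
best_by_statistic = max(features, key=lambda name: features[name][0])
best_by_p_value = min(p_values, key=p_values.get)

print(p_values)
print("largest statistic:", best_by_statistic)
print("smallest p-value:", best_by_p_value)
```

In this made-up case feature_b has the larger statistic, but feature_a has the smaller p-value, so the two ranking rules pick different features once the df differ.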
linalg	[SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract	2016-10-03 17:57:54 -07:00
param	[SPARK-17057][ML] ProbabilisticClassifierModels' thresholds should have at most one 0	2016-09-24 08:15:55 +01:00
__init__.py	[SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide	2016-07-15 13:38:23 -07:00
base.py	[SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python	2016-06-13 19:59:53 -07:00
classification.py	[SPARK-17745][ML][PYSPARK] update NB python api - add weight col parameter	2016-10-12 19:52:57 -07:00
clustering.py	[SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2.	2016-09-11 13:47:13 +01:00
common.py	[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch	2016-10-03 14:12:03 -07:00
evaluation.py	[SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/load	2016-10-14 04:17:03 -07:00
feature.py	[SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference	2016-10-14 12:48:57 +01:00
pipeline.py	[SPARK-15018][PYSPARK][ML] Improve handling of PySpark Pipeline when used without stages	2016-08-19 23:46:36 -07:00
recommendation.py	[SPARK-15741][PYSPARK][ML] Pyspark cleanup of set default seed to None	2016-06-21 11:43:25 -07:00
regression.py	[SPARK-17281][ML][MLLIB] Add treeAggregateDepth parameter for AFTSurvivalRegression	2016-09-22 04:35:54 -07:00
tests.py	[SPARK-15957][FOLLOW-UP][ML][PYSPARK] Add Python API for RFormula forceIndexLabel.	2016-10-13 19:44:24 -07:00
tuning.py	[SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics	2016-08-03 04:18:28 -07:00
util.py	[SPARK-15113][PYSPARK][ML] Add missing num features num classes	2016-08-22 12:21:22 +02:00
wrapper.py	[SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python	2016-06-13 19:59:53 -07:00