spark-instrumented-optimizer/python/pyspark/mllib
Peng c8b612decb
[SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference
## What changes were proposed in this pull request?

The ChiSquareSelector feature selection method ranks features by ChiSquareTestResult.statistic (the chi-square value) and selects the features with the largest values. However, the degrees of freedom (df) of the chi-square value returned by Statistics.chiSqTest(RDD) can differ from feature to feature, and chi-square values computed with different df are not directly comparable, so they cannot be used to rank features.

So we change statistic to pValue as the selection criterion for SelectKBest and SelectPercentile.
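
The reasoning above can be illustrated with a small standalone sketch. It is not Spark's implementation: the feature names, statistics, and df values are made up for illustration, and the helper `chi2_sf_even_df` is a hypothetical function that uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even df.

```python
import math


def chi2_sf_even_df(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!,
    which is valid only for positive even df (an assumption of this sketch).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = statistic / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical per-feature test results: (chi-square statistic, df).
features = {
    "feature_a": (9.0, 2),
    "feature_b": (12.0, 8),
}

# Ranking by the raw statistic prefers the feature with the larger value...
by_statistic = max(features, key=lambda name: features[name][0])

# ...while ranking by p-value accounts for the differing df.
by_p_value = min(features, key=lambda name: chi2_sf_even_df(*features[name]))

for name, (stat, df) in features.items():
    print(name, "statistic =", stat, "df =", df,
          "pValue =", chi2_sf_even_df(stat, df))
print("best by statistic:", by_statistic)
print("best by pValue:", by_p_value)
```

In this made-up case `feature_b` has the larger statistic, but because it also has more degrees of freedom its p-value is larger, so a p-value ranking prefers `feature_a`.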

## How was this patch tested?
Changed the existing test.

Author: Peng <peng.meng@intel.com>

Closes #15444 from mpjlu/chisqure-bug.
2016-10-14 12:48:57 +01:00
..
linalg [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract 2016-10-03 17:57:54 -07:00
stat [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
__init__.py [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide 2016-07-15 13:38:23 -07:00
classification.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
clustering.py [SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2. 2016-09-11 13:47:13 +01:00
common.py [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch 2016-10-03 14:12:03 -07:00
evaluation.py [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics 2016-06-10 10:09:19 +01:00
feature.py [SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest and SelectPercentile because of DoF difference 2016-10-14 12:48:57 +01:00
fpm.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
random.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
recommendation.py [SPARK-16348][ML][MLLIB][PYTHON] Use full classpaths for pyspark ML JVM calls 2016-07-05 17:00:24 -07:00
regression.py [MINOR] Fix Typos 'a -> an' 2016-05-26 22:39:14 -07:00
tests.py [SPARK-17587][PYTHON][MLLIB] SparseVector __getitem__ should follow __getitem__ contract 2016-10-03 17:57:54 -07:00
tree.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
util.py [MINOR][PYSPARK][DOCS] Fix examples in PySpark documentation 2016-09-28 06:19:04 -04:00