spark-instrumented-optimizer/python/pyspark/mllib
Peng, Meng b366f18496
[SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Positive Rate (FPR) test
## What changes were proposed in this pull request?

Univariate feature selection selects the best features based on univariate statistical tests. Selecting by False Positive Rate (FPR), which keeps every feature whose test p-value falls below a threshold alpha, is a popular univariate selection criterion. This PR adds a chiSquare Selector based on the FPR test, similar to the one implemented in scikit-learn.
http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection
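As an illustration of the technique only (not the Spark implementation or its API), the following minimal pure-Python sketch runs a Pearson chi-squared test of independence between each feature and the label, then keeps the features whose p-value is below `alpha`, the same rule scikit-learn's `SelectFpr` applies. It assumes categorical feature values and integer degrees of freedom. The function names `chi2_sf`, `chi_square_pvalue`, and `select_fpr` are hypothetical, and returning a p-value of 1.0 when degrees of freedom are zero is a convention chosen for this sketch.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X > x) for a chi-squared distribution with integer df >= 1 (closed form)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: e^{-y} * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + e^{-y} * sum_{j=1}^{(df-1)/2} y^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def chi_square_pvalue(feature_values, labels):
    """Pearson chi-squared independence test between one categorical feature and the label."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    f_counts = Counter(feature_values)
    l_counts = Counter(labels)
    df = (len(f_counts) - 1) * (len(l_counts) - 1)
    if df == 0:
        # Constant feature or constant label (hypothetical convention): never selected.
        return 1.0
    stat = 0.0
    for f, fc in f_counts.items():
        for lab, lc in l_counts.items():
            expected = fc * lc / n
            observed = joint.get((f, lab), 0)
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, df)


def select_fpr(rows, labels, alpha=0.05):
    """Return the indices of features whose chi-squared p-value is below alpha."""
    selected = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if chi_square_pvalue(column, labels) < alpha:
            selected.append(j)
    return selected


# Feature 0 equals the label, feature 1 is constant, feature 2 is independent of the label.
labels = [0] * 10 + [1] * 10
rows = [[y, 1, i % 2] for i, y in enumerate(labels)]
print(select_fpr(rows, labels, alpha=0.05))  # prints [0]
```

Unlike a "top k features" rule, the FPR rule does not fix the number of selected features in advance: it keeps however many features pass the significance threshold, so the result can be empty or include every feature.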

## How was this patch tested?

Added Scala unit tests.

Author: Peng, Meng <peng.meng@intel.com>

Closes #14597 from mpjlu/fprChiSquare.
2016-09-21 10:17:38 +01:00
linalg [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
stat [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
__init__.py [SPARK-14817][ML][MLLIB][DOC] Made DataFrame-based API primary in MLlib guide 2016-07-15 13:38:23 -07:00
classification.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
clustering.py [SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2. 2016-09-11 13:47:13 +01:00
common.py [SPARK-16348][ML][MLLIB][PYTHON] Use full classpaths for pyspark ML JVM calls 2016-07-05 17:00:24 -07:00
evaluation.py [SPARK-15823][PYSPARK][ML] Add @property for 'accuracy' in MulticlassMetrics 2016-06-10 10:09:19 +01:00
feature.py [SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Positive Rate (FPR) test 2016-09-21 10:17:38 +01:00
fpm.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
random.py [SPARK-15464][ML][MLLIB][SQL][TESTS] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code 2016-05-23 18:14:48 -07:00
recommendation.py [SPARK-16348][ML][MLLIB][PYTHON] Use full classpaths for pyspark ML JVM calls 2016-07-05 17:00:24 -07:00
regression.py [MINOR] Fix Typos 'a -> an' 2016-05-26 22:39:14 -07:00
tests.py [SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace 2016-08-19 10:11:59 +01:00
tree.py [SPARK-14812][ML][MLLIB][PYTHON] Experimental, DeveloperApi annotation audit for ML 2016-07-13 12:33:39 -07:00
util.py [SPARK-16242][MLLIB][PYSPARK] Conversion between old/new matrix columns in a DataFrame (Python) 2016-06-28 06:28:22 -07:00