[SPARK-34080][ML][PYTHON][FOLLOW-UP] Update score function in UnivariateFeatureSelector document
### What changes were proposed in this pull request?

This follows up #31160 to update the score function names in the documentation.

### Why are the changes needed?

The documentation currently uses `f_classif`, `chi2`, and `f_regression`, which are sklearn's names. They are worth keeping, but it is better to give the formal score function name alongside each sklearn name.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Not tested; this is a doc-only change.

Closes #31531 from viirya/SPARK-34080-minor.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
parent c8628c943c
commit 1fbd576410
@@ -1802,9 +1802,9 @@ User can set `featureType` and `labelType`, and Spark will pick the score functi
 ~~~
 featureType | labelType |score function
 ------------|------------|--------------
-categorical |categorical | chi2
-continuous |categorical | f_classif
-continuous |continuous | f_regression
+categorical |categorical | chi-squared (chi2)
+continuous |categorical | ANOVATest (f_classif)
+continuous |continuous | F-value (f_regression)
 ~~~
 
 It supports five selection modes: `numTopFeatures`, `percentile`, `fpr`, `fdr`, `fwe`:
@@ -44,7 +44,7 @@ import org.apache.spark.sql.types.StructType
  * By default, the selection method is `numTopFeatures`, with the default number of top features
  * set to 50.
  */
-@deprecated("use UnivariateFeatureSelector instead", "3.1.0")
+@deprecated("use UnivariateFeatureSelector instead", "3.1.1")
 @Since("1.6.0")
 final class ChiSqSelector @Since("1.6.0") (@Since("1.6.0") override val uid: String)
 extends Selector[ChiSqSelectorModel] {
@@ -100,9 +100,12 @@ private[feature] trait UnivariateFeatureSelectorParams extends Params
  * The user can set `featureType` and labelType`, and Spark will pick the score function based on
  * the specified `featureType` and labelType`.
  * The following combination of `featureType` and `labelType` are supported:
- * - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi2.
- * - `featureType` `continuous` and `labelType` `categorical`: Spark uses f_classif.
- * - `featureType` `continuous` and `labelType` `continuous`: Spark uses f_regression.
+ * - `featureType` `categorical` and `labelType` `categorical`: Spark uses chi-squared,
+ * i.e. chi2 in sklearn.
+ * - `featureType` `continuous` and `labelType` `categorical`: Spark uses ANOVATest,
+ * i.e. f_classif in sklearn.
+ * - `featureType` `continuous` and `labelType` `continuous`: Spark uses F-value,
+ * i.e. f_regression in sklearn.
  *
  * The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
  * `percentile`, `fpr`, `fdr`, `fwe`.
@@ -5821,9 +5821,12 @@ class UnivariateFeatureSelector(JavaEstimator, _UnivariateFeatureSelectorParams,
 
 The following combination of `featureType` and `labelType` are supported:
 
-- `featureType` `categorical` and `labelType` `categorical`, Spark uses chi2.
-- `featureType` `continuous` and `labelType` `categorical`, Spark uses f_classif.
-- `featureType` `continuous` and `labelType` `continuous`, Spark uses f_regression.
+- `featureType` `categorical` and `labelType` `categorical`, Spark uses chi-squared,
+i.e. chi2 in sklearn.
+- `featureType` `continuous` and `labelType` `categorical`, Spark uses ANOVATest,
+i.e. f_classif in sklearn.
+- `featureType` `continuous` and `labelType` `continuous`, Spark uses F-value,
+i.e. f_regression in sklearn.
 
 The `UnivariateFeatureSelector` supports different selection modes: `numTopFeatures`,
 `percentile`, `fpr`, `fdr`, `fwe`.
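
The naming convention this change documents can be summarized as a small lookup from (`featureType`, `labelType`) to the formal score function name and its sklearn counterpart. The sketch below is illustrative only and is not Spark code: `SCORE_FUNCTIONS` and `describe_score_function` are hypothetical names, and rejecting any combination not listed in the updated docs is an assumption of this sketch, not a statement about Spark's behavior.

```python
# Illustrative lookup only; the names below are hypothetical and not part of Spark.
# Keys are (featureType, labelType); values are (formal name, sklearn name),
# copied from the updated documentation in this commit.
SCORE_FUNCTIONS = {
    ("categorical", "categorical"): ("chi-squared", "chi2"),
    ("continuous", "categorical"): ("ANOVATest", "f_classif"),
    ("continuous", "continuous"): ("F-value", "f_regression"),
}


def describe_score_function(feature_type: str, label_type: str) -> str:
    """Return 'formal name (sklearn name)' for a documented combination."""
    try:
        formal, sklearn_name = SCORE_FUNCTIONS[(feature_type, label_type)]
    except KeyError:
        # Assumption of this sketch: anything not listed in the docs is rejected.
        raise ValueError(
            f"unsupported combination: featureType={feature_type!r}, "
            f"labelType={label_type!r}"
        ) from None
    return f"{formal} ({sklearn_name})"
```

Calling `describe_score_function("continuous", "categorical")` yields the same string as the corresponding row of the updated table, which keeps the formal name and the sklearn name visibly paired.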