165590bc29
### What changes were proposed in this pull request? 1, remove newly added method: `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]`, because: 1) it is only used in `ChiSqSelector`; 2, since the returned dataframe of `def test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame` only contains one row, after collect it back to driver, the results are similar; 2, add method `def test(dataset: DataFrame, featuresCol: String, labelCol: String, flatten: Boolean): DataFrame` to return the flatten results; ### Why are the changes needed? 1, when get returned result dataframe, we may want to filter it like `pValue<0.1`, but current returned dataframe is hard to use; 2, current impl need to collect all test results of all columns back to the driver, this is a bottleneck, if we return the flatten datafame, we no longer to collect them to driver; ### Does this PR introduce any user-facing change? Yes: 1, `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]` removed; 2, the returned dataframe need an action to trigger computation; ### How was this patch tested? updated testsuites Closes #28176 from zhengruifeng/flatten_chisq. Authored-by: zhengruifeng <ruifengz@foxmail.com> Signed-off-by: zhengruifeng <ruifengz@foxmail.com> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
pom.xml |