spark-instrumented-optimizer/mllib
zhengruifeng 165590bc29 [SPARK-31301][ML] Flatten the result dataframe of tests in testChiSquare
### What changes were proposed in this pull request?
1, remove the newly added method `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]`, because: 1) it is only used in `ChiSqSelector`; 2) the DataFrame returned by `def test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame` contains only one row, so collecting it back to the driver gives equivalent results;
2, add the method `def test(dataset: DataFrame, featuresCol: String, labelCol: String, flatten: Boolean): DataFrame` to return the flattened results;
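
To make the difference between the two result shapes concrete, here is a minimal plain-Scala sketch that does not use Spark. `WideResult`, `FlatResult`, and `FlattenSketch` are hypothetical stand-ins introduced only for illustration; their field names are assumptions about the two schemas, not classes from this patch, and the numbers are made up.

```scala
// Hypothetical stand-ins for the two result shapes; these are not Spark classes.
// WideResult models the single-row result: one entry per feature in each field.
final case class WideResult(
    pValues: Vector[Double],
    degreesOfFreedom: Vector[Int],
    statistics: Vector[Double])

// FlatResult models one row of the flattened result: one record per feature.
final case class FlatResult(
    featureIndex: Int,
    pValue: Double,
    degreesOfFreedom: Int,
    statistic: Double)

object FlattenSketch {
  // Turn the single wide record into one record per feature index.
  def flatten(wide: WideResult): Vector[FlatResult] =
    wide.pValues.indices.toVector.map { i =>
      FlatResult(i, wide.pValues(i), wide.degreesOfFreedom(i), wide.statistics(i))
    }

  def main(args: Array[String]): Unit = {
    // Illustrative numbers only; they are not the output of a real test.
    val wide = WideResult(Vector(0.68, 0.04), Vector(2, 3), Vector(0.75, 8.3))
    // Once flattened, a per-feature filter is an ordinary predicate.
    flatten(wide).filter(_.pValue < 0.1).foreach(println)
  }
}
```

With the flattened shape, selecting features by p-value becomes a plain row filter instead of index arithmetic over parallel arrays.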

### Why are the changes needed?
1, when we get the returned result DataFrame, we may want to filter it with a condition such as `pValue<0.1`, but the currently returned DataFrame is hard to use that way;
2, the current implementation needs to collect the test results of all columns back to the driver, which is a bottleneck; if we return the flattened DataFrame, we no longer need to collect them to the driver;
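
As a hedged sketch of the filtering use case above: the example below calls the four-argument `ChiSquareTest.test` overload with `flatten = true` and filters on `pValue`. The object name, the helper names `toyData` and `significantFeatures`, and the toy data are assumptions made for illustration; the flattened output is assumed to expose a `pValue` column alongside one row per feature.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FlattenedChiSquareExample {
  // Toy (label, features) rows; the values are illustrative only.
  def toyData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (0.0, Vectors.dense(3.5, 30.0)),
      (0.0, Vectors.dense(3.5, 40.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    ).toDF("label", "features")
  }

  // Hypothetical helper: keep only the features whose p-value is below the threshold.
  // Nothing is computed here; the result stays a lazy DataFrame.
  def significantFeatures(df: DataFrame, maxPValue: Double): DataFrame =
    ChiSquareTest
      .test(df, "features", "label", true) // flatten = true: one row per feature
      .filter(col("pValue") < maxPValue)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("FlattenedChiSquareExample")
      .getOrCreate()

    val significant = significantFeatures(toyData(spark), 0.1)
    // show() is the action that actually triggers the test computation.
    significant.show()

    spark.stop()
  }
}
```

Because the filter is expressed on the DataFrame, the per-feature results do not have to be collected to the driver before they are narrowed down.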

### Does this PR introduce any user-facing change?
Yes:
1, `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]` is removed;
2, the returned DataFrame now needs an action to trigger computation;

### How was this patch tested?
Updated the existing test suites.

Closes #28176 from zhengruifeng/flatten_chisq.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2020-04-14 14:44:54 +08:00
benchmarks [SPARK-29297][TESTS] Compare core/mllib module benchmarks in JDK8/11 2019-09-29 21:43:58 -07:00
src [SPARK-31301][ML] Flatten the result dataframe of tests in testChiSquare 2020-04-14 14:44:54 +08:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00