spark-instrumented-optimizer

zhengruifeng 165590bc29 [SPARK-31301][ML] Flatten the result dataframe of tests in testChiSquare

### What changes were proposed in this pull request?
1, remove the newly added method `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]`, because: 1) it is only used in `ChiSqSelector`; 2) the dataframe returned by `def test(dataset: DataFrame, featuresCol: String, labelCol: String): DataFrame` contains only one row, so after collecting it back to the driver the results are similar;
2, add method `def test(dataset: DataFrame, featuresCol: String, labelCol: String, flatten: Boolean): DataFrame` to return the flattened results;

### Why are the changes needed?
1, when we get the returned result dataframe, we may want to filter it, e.g. by `pValue<0.1`, but the currently returned dataframe is hard to use;
2, the current impl needs to collect all test results of all columns back to the driver, which is a bottleneck; if we return the flattened dataframe, we no longer need to collect them to the driver;

### Does this PR introduce any user-facing change?
Yes:
1, `def testChiSquare(dataset: Dataset[_], featuresCol: String, labelCol: String): Array[SelectionTestResult]` is removed;
2, the returned dataframe needs an action to trigger computation;

### How was this patch tested?
Updated test suites.

Closes #28176 from zhengruifeng/flatten_chisq.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>

2020-04-14 14:44:54 +08:00
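
A minimal sketch of how the `flatten` overload described in the commit message might be used. It assumes a Spark build that includes this commit (3.1.0-SNAPSHOT or later), that the method is exposed on `org.apache.spark.ml.stat.ChiSquareTest`, and a local `SparkSession`. The toy data and the object name `FlattenedChiSquareExample` are made up for illustration.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the Spark source tree.
object FlattenedChiSquareExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FlattenedChiSquareExample")
      .getOrCreate()
    import spark.implicits._

    // Made-up toy data: (label, features).
    val df = Seq(
      (0.0, Vectors.dense(0.5, 10.0)),
      (0.0, Vectors.dense(1.5, 20.0)),
      (1.0, Vectors.dense(1.5, 30.0)),
      (0.0, Vectors.dense(3.5, 30.0)),
      (0.0, Vectors.dense(3.5, 40.0)),
      (1.0, Vectors.dense(3.5, 40.0))
    ).toDF("label", "features")

    // flatten = true requests the flattened result dataframe
    // instead of the single-row result.
    val flat = ChiSquareTest.test(df, "features", "label", true)

    // The flattened result can be filtered directly, as the commit
    // message suggests. Nothing is computed until an action such as
    // show() runs.
    flat.filter("pValue < 0.1").show()

    spark.stop()
  }
}
```

Because the result stays a distributed dataframe, the filter runs on the executors and only the matching rows reach the driver when the action is triggered.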
..
benchmarks	[SPARK-29297][TESTS] Compare `core`/`mllib` module benchmarks in JDK8/11	2019-09-29 21:43:58 -07:00
src	[SPARK-31301][ML] Flatten the result dataframe of tests in testChiSquare	2020-04-14 14:44:54 +08:00
pom.xml	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00