0a4080ec3b
### What changes were proposed in this pull request?

1. Distributedly gather the `contingency` matrix of each feature.
2. Distributedly compute the results, then collect them back to the driver.

### Why are the changes needed?

The existing implementation is not efficient:

1. It directly collects the `contingency` matrices of a subset of features to the driver and computes the corresponding results there in one pass.
2. The `contingency` matrix of a feature has size numDistinctValues × numDistinctLabels. Since each matrix can be large, only 1000 matrices can be collected at a time.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing test suites.

Closes #27461 from zhengruifeng/chisq_opt.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
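The distributed scheme described above can be illustrated with a minimal single-process sketch in plain Python. This is not the Spark code from this PR: the function name `chi_sq_per_feature` and the list-of-partitions input are hypothetical. The sketch assumes dense categorical features. Merging per-partition `Counter`s stands in for a distributed shuffle, and only Pearson's statistic and the degrees of freedom are computed, with no p-value.

```python
from collections import Counter, defaultdict


def chi_sq_per_feature(partitions):
    """Hypothetical helper: Pearson's chi-squared statistic and degrees of
    freedom for every feature.

    `partitions` is a list of partitions; each partition is a list of
    (label, features) pairs whose feature values are categorical.
    """
    # Map side: each partition counts its own (feature index, value, label) triples.
    partials = [
        Counter((j, v, label) for label, feats in part for j, v in enumerate(feats))
        for part in partitions
    ]
    # Reduce side: merge the partial counts (a stand-in for a shuffle such as reduceByKey).
    counts = sum(partials, Counter())

    # Group the merged counts by feature index: one sparse contingency table per feature.
    tables = defaultdict(dict)
    for (j, v, label), c in counts.items():
        tables[j][(v, label)] = c

    # Each feature is independent, so this loop could itself run in parallel,
    # with only the small (statistic, dof) results sent back to the driver.
    results = {}
    for j, table in tables.items():
        row_tot = Counter()  # totals per distinct feature value
        col_tot = Counter()  # totals per distinct label
        for (v, label), c in table.items():
            row_tot[v] += c
            col_tot[label] += c
        n = sum(row_tot.values())
        stat = 0.0
        for v, r in row_tot.items():
            for label, col in col_tot.items():
                expected = r * col / n
                observed = table.get((v, label), 0)  # missing cells count as zero
                stat += (observed - expected) ** 2 / expected
        dof = (len(row_tot) - 1) * (len(col_tot) - 1)
        results[j] = (stat, dof)
    return results


parts = [[(0.0, [0, 1]), (0.0, [0, 1])], [(1.0, [1, 1]), (1.0, [1, 0])]]
print(chi_sq_per_feature(parts))
```

The driver never holds a full numDistinctValues × numDistinctLabels matrix for many features at once. It only receives one small result per feature, which is the motivation stated for the change.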