spark-instrumented-optimizer/mllib
zhengruifeng 0c38765b29 [SPARK-32974][ML] FeatureHasher transform optimization
### What changes were proposed in this pull request?
pre-compute the output indices of numerical columns, instead of computing them on each row.

### Why are the changes needed?
for a numerical column, its output index is a hash of its `col_name`, we can pre-compute it at first, instead of computing it on each row.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing testsuites

Closes #29850 from zhengruifeng/hash_opt.

Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
2020-09-27 09:35:05 +08:00
..
benchmarks [SPARK-29297][TESTS] Compare core/mllib module benchmarks in JDK8/11 2019-09-29 21:43:58 -07:00
src [SPARK-32974][ML] FeatureHasher transform optimization 2020-09-27 09:35:05 +08:00
pom.xml [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00