64fb358a99
### What changes were proposed in this pull request? When input column lengths can not be inferred and handleInvalid = "keep", VectorAssembler will throw a runtime exception. However the error message with this exception is not consistent. I change the content of this error message to make it work properly. ### Why are the changes needed? This is a bug. Here is a simple example to reproduce it. ``` // create a df without vector size val df = Seq( (Vectors.dense(1.0), Vectors.dense(2.0)) ).toDF("n1", "n2") // only set vector size hint for n1 column val hintedDf = new VectorSizeHint() .setInputCol("n1") .setSize(1) .transform(df) // assemble n1, n2 val output = new VectorAssembler() .setInputCols(Array("n1", "n2")) .setOutputCol("features") .setHandleInvalid("keep") .transform(hintedDf) // because only n1 has vector size, the error message should tell us to set vector size for n2 too output.show() ``` Expected error message: ``` Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n2]. ``` Actual error message: ``` Can not infer column lengths with handleInvalid = "keep". Consider using VectorSizeHint to add metadata for columns: [n1, n2]. ``` This introduce difficulties when I try to resolve this exception, for I do not know which column required vectorSizeHint. This is especially troublesome when you have a large number of columns to deal with. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Add test in VectorAssemblerSuite. Closes #28487 from fan31415/SPARK-31671. Lead-authored-by: fan31415 <fan12356789@gmail.com> Co-authored-by: yijiefan <fanyije@gmail.com> Signed-off-by: Sean Owen <srowen@gmail.com> |
||
---|---|---|
.. | ||
benchmarks | ||
src | ||
pom.xml |