1fd54f4bf5
### What changes were proposed in this pull request?

This pull request removes the strict requirement that the vocabulary be non-empty.

Link to the discussion: http://apache-spark-user-list.1001560.n3.nabble.com/Ability-to-have-CountVectorizerModel-vocab-as-empty-td38396.html

### Why are the changes needed?

This makes corner cases run smoothly. Without this change, users have to manipulate their data to avoid an empty vocabulary, even in genuine cases where an empty vocabulary is a perfectly valid outcome.

Question: should we log a message when an empty vocabulary is found instead?

### Does this PR introduce _any_ user-facing change?

Maybe a slight one: if someone wrapped the call in a try-catch to detect an empty vocabulary, that behavior no longer holds.

### How was this patch tested?

1. Added a test case for `fit` that generates an empty vocabulary.
2. Added a test case for `transform` with an empty vocabulary.

Request to review: srowen hhbyyh

Closes #29482 from purijatin/spark_32662.

Authored-by: Jatin Puri <purijatin@gmail.com>
Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>
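To make the behavior change concrete, here is a minimal, hypothetical Python model of the two paths the new tests exercise (`fit` producing an empty vocabulary, and `transform` using one), in the spirit of the `CountVectorizerModel` discussion linked above. It is not Spark's implementation: `fit_vocabulary` and `transform` are illustrative names, only a `minDF`-style document-frequency threshold and a `vocabSize`-style cap are modeled (other parameters such as `minTF` are ignored), and breaking frequency ties alphabetically is an assumption made for determinism.

```python
from collections import Counter


def fit_vocabulary(docs, min_df=1.0, vocab_size=1 << 18):
    """Hypothetical sketch: build a vocabulary from tokenized documents.

    min_df >= 1 is treated as an absolute document count; min_df < 1 as a
    fraction of the number of documents. The result may be an empty list.
    """
    n_docs = len(docs)
    threshold = min_df if min_df >= 1.0 else min_df * n_docs
    term_freq = Counter()  # total occurrences of each term across the corpus
    doc_freq = Counter()   # number of documents containing each term
    for doc in docs:
        term_freq.update(doc)
        doc_freq.update(set(doc))
    kept = [t for t in term_freq if doc_freq[t] >= threshold]
    # Most frequent terms first; ties broken alphabetically (an assumption).
    kept.sort(key=lambda t: (-term_freq[t], t))
    # No "vocabulary must be non-empty" check: [] is a valid result.
    return kept[:vocab_size]


def transform(vocabulary, doc):
    """Hypothetical sketch: map a tokenized document to a sparse vector.

    Returns (size, {index: count}); terms not in the vocabulary are dropped.
    With an empty vocabulary this is a zero-length vector, (0, {}).
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = Counter(index[t] for t in doc if t in index)
    return len(vocabulary), dict(sorted(counts.items()))
```

For example, with two documents that share no terms, `min_df=2.0` filters out every term, so `fit_vocabulary([["a", "b"], ["c"]], min_df=2.0)` returns `[]`, and `transform([], ["a", "b"])` returns `(0, {})` rather than raising.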