spark-instrumented-optimizer

History

Jatin Puri 1fd54f4bf5 [SPARK-32662][ML] CountVectorizerModel: Remove requirement for minimum Vocab size	2020-08-21 16:14:29 -07:00

### What changes were proposed in this pull request?

The strict requirement that the vocabulary be non-empty has been removed in this pull request.

Link to the discussion: http://apache-spark-user-list.1001560.n3.nabble.com/Ability-to-have-CountVectorizerModel-vocab-as-empty-td38396.html

### Why are the changes needed?

This smooths over corner cases. Without it, the user has to manipulate the data even in genuine cases where an empty vocabulary is a perfectly valid use case.

Question: Should we log a message when an empty vocabulary is found instead?

### Does this PR introduce _any_ user-facing change?

Possibly a slight one. If someone has added a try-catch to detect an empty vocabulary, that behavior will no longer hold.

### How was this patch tested?

1. Added a test case to `fit` generating an empty vocabulary
2. Added a test case to `transform` with an empty vocabulary

Request to review: srowen hhbyyh

Closes #29482 from purijatin/spark_32662.

Authored-by: Jatin Puri <purijatin@gmail.com>
Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>
..
benchmarks	[SPARK-29297][TESTS] Compare `core`/`mllib` module benchmarks in JDK8/11	2019-09-29 21:43:58 -07:00
src	[SPARK-32662][ML] CountVectorizerModel: Remove requirement for minimum Vocab size	2020-08-21 16:14:29 -07:00
pom.xml	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00