spark-instrumented-optimizer/python/pyspark/ml
Nick Pentreath e8b79afa02 [SPARK-14891][ML] Add schema validation for ALS
This PR adds schema validation to `ml`'s ALS and ALSModel. Previously, no schema validation was performed because `transformSchema` was never called in `ALS.fit` or `ALSModel.transform`. As a result, if users passed in Long (or Float etc.) ids, they were silently cast to Int with no warning or error thrown.

With this PR, ALS now supports all numeric types for the `user`, `item`, and `rating` columns. The rating column is cast to `Float` and the user and item columns are cast to `Int` (as is the case currently); however, for user/item, the cast throws an error if the value is outside integer range. Behavior for the rating column is unchanged (as it is not an issue).
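
The range-checked cast described above can be illustrated with a minimal sketch in plain Python. This is not the code from the patch (the actual change lives in the Scala `ALS` implementation); the names `checked_int_cast` and `prepare_row` are hypothetical and exist only for this illustration. It assumes "integer range" means the 32-bit signed range of a JVM `Int`.

```python
# Illustrative sketch only: hypothetical helpers, not Spark's implementation.
# Assumption: "integer range" is the 32-bit signed range of a JVM Int.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def checked_int_cast(value, column="user"):
    """Cast a numeric id to int, raising if it falls outside Int range."""
    if value > INT_MAX or value < INT_MIN:
        raise ValueError(
            "Only values in Integer range are supported for column %r; "
            "value %r was out of Integer range." % (column, value)
        )
    # int() truncates toward zero, so a fractional id such as 3.7 becomes 3.
    return int(value)


def prepare_row(user, item, rating):
    """Apply the checked cast to the ids and a plain float cast to the rating."""
    # Note: a Python float is 64-bit, whereas the commit message describes a
    # cast to a 32-bit Float; the rating cast here is only an approximation.
    return (
        checked_int_cast(user, "user"),
        checked_int_cast(item, "item"),
        float(rating),
    )
```

With these helpers, `prepare_row(7, 12, 4)` yields a tuple of two ints and a float, while an id such as `3000000000` raises a `ValueError` instead of being silently wrapped into the Int range.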

## How was this patch tested?
New test cases in `ALSSuite`.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #12762 from MLnick/SPARK-14891-als-validate-schema.
2016-05-18 21:13:12 +02:00

| Name | Last commit | Date |
| --- | --- | --- |
| linalg | [SPARK-14906][ML] Copy linalg in PySpark to new ML package | 2016-05-17 00:08:02 -07:00 |
| param | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| __init__.py | [SPARK-15106][PYSPARK][ML] Add PySpark package doc for ML component & remove "BETA" | 2016-05-05 10:52:25 +01:00 |
| base.py | [SPARK-13038][PYSPARK] Add load/save to pipeline | 2016-03-16 13:49:40 -07:00 |
| classification.py | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| clustering.py | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| evaluation.py | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| feature.py | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| pipeline.py | [SPARK-14971][ML][PYSPARK] PySpark ML Params setter code clean up | 2016-05-03 16:46:13 +02:00 |
| recommendation.py | [SPARK-14891][ML] Add schema validation for ALS | 2016-05-18 21:13:12 +02:00 |
| regression.py | [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based algorithms | 2016-05-17 12:51:07 -07:00 |
| tests.py | [SPARK-14978][PYSPARK] PySpark TrainValidationSplitModel should support validationMetrics | 2016-05-18 08:29:47 +02:00 |
| tuning.py | [SPARK-14978][PYSPARK] PySpark TrainValidationSplitModel should support validationMetrics | 2016-05-18 08:29:47 +02:00 |
| util.py | [SPARK-14903][SPARK-14071][ML][PYTHON] Revert : MLWritable.write property | 2016-04-26 12:00:57 -07:00 |
| wrapper.py | [SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update | 2016-05-01 12:29:01 -07:00 |