spark-instrumented-optimizer/python
Holden Karau 3b29004d24 [SPARK-7675][ML][PYSPARK] sparkml params type conversion
From JIRA:
Currently, PySpark wrappers for spark.ml Scala classes are brittle when accepting Param types. E.g., Normalizer's "p" param cannot be set to "2" (an integer); it must be set to "2.0" (a float). Fixing this is not trivial since there does not appear to be a natural place to insert the conversion before Python wrappers call Java's Params setter method.

A possible fix would be to add a method "_checkType" to PySpark's Param class that checks the type, prints an error if needed, and converts types when relevant (e.g., int to float, or scipy matrix to array). The Java wrapper method that copies params to Scala could call this method when it is available.

This fix instead checks the types at set time, since I think failing sooner is better, but I can switch it to check at copy time if that is preferred. So far, this only converts int to float; other conversions (such as scipy matrix to array) are left for future work.
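The set-time check described above can be sketched in plain Python. This is a minimal, hypothetical illustration: the `Param`, `Params`, `check_type`, `expected_type`, and `get` names are assumptions made for the example and are not PySpark's actual API. Only the int-to-float conversion mentioned in the commit message is implemented; every other mismatch raises immediately.

```python
# Hypothetical sketch: these classes are illustrative, not PySpark's actual Param API.


class Param:
    """A named parameter with an optional expected type (hypothetical)."""

    def __init__(self, name, doc, expected_type=None):
        self.name = name
        self.doc = doc
        self.expected_type = expected_type

    def check_type(self, value):
        """Return value converted to expected_type, or raise TypeError.

        Only int -> float is converted, matching the scope described in the commit.
        """
        if self.expected_type is None or value is None:
            return value
        if isinstance(value, self.expected_type):
            return value
        # bool is a subclass of int; reject it so True is not silently turned into 1.0
        if (self.expected_type is float and isinstance(value, int)
                and not isinstance(value, bool)):
            return float(value)
        raise TypeError(
            "Param %r expects %s but got %s: %r"
            % (self.name, self.expected_type.__name__, type(value).__name__, value)
        )


class Params:
    """Minimal param holder that validates at set time (hypothetical)."""

    def __init__(self):
        self._param_map = {}

    def _set(self, **kwargs):
        for name, value in kwargs.items():
            param = getattr(self, name)
            # Fail fast: conversion or error happens here, not when params are copied to the JVM
            self._param_map[param] = param.check_type(value)
        return self

    def get(self, name):
        return self._param_map[getattr(self, name)]


class Normalizer(Params):
    # Class-level Param, analogous to the "p" param of spark.ml's Normalizer
    p = Param("p", "the p norm value", expected_type=float)

    def setP(self, value):
        return self._set(p=value)


if __name__ == "__main__":
    n = Normalizer().setP(2)  # an int is accepted and stored as a float
    print(n.get("p"), type(n.get("p")).__name__)
    try:
        Normalizer().setP("two")  # a str is not converted, so setP fails right away
    except TypeError as err:
        print(err)
```

Because validation happens inside `_set`, a bad value fails on the `setP` call itself rather than later, when the params are copied to the Scala side. This is the "failing sooner" trade-off the commit message argues for; a copy-time check would instead defer the same `check_type` call to the wrapper's transfer step.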

Author: Holden Karau <holden@us.ibm.com>

Closes #9581 from holdenk/SPARK-7675-PySpark-sparkml-Params-type-conversion.
2016-01-06 10:43:03 -08:00
docs [SPARK-10447][SPARK-3842][PYSPARK] upgrade pyspark to py4j0.9 2015-10-20 10:52:49 -07:00
lib [SPARK-10447][SPARK-3842][PYSPARK] upgrade pyspark to py4j0.9 2015-10-20 10:52:49 -07:00
pyspark [SPARK-7675][ML][PYSPARK] sparkml params type conversion 2016-01-06 10:43:03 -08:00
test_support [SPARK-11292] [SQL] Python API for text data source 2015-10-28 14:28:38 -07:00
.gitignore [SPARK-3946] gitignore in /python includes wrong directory 2014-10-14 14:09:39 -07:00
run-tests [SPARK-8583] [SPARK-5482] [BUILD] Refactor python/run-tests to integrate with dev/run-tests module system 2015-06-27 20:24:34 -07:00
run-tests.py [SPARK-12361][PYSPARK][TESTS] Should set PYSPARK_DRIVER_PYTHON before Python tests 2015-12-16 11:29:51 -08:00