spark-instrumented-optimizer/python/pyspark
zero323 8556710409 [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase
### What changes were proposed in this pull request?

Adds

```python
class _IsotonicRegressionBase(HasFeaturesCol, HasLabelCol, HasPredictionCol, HasWeightCol): ...
```

with the related `Params`, and uses it to replace `JavaPredictor` and `HasWeightCol` in the base classes of `IsotonicRegression`, and `JavaPredictionModel` in the base classes of `IsotonicRegressionModel`.

### Why are the changes needed?

Previous work (#25776) on [SPARK-28985](https://issues.apache.org/jira/browse/SPARK-28985) replaced `JavaEstimator`, `HasFeaturesCol`, `HasLabelCol` and `HasPredictionCol` in `IsotonicRegression` with the newly added `JavaPredictor`:

e97b55d322/python/pyspark/ml/wrapper.py (L377)

and `JavaModel` in `IsotonicRegressionModel` with the newly added `JavaPredictionModel`:

e97b55d322/python/pyspark/ml/wrapper.py (L405)

This, however, is inconsistent with the Scala counterpart, where both classes extend the private `IsotonicRegressionBase`:

3cb1b57809/mllib/src/main/scala/org/apache/spark/ml/regression/IsotonicRegression.scala (L42-L43)

This preserves some of the existing inconsistencies, for example (with `model` defined as in [the official example](https://github.com/apache/spark/blob/master/examples/src/main/python/ml/isotonic_regression_example.py)):

```python
from pyspark.ml.regression import IsotonicRegressionModel
from pyspark.ml.param.shared import HasWeightCol

issubclass(IsotonicRegressionModel, HasWeightCol)
# False

# `model` is the IsotonicRegressionModel fitted in the official example
hasattr(model, "weightCol")
# True
```

and it introduces a bug by adding an unsupported `predict` method:

```python
import inspect

hasattr(model, "predict")
# True

inspect.getfullargspec(IsotonicRegressionModel.predict)
# FullArgSpec(args=['self', 'value'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={})

IsotonicRegressionModel.predict.__doc__
# 'Predict label for the given features.\n\n        .. versionadded:: 3.0.0'

# `dataset` is the training DataFrame from the official example
model.predict(dataset.first().features)
# Py4JError: An error occurred while calling o49.predict. Trace:
# py4j.Py4JException: Method predict([class org.apache.spark.ml.linalg.SparseVector]) does not exist
# ...
```
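The failure mode can be reproduced without Spark. The sketch below is minimal, and every class in it is a hypothetical stand-in: `_FakeJavaIsotonicModel` plays the JVM-side object, and `JavaPredictionModel` is a simplified wrapper base. The wrapper defines `predict` unconditionally and forwards the call to the backing object. Because that object has no such method, `hasattr` reports the method as present, but calling it fails. An `AttributeError` stands in for the `Py4JError` from the traceback above.

```python
class _FakeJavaIsotonicModel:
    """Hypothetical stand-in for the JVM model: it has no predict(Vector) method."""

    def transform(self, dataset):
        return dataset


class JavaPredictionModel:
    """Simplified stand-in for the wrapper base: predict() is always defined
    on the Python side and simply forwarded to the backing object."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Py4J would raise Py4JError here; plain getattr raises AttributeError.
        return getattr(self._java_obj, name)(*args)

    def predict(self, value):
        """Predict label for the given features."""
        return self._call_java("predict", value)


class IsotonicRegressionModel(JavaPredictionModel):
    """Inherits predict() even though the backing object cannot serve it."""


model = IsotonicRegressionModel(_FakeJavaIsotonicModel())
print(hasattr(model, "predict"))  # True: the method is inherited from the wrapper

try:
    model.predict([0.0, 1.0])
except AttributeError as exc:
    print("predict failed:", exc)
```

Deriving the Python model from a base that matches the Scala class avoids the problem: in that case no method the JVM object lacks is ever exposed.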

Furthermore, the existing implementation could cause more problems in the future if the `Predictor` / `PredictionModel` API changes.

### Does this PR introduce any user-facing change?

Yes. It:

- Removes invalid `IsotonicRegressionModel.predict` method.
- Adds `HasWeightCol` to `IsotonicRegressionModel`.

However, the faulty implementation hasn't been released yet, and the proposed additions have negligible potential to break existing code (none, compared to the changes already made in #25776).

### How was this patch tested?

- Existing unit tests.
- Manual testing.

CC huaxingao, zhengruifeng

Closes #26023 from zero323/SPARK-28985-FOLLOW-UP-isotonic-regression.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
2019-10-04 18:06:10 -05:00
ml [SPARK-28985][PYTHON][ML][FOLLOW-UP] Add _IsotonicRegressionBase 2019-10-04 18:06:10 -05:00
mllib [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
sql [SPARK-27463][PYTHON][FOLLOW-UP] Miscellaneous documentation and code cleanup of cogroup pandas UDF 2019-09-30 22:25:35 +09:00
streaming [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
testing [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
tests [SPARK-21045][PYTHON] Allow non-ascii string as an exception message from python execution in Python 2 2019-09-21 08:09:19 +09:00
__init__.py [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
broadcast.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
cloudpickle.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
conf.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
context.py [MINOR][DOCS] Fix few typos in the java docs 2019-09-12 09:30:03 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
find_spark_home.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
heapq3.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
java_gateway.py [SPARK-27870][SQL][PYTHON] Add a runtime buffer size configuration for Pandas UDFs 2019-06-15 20:56:22 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-27659][PYTHON] Allow PySpark to prefetch during toLocalIterator 2019-09-20 09:59:31 -07:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resourceinformation.py [SPARK-28234][CORE][PYTHON] Add python and JavaSparkContext support to get resources 2019-07-11 09:32:58 +09:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
shell.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
shuffle.py [SPARK-25696] The storage memory displayed on spark Application UI is… 2018-12-10 18:27:01 -06:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 2018-11-07 22:48:50 -06:00
taskcontext.py [SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations 2019-09-01 10:15:00 -05:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
version.py [SPARK-25592] Setting version to 3.0.0-SNAPSHOT 2018-10-02 08:48:24 -07:00
worker.py [SPARK-21045][PYTHON] Allow non-ascii string as an exception message from python execution in Python 2 2019-09-21 08:09:19 +09:00