spark-instrumented-optimizer/python/pyspark/ml
Hyukjin Kwon a71dd6af2f [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs
### What changes were proposed in this pull request?

This PR proposes to use Python 3.9 in documentation and linter at GitHub Actions. This PR also contains the fixes for mypy check (introduced by Python 3.9 upgrade)

```
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:64: error: Name "np.ndarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:91: error: Name "np.recarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:165: error: Name "np.ndarray" is not defined
python/pyspark/pandas/categorical.py:82: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/categorical.py:109: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/ml/linalg/__init__.pyi:184: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/ml/linalg/__init__.pyi:217: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/pandas/typedef/typehints.py:163: error: Module has no attribute "bool"; maybe "bool_" or "bool8"?
python/pyspark/pandas/typedef/typehints.py:174: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float96"?
python/pyspark/pandas/typedef/typehints.py:180: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
python/pyspark/pandas/indexing.py:1649: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/indexing.py:1656: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/frame.py:4969: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4969: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:4970: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4970: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:7402: error: "List[Any]" has no attribute "tolist"
python/pyspark/pandas/series.py:1030: error: Module has no attribute "_NoValue"
python/pyspark/pandas/series.py:1031: error: Module has no attribute "_NoValue"
python/pyspark/pandas/indexes/category.py:159: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/indexes/category.py:180: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/namespace.py:2036: error: Argument 1 to "column_name" has incompatible type "float"; expected "str"
python/pyspark/pandas/mlflow.py:59: error: Incompatible types in assignment (expression has type "Type[floating[Any]]", variable has type "str")
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/data_type_ops/categorical_ops.py:56: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/tests/test_typedef.py:70: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:77: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:85: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py💯 error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:108: error: Name "np.float" is not defined
python/pyspark/mllib/clustering.pyi:152: error: Incompatible types in assignment (expression has type "ndarray[Any, Any]", base class "KMeansModel" defined the type as "List[ndarray[Any, Any]]")
python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"
Found 32 errors in 15 files (checked 315 source files)
1
```

### Why are the changes needed?

Python 3.6 is deprecated at SPARK-35938

### Does this PR introduce _any_ user-facing change?

No. Maybe static analysis, etc. by some type hints but they are really non-breaking..

### How was this patch tested?

I manually checked by GitHub Actions build in forked repository.

Closes #33356 from HyukjinKwon/SPARK-36146.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-15 08:01:54 -07:00
..
linalg [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
param [SPARK-35176][PYTHON] Standardize input validation error type 2021-05-03 15:34:24 +09:00
tests [SPARK-35678][ML][FOLLOWUP] Revert changes in ANN 2021-06-24 14:02:28 +09:00
__init__.py [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
_typing.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
base.py [SPARK-35176][PYTHON] Standardize input validation error type 2021-05-03 15:34:24 +09:00
base.pyi [SPARK-33251][PYTHON][DOCS] Migration to NumPy documentation style in ML (pyspark.ml.*) 2020-11-10 09:33:48 +09:00
classification.py [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase 2021-06-20 11:48:38 +09:00
classification.pyi [SPARK-33520][ML][PYSPARK] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator 2020-12-04 08:35:50 +08:00
clustering.py [SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py 2021-05-13 22:23:51 +09:00
clustering.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
common.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
common.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
evaluation.py [SPARK-35176][PYTHON] Standardize input validation error type 2021-05-03 15:34:24 +09:00
evaluation.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
feature.py [SPARK-35392][ML][PYTHON] Fix flaky tests in ml/clustering.py and ml/feature.py 2021-05-13 22:23:51 +09:00
feature.pyi [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector 2021-01-16 11:09:23 +08:00
fpm.py [MINOR][DOCS] Use ASCII characters when possible in PySpark documentation 2021-04-04 09:49:36 +03:00
fpm.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
functions.py [MINOR][DOCS][ML] Explicit return type of array_to_vector utility function 2021-04-27 09:08:26 -05:00
functions.pyi [SPARK-33556][ML] Add array_to_vector function for dataframe column 2020-12-01 09:52:19 +09:00
image.py [SPARK-33251][PYTHON][DOCS] Migration to NumPy documentation style in ML (pyspark.ml.*) 2020-11-10 09:33:48 +09:00
image.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
pipeline.py [SPARK-33592] Fix: Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading 2020-12-01 09:36:42 +08:00
pipeline.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
recommendation.py [SPARK-35150][ML] Accelerate fallback BLAS with dev.ludovic.netlib 2021-04-27 14:00:59 -05:00
recommendation.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
regression.py [MINOR][DOCS] Avoid some python docs where first sentence has "e.g." or similar 2021-05-12 10:38:59 +09:00
regression.pyi Spelling r common dev mlib external project streaming resource managers python 2020-11-27 10:22:45 -06:00
stat.py [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector 2021-01-16 11:09:23 +08:00
stat.pyi [SPARK-34080][ML][PYTHON] Add UnivariateFeatureSelector 2021-01-16 11:09:23 +08:00
tree.py [SPARK-34093][ML] param maxDepth should check upper bound 2021-01-18 11:36:10 -06:00
tree.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
tuning.py [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase 2021-06-20 11:48:38 +09:00
tuning.pyi [SPARK-34415][ML] Randomization in hyperparameter optimization 2021-02-27 08:34:39 -06:00
util.py [MINOR][DOCS] Avoid some python docs where first sentence has "e.g." or similar 2021-05-12 10:38:59 +09:00
util.pyi [SPARK-33520][ML][PYSPARK] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator 2020-12-04 08:35:50 +08:00
wrapper.py [SPARK-33251][PYTHON][DOCS] Migration to NumPy documentation style in ML (pyspark.ml.*) 2020-11-10 09:33:48 +09:00
wrapper.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00