spark-instrumented-optimizer/python/pyspark
Hyukjin Kwon a71dd6af2f [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs
### What changes were proposed in this pull request?

This PR proposes to use Python 3.9 in documentation and linter at GitHub Actions. This PR also contains the fixes for mypy check (introduced by Python 3.9 upgrade)

```
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:64: error: Name "np.ndarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:91: error: Name "np.recarray" is not defined
python/pyspark/sql/pandas/_typing/protocols/frame.pyi:165: error: Name "np.ndarray" is not defined
python/pyspark/pandas/categorical.py:82: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/categorical.py:109: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/ml/linalg/__init__.pyi:184: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/ml/linalg/__init__.pyi:217: error: Return type "ndarray[Any, Any]" of "toArray" incompatible with return type "NoReturn" in supertype "Matrix"
python/pyspark/pandas/typedef/typehints.py:163: error: Module has no attribute "bool"; maybe "bool_" or "bool8"?
python/pyspark/pandas/typedef/typehints.py:174: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float96"?
python/pyspark/pandas/typedef/typehints.py:180: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/ml.py:81: error: Value of type variable "_DTypeScalar_co" of "dtype" cannot be "object"
python/pyspark/pandas/indexing.py:1649: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/indexing.py:1656: error: Module has no attribute "int"; maybe "uint", "rint", or "intp"?
python/pyspark/pandas/frame.py:4969: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4969: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:4970: error: Function "numpy.array" is not valid as a type
python/pyspark/pandas/frame.py:4970: note: Perhaps you need "Callable[...]" or a callback protocol?
python/pyspark/pandas/frame.py:7402: error: "List[Any]" has no attribute "tolist"
python/pyspark/pandas/series.py:1030: error: Module has no attribute "_NoValue"
python/pyspark/pandas/series.py:1031: error: Module has no attribute "_NoValue"
python/pyspark/pandas/indexes/category.py:159: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/indexes/category.py:180: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/namespace.py:2036: error: Argument 1 to "column_name" has incompatible type "float"; expected "str"
python/pyspark/pandas/mlflow.py:59: error: Incompatible types in assignment (expression has type "Type[floating[Any]]", variable has type "str")
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/data_type_ops/categorical_ops.py:43: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "ordered"
python/pyspark/pandas/data_type_ops/categorical_ops.py:56: error: Item "dtype[Any]" of "Union[dtype[Any], Any]" has no attribute "categories"
python/pyspark/pandas/tests/test_typedef.py:70: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:77: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:85: error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py💯 error: Name "np.float" is not defined
python/pyspark/pandas/tests/test_typedef.py:108: error: Name "np.float" is not defined
python/pyspark/mllib/clustering.pyi:152: error: Incompatible types in assignment (expression has type "ndarray[Any, Any]", base class "KMeansModel" defined the type as "List[ndarray[Any, Any]]")
python/pyspark/mllib/classification.pyi:93: error: Signature of "predict" incompatible with supertype "LinearClassificationModel"
Found 32 errors in 15 files (checked 315 source files)
1
```

### Why are the changes needed?

Python 3.6 is deprecated at SPARK-35938

### Does this PR introduce _any_ user-facing change?

No. Maybe static analysis, etc. by some type hints but they are really non-breaking..

### How was this patch tested?

I manually checked by GitHub Actions build in forked repository.

Closes #33356 from HyukjinKwon/SPARK-36146.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2021-07-15 08:01:54 -07:00
..
cloudpickle [SPARK-33983][PYTHON] Update cloudpickle to v1.6.0 2021-01-04 10:36:31 -08:00
ml [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
mllib [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
pandas [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
resource [SPARK-32320][PYSPARK] Remove mutable default arguments 2020-12-08 09:35:36 +08:00
sql [SPARK-36146][PYTHON][INFRA][TESTS] Upgrade Python version from 3.6 to 3.9 in GitHub Actions' linter/docs 2021-07-15 08:01:54 -07:00
streaming [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
testing [SPARK-35599][PYTHON] Adjust check_exact parameter for older pd.testing 2021-06-07 11:12:49 +09:00
tests [SPARK-36062][PYTHON] Try to capture faulthanlder when a Python worker crashes 2021-07-09 11:30:39 +09:00
__init__.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
__init__.pyi [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
_typing.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
accumulators.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
accumulators.pyi [SPARK-33002][PYTHON] Remove non-API annotations 2020-10-07 19:53:59 +09:00
broadcast.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
broadcast.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
conf.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
conf.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
context.py [SPARK-35938][PYTHON] Add deprecation warning for Python 3.6 2021-07-01 09:32:25 +09:00
context.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
files.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
find_spark_home.py [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option 2021-01-05 17:21:32 +09:00
install.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
java_gateway.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
profiler.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
py.typed [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
rdd.py [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function 2021-06-09 10:57:27 +09:00
rdd.pyi [SPARK-35986][PYSPARK] Fix type hint for RDD.histogram's buckets 2021-07-04 10:22:57 +09:00
rddsampler.py
resultiterable.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
resultiterable.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
serializers.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
shell.py [SPARK-33363] Add prompt information related to the current task when pyspark/sparkR starts 2020-11-10 11:12:19 +09:00
shuffle.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
statcounter.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
statcounter.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
status.py
status.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
storagelevel.py [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py 2020-09-15 08:41:22 -05:00
storagelevel.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
taskcontext.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
taskcontext.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
traceback_utils.py
util.py [SPARK-35946][PYTHON] Respect Py4J server in InheritableThread API 2021-06-29 22:18:54 -07:00
util.pyi [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
version.py [SPARK-35996][BUILD] Setting version to 3.3.0-SNAPSHOT 2021-07-02 13:47:36 -07:00
version.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
worker.py [SPARK-36062][PYTHON] Try to capture faulthanlder when a Python worker crashes 2021-07-09 11:30:39 +09:00