spark-instrumented-optimizer/python/pyspark
HyukjinKwon e11a24c1ba [SPARK-33371][PYTHON] Update setup.py and tests for Python 3.9
### What changes were proposed in this pull request?

This PR proposes to make PySpark officially support Python 3.9. The main code already works; we only need to declare that Python 3.9 is supported.
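In Python packaging, support for an interpreter version is usually declared through trove classifiers in `setup.py`. The sketch below is a hypothetical excerpt, not the actual PySpark `setup.py`. It assumes Python 3.6 to 3.9 are listed, and the helper `declared_python_versions` is illustrative only. Classifiers are metadata for PyPI and tooling; they do not block installation on other versions (setuptools' `python_requires` does that).

```python
import sys

# Hypothetical excerpt: classifiers as they might look after adding Python 3.9.
CLASSIFIERS = [
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: Implementation :: CPython",
]


def declared_python_versions(classifiers):
    """Return the 'X.Y' versions named by 'Programming Language :: Python :: X.Y'."""
    prefix = "Programming Language :: Python :: "
    versions = []
    for classifier in classifiers:
        if not classifier.startswith(prefix):
            continue
        tail = classifier[len(prefix):]
        parts = tail.split(".")
        # Keep only plain "major.minor" entries; skip e.g. "Implementation :: CPython".
        if len(parts) == 2 and all(part.isdigit() for part in parts):
            versions.append(tail)
    return versions


declared = declared_python_versions(CLASSIFIERS)
print(declared)  # ['3.6', '3.7', '3.8', '3.9']

# Check whether the running interpreter is among the declared versions.
current = "{}.{}".format(*sys.version_info[:2])
print(current, "declared:", current in declared)
```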

This PR also makes some minor fixes to the test code:
- `Thread.isAlive` was removed in Python 3.9; use `Thread.is_alive` instead, which is available in Python 3.6+. See https://docs.python.org/3/whatsnew/3.9.html#removed
- Fixed `TaskContextTestsWithWorkerReuse.test_barrier_with_python_worker_reuse` and `TaskContextTests.test_barrier` to be less flaky. They became flakier in Python 3.9 for reasons that are not yet clear.
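A minimal sketch of the `isAlive` to `is_alive` migration, assuming a test helper that waits for a background thread to finish. The helper name `wait_until_finished` is hypothetical and not taken from the PySpark tests.

```python
import sys
import threading
import time


def wait_until_finished(thread: threading.Thread, timeout: float = 5.0) -> bool:
    """Return True if `thread` finished within `timeout` seconds.

    Uses Thread.is_alive(); the camelCase alias Thread.isAlive() was removed
    in Python 3.9, so calling it there raises AttributeError.
    """
    thread.join(timeout)
    return not thread.is_alive()


worker = threading.Thread(target=time.sleep, args=(0.1,))
worker.start()
print(wait_until_finished(worker))  # True

if sys.version_info >= (3, 9):
    # The old alias no longer exists on 3.9+.
    assert not hasattr(worker, "isAlive")
```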

NOTE that PyArrow does not support Python 3.9 yet.
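When an optional dependency such as PyArrow cannot be installed, tests that need it have to be skipped rather than fail, which is consistent with the "tests were skipped" lines for the Arrow and pandas suites in the log. The sketch below shows a generic `unittest.skipIf` gate; it is not PySpark's own test helper, and the class `ArrowDependentTests` is hypothetical.

```python
import importlib.util
import unittest

# True when the optional dependency is importable in this interpreter.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


@unittest.skipIf(not HAVE_PYARROW, "PyArrow is not installed")
class ArrowDependentTests(unittest.TestCase):  # hypothetical test class
    def test_table_row_count(self):
        import pyarrow as pa

        table = pa.table({"x": [1, 2, 3]})
        self.assertEqual(table.num_rows, 3)


# Run the suite programmatically; without PyArrow the test is reported as skipped.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArrowDependentTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped), "skipped")
```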

### Why are the changes needed?

To officially support Python 3.9.

### Does this PR introduce _any_ user-facing change?

Yes, it officially supports Python 3.9.

### How was this patch tested?

Manually ran the tests:

```
$  ./run-tests --python-executable=python
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['python']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
python python_implementation is CPython
python version is: Python 3.9.0
Starting test(python): pyspark.ml.tests.test_base
Starting test(python): pyspark.ml.tests.test_evaluation
Starting test(python): pyspark.ml.tests.test_algorithms
Starting test(python): pyspark.ml.tests.test_feature
Finished test(python): pyspark.ml.tests.test_base (12s)
Starting test(python): pyspark.ml.tests.test_image
Finished test(python): pyspark.ml.tests.test_evaluation (15s)
Starting test(python): pyspark.ml.tests.test_linalg
Finished test(python): pyspark.ml.tests.test_feature (25s)
Starting test(python): pyspark.ml.tests.test_param
Finished test(python): pyspark.ml.tests.test_image (17s)
Starting test(python): pyspark.ml.tests.test_persistence
Finished test(python): pyspark.ml.tests.test_param (17s)
Starting test(python): pyspark.ml.tests.test_pipeline
Finished test(python): pyspark.ml.tests.test_linalg (30s)
Starting test(python): pyspark.ml.tests.test_stat
Finished test(python): pyspark.ml.tests.test_pipeline (6s)
Starting test(python): pyspark.ml.tests.test_training_summary
Finished test(python): pyspark.ml.tests.test_stat (12s)
Starting test(python): pyspark.ml.tests.test_tuning
Finished test(python): pyspark.ml.tests.test_algorithms (68s)
Starting test(python): pyspark.ml.tests.test_wrapper
Finished test(python): pyspark.ml.tests.test_persistence (51s)
Starting test(python): pyspark.mllib.tests.test_algorithms
Finished test(python): pyspark.ml.tests.test_training_summary (33s)
Starting test(python): pyspark.mllib.tests.test_feature
Finished test(python): pyspark.ml.tests.test_wrapper (19s)
Starting test(python): pyspark.mllib.tests.test_linalg
Finished test(python): pyspark.mllib.tests.test_feature (26s)
Starting test(python): pyspark.mllib.tests.test_stat
Finished test(python): pyspark.mllib.tests.test_stat (22s)
Starting test(python): pyspark.mllib.tests.test_streaming_algorithms
Finished test(python): pyspark.mllib.tests.test_algorithms (53s)
Starting test(python): pyspark.mllib.tests.test_util
Finished test(python): pyspark.mllib.tests.test_linalg (54s)
Starting test(python): pyspark.sql.tests.test_arrow
Finished test(python): pyspark.sql.tests.test_arrow (0s) ... 61 tests were skipped
Starting test(python): pyspark.sql.tests.test_catalog
Finished test(python): pyspark.mllib.tests.test_util (11s)
Starting test(python): pyspark.sql.tests.test_column
Finished test(python): pyspark.sql.tests.test_catalog (16s)
Starting test(python): pyspark.sql.tests.test_conf
Finished test(python): pyspark.sql.tests.test_column (17s)
Starting test(python): pyspark.sql.tests.test_context
Finished test(python): pyspark.sql.tests.test_context (6s) ... 3 tests were skipped
Starting test(python): pyspark.sql.tests.test_dataframe
Finished test(python): pyspark.sql.tests.test_conf (11s)
Starting test(python): pyspark.sql.tests.test_datasources
Finished test(python): pyspark.sql.tests.test_datasources (19s)
Starting test(python): pyspark.sql.tests.test_functions
Finished test(python): pyspark.sql.tests.test_dataframe (35s) ... 3 tests were skipped
Starting test(python): pyspark.sql.tests.test_group
Finished test(python): pyspark.sql.tests.test_functions (32s)
Starting test(python): pyspark.sql.tests.test_pandas_cogrouped_map
Finished test(python): pyspark.sql.tests.test_pandas_cogrouped_map (1s) ... 15 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_grouped_map
Finished test(python): pyspark.sql.tests.test_group (19s)
Starting test(python): pyspark.sql.tests.test_pandas_map
Finished test(python): pyspark.sql.tests.test_pandas_grouped_map (0s) ... 21 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_udf
Finished test(python): pyspark.sql.tests.test_pandas_map (0s) ... 6 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_udf_grouped_agg
Finished test(python): pyspark.sql.tests.test_pandas_udf (0s) ... 6 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_udf_scalar
Finished test(python): pyspark.sql.tests.test_pandas_udf_grouped_agg (0s) ... 13 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_udf_typehints
Finished test(python): pyspark.sql.tests.test_pandas_udf_scalar (0s) ... 50 tests were skipped
Starting test(python): pyspark.sql.tests.test_pandas_udf_window
Finished test(python): pyspark.sql.tests.test_pandas_udf_typehints (0s) ... 10 tests were skipped
Starting test(python): pyspark.sql.tests.test_readwriter
Finished test(python): pyspark.sql.tests.test_pandas_udf_window (0s) ... 14 tests were skipped
Starting test(python): pyspark.sql.tests.test_serde
Finished test(python): pyspark.sql.tests.test_serde (19s)
Starting test(python): pyspark.sql.tests.test_session
Finished test(python): pyspark.mllib.tests.test_streaming_algorithms (120s)
Starting test(python): pyspark.sql.tests.test_streaming
Finished test(python): pyspark.sql.tests.test_readwriter (25s)
Starting test(python): pyspark.sql.tests.test_types
Finished test(python): pyspark.ml.tests.test_tuning (208s)
Starting test(python): pyspark.sql.tests.test_udf
Finished test(python): pyspark.sql.tests.test_session (31s)
Starting test(python): pyspark.sql.tests.test_utils
Finished test(python): pyspark.sql.tests.test_streaming (35s)
Starting test(python): pyspark.streaming.tests.test_context
Finished test(python): pyspark.sql.tests.test_types (34s)
Starting test(python): pyspark.streaming.tests.test_dstream
Finished test(python): pyspark.sql.tests.test_utils (14s)
Starting test(python): pyspark.streaming.tests.test_kinesis
Finished test(python): pyspark.streaming.tests.test_kinesis (0s) ... 2 tests were skipped
Starting test(python): pyspark.streaming.tests.test_listener
Finished test(python): pyspark.streaming.tests.test_listener (11s)
Starting test(python): pyspark.tests.test_appsubmit
Finished test(python): pyspark.sql.tests.test_udf (39s)
Starting test(python): pyspark.tests.test_broadcast
Finished test(python): pyspark.streaming.tests.test_context (23s)
Starting test(python): pyspark.tests.test_conf
Finished test(python): pyspark.tests.test_conf (15s)
Starting test(python): pyspark.tests.test_context
Finished test(python): pyspark.tests.test_broadcast (33s)
Starting test(python): pyspark.tests.test_daemon
Finished test(python): pyspark.tests.test_daemon (5s)
Starting test(python): pyspark.tests.test_install_spark
Finished test(python): pyspark.tests.test_context (44s)
Starting test(python): pyspark.tests.test_join
Finished test(python): pyspark.tests.test_appsubmit (68s)
Starting test(python): pyspark.tests.test_profiler
Finished test(python): pyspark.tests.test_join (7s)
Starting test(python): pyspark.tests.test_rdd
Finished test(python): pyspark.tests.test_profiler (9s)
Starting test(python): pyspark.tests.test_rddbarrier
Finished test(python): pyspark.tests.test_rddbarrier (7s)
Starting test(python): pyspark.tests.test_readwrite
Finished test(python): pyspark.streaming.tests.test_dstream (107s)
Starting test(python): pyspark.tests.test_serializers
Finished test(python): pyspark.tests.test_serializers (8s)
Starting test(python): pyspark.tests.test_shuffle
Finished test(python): pyspark.tests.test_readwrite (14s)
Starting test(python): pyspark.tests.test_taskcontext
Finished test(python): pyspark.tests.test_install_spark (65s)
Starting test(python): pyspark.tests.test_util
Finished test(python): pyspark.tests.test_shuffle (8s)
Starting test(python): pyspark.tests.test_worker
Finished test(python): pyspark.tests.test_util (5s)
Starting test(python): pyspark.accumulators
Finished test(python): pyspark.accumulators (5s)
Starting test(python): pyspark.broadcast
Finished test(python): pyspark.broadcast (6s)
Starting test(python): pyspark.conf
Finished test(python): pyspark.tests.test_worker (14s)
Starting test(python): pyspark.context
Finished test(python): pyspark.conf (4s)
Starting test(python): pyspark.ml.classification
Finished test(python): pyspark.tests.test_rdd (60s)
Starting test(python): pyspark.ml.clustering
Finished test(python): pyspark.context (21s)
Starting test(python): pyspark.ml.evaluation
Finished test(python): pyspark.tests.test_taskcontext (69s)
Starting test(python): pyspark.ml.feature
Finished test(python): pyspark.ml.evaluation (26s)
Starting test(python): pyspark.ml.fpm
Finished test(python): pyspark.ml.clustering (45s)
Starting test(python): pyspark.ml.functions
Finished test(python): pyspark.ml.fpm (24s)
Starting test(python): pyspark.ml.image
Finished test(python): pyspark.ml.functions (17s)
Starting test(python): pyspark.ml.linalg.__init__
Finished test(python): pyspark.ml.linalg.__init__ (0s)
Starting test(python): pyspark.ml.recommendation
Finished test(python): pyspark.ml.classification (74s)
Starting test(python): pyspark.ml.regression
Finished test(python): pyspark.ml.image (8s)
Starting test(python): pyspark.ml.stat
Finished test(python): pyspark.ml.stat (29s)
Starting test(python): pyspark.ml.tuning
Finished test(python): pyspark.ml.regression (53s)
Starting test(python): pyspark.mllib.classification
Finished test(python): pyspark.ml.tuning (35s)
Starting test(python): pyspark.mllib.clustering
Finished test(python): pyspark.ml.feature (103s)
Starting test(python): pyspark.mllib.evaluation
Finished test(python): pyspark.mllib.classification (33s)
Starting test(python): pyspark.mllib.feature
Finished test(python): pyspark.mllib.evaluation (21s)
Starting test(python): pyspark.mllib.fpm
Finished test(python): pyspark.ml.recommendation (103s)
Starting test(python): pyspark.mllib.linalg.__init__
Finished test(python): pyspark.mllib.linalg.__init__ (1s)
Starting test(python): pyspark.mllib.linalg.distributed
Finished test(python): pyspark.mllib.feature (26s)
Starting test(python): pyspark.mllib.random
Finished test(python): pyspark.mllib.fpm (23s)
Starting test(python): pyspark.mllib.recommendation
Finished test(python): pyspark.mllib.clustering (50s)
Starting test(python): pyspark.mllib.regression
Finished test(python): pyspark.mllib.random (13s)
Starting test(python): pyspark.mllib.stat.KernelDensity
Finished test(python): pyspark.mllib.stat.KernelDensity (1s)
Starting test(python): pyspark.mllib.stat._statistics
Finished test(python): pyspark.mllib.linalg.distributed (42s)
Starting test(python): pyspark.mllib.tree
Finished test(python): pyspark.mllib.stat._statistics (19s)
Starting test(python): pyspark.mllib.util
Finished test(python): pyspark.mllib.regression (33s)
Starting test(python): pyspark.profiler
Finished test(python): pyspark.mllib.recommendation (36s)
Starting test(python): pyspark.rdd
Finished test(python): pyspark.profiler (9s)
Starting test(python): pyspark.resource.tests.test_resources
Finished test(python): pyspark.mllib.tree (19s)
Starting test(python): pyspark.serializers
Finished test(python): pyspark.mllib.util (21s)
Starting test(python): pyspark.shuffle
Finished test(python): pyspark.resource.tests.test_resources (9s)
Starting test(python): pyspark.sql.avro.functions
Finished test(python): pyspark.shuffle (1s)
Starting test(python): pyspark.sql.catalog
Finished test(python): pyspark.rdd (22s)
Starting test(python): pyspark.sql.column
Finished test(python): pyspark.serializers (12s)
Starting test(python): pyspark.sql.conf
Finished test(python): pyspark.sql.conf (6s)
Starting test(python): pyspark.sql.context
Finished test(python): pyspark.sql.catalog (14s)
Starting test(python): pyspark.sql.dataframe
Finished test(python): pyspark.sql.avro.functions (15s)
Starting test(python): pyspark.sql.functions
Finished test(python): pyspark.sql.column (24s)
Starting test(python): pyspark.sql.group
Finished test(python): pyspark.sql.context (20s)
Starting test(python): pyspark.sql.pandas.conversion
Finished test(python): pyspark.sql.pandas.conversion (13s)
Starting test(python): pyspark.sql.pandas.group_ops
Finished test(python): pyspark.sql.group (36s)
Starting test(python): pyspark.sql.pandas.map_ops
Finished test(python): pyspark.sql.pandas.group_ops (21s)
Starting test(python): pyspark.sql.pandas.serializers
Finished test(python): pyspark.sql.pandas.serializers (0s)
Starting test(python): pyspark.sql.pandas.typehints
Finished test(python): pyspark.sql.pandas.typehints (0s)
Starting test(python): pyspark.sql.pandas.types
Finished test(python): pyspark.sql.pandas.types (0s)
Starting test(python): pyspark.sql.pandas.utils
Finished test(python): pyspark.sql.pandas.utils (0s)
Starting test(python): pyspark.sql.readwriter
Finished test(python): pyspark.sql.dataframe (56s)
Starting test(python): pyspark.sql.session
Finished test(python): pyspark.sql.functions (57s)
Starting test(python): pyspark.sql.streaming
Finished test(python): pyspark.sql.pandas.map_ops (12s)
Starting test(python): pyspark.sql.types
Finished test(python): pyspark.sql.types (10s)
Starting test(python): pyspark.sql.udf
Finished test(python): pyspark.sql.streaming (16s)
Starting test(python): pyspark.sql.window
Finished test(python): pyspark.sql.session (19s)
Starting test(python): pyspark.streaming.util
Finished test(python): pyspark.streaming.util (0s)
Starting test(python): pyspark.util
Finished test(python): pyspark.util (0s)
Finished test(python): pyspark.sql.readwriter (24s)
Finished test(python): pyspark.sql.udf (13s)
Finished test(python): pyspark.sql.window (14s)
Tests passed in 780 seconds

```

Closes #30277 from HyukjinKwon/SPARK-33371.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-11-06 15:05:37 -08:00

| Name | Last commit | Date |
|---|---|---|
| `cloudpickle` | [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0 | 2020-07-17 11:49:18 +09:00 |
| `ml` | [SPARK-33203][PYTHON][TEST] Fix tests failing with rounding errors | 2020-10-21 18:14:21 -07:00 |
| `mllib` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `resource` | [SPARK-33086][FOLLOW-UP] Remove unused Optional import from pyspark.resource.profile stub | 2020-10-12 10:29:28 +09:00 |
| `sql` | Revert "[SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends" | 2020-11-05 16:15:17 +09:00 |
| `streaming` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `testing` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `tests` | [SPARK-33371][PYTHON] Update setup.py and tests for Python 3.9 | 2020-11-06 15:05:37 -08:00 |
| `__init__.py` | [SPARK-33017][PYTHON] Add getCheckpointDir method to PySpark Context | 2020-10-05 11:48:28 +09:00 |
| `__init__.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `_globals.py` | [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary | 2018-02-09 14:21:10 +08:00 |
| `_typing.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `accumulators.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `accumulators.pyi` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `broadcast.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `broadcast.pyi` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `conf.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `conf.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `context.py` | [SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency | 2020-10-27 14:03:57 +09:00 |
| `context.pyi` | [SPARK-33017][PYTHON] Add getCheckpointDir method to PySpark Context | 2020-10-05 11:48:28 +09:00 |
| `daemon.py` | [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon | 2019-07-31 09:10:24 +09:00 |
| `files.py` | [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation | 2019-07-05 10:08:22 -07:00 |
| `files.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `find_spark_home.py` | [SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI | 2020-09-23 09:30:51 +09:00 |
| `install.py` | [SPARK-32714][FOLLOW-UP][PYTHON] Address pyspark.install typing errors | 2020-09-27 16:21:23 +09:00 |
| `java_gateway.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `join.py` | [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… | 2016-03-28 14:51:36 -07:00 |
| `profiler.py` | [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis | 2019-01-17 19:40:39 -06:00 |
| `profiler.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `py.typed` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `rdd.py` | [SPARK-32319][PYSPARK] Disallow the use of unused imports | 2020-08-08 08:51:57 -07:00 |
| `rdd.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `rddsampler.py` | [SPARK-4897] [PySpark] Python 3 support | 2015-04-16 16:20:57 -07:00 |
| `resultiterable.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `resultiterable.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `serializers.py` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `shell.py` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `shuffle.py` | [SPARK-32435][PYTHON] Remove heapq3 port from Python 3 | 2020-07-27 20:10:13 +09:00 |
| `statcounter.py` | [SPARK-6919] [PYSPARK] Add asDict method to StatCounter | 2015-09-29 13:38:15 -07:00 |
| `statcounter.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `status.py` | [SPARK-4172] [PySpark] Progress API in Python | 2015-02-17 13:36:43 -08:00 |
| `status.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `storagelevel.py` | [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py | 2020-09-15 08:41:22 -05:00 |
| `storagelevel.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `taskcontext.py` | [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 | 2020-07-14 11:22:44 +09:00 |
| `taskcontext.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `traceback_utils.py` | [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. | 2014-09-15 19:28:17 -07:00 |
| `util.py` | [SPARK-33002][PYTHON] Remove non-API annotations | 2020-10-07 19:53:59 +09:00 |
| `version.py` | [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT | 2020-02-25 19:44:31 -08:00 |
| `version.pyi` | [SPARK-32714][PYTHON] Initial pyspark-stubs port | 2020-09-24 14:15:36 +09:00 |
| `worker.py` | [MINOR][PYTHON] Fix spacing in error message | 2020-07-28 11:22:18 +09:00 |