spark-instrumented-optimizer/python/pyspark
Hyukjin Kwon 6d309914df [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase
### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/32429 and https://github.com/apache/spark/pull/32644.
I was thinking about creating separate PRs but decided to include all in this PR because it shares the same context, and should be easier to review together.

This PR includes:
- Use `InheritableThread` and `inheritable_thread_target` in the current code base to prevent potential resource leak (since we enabled pinned thread mode by default now at https://github.com/apache/spark/pull/32429)
- Copy local properties when `start` at `InheritableThread` is called to mimic JVM behaviour. Previously it was copied when `InheritableThread` instance was created (related to #32644).
- https://github.com/apache/spark/pull/32429 missed one place at `inheritable_thread_target` (https://github.com/apache/spark/blob/master/python/pyspark/util.py#L308). More specifically, I missed one place that should enable pinned thread mode by default.

### Why are the changes needed?

To mimic the JVM behaviour about thread lifecycle

### Does this PR introduce _any_ user-facing change?

Ideally no. One possible case is that users use `InheritableThread` with pinned thread mode enabled.
In this case, the local properties will be copied when starting the thread instead of defining the `InheritableThread` object.
This is a small difference that wouldn't likely affect end users.

### How was this patch tested?

Existing tests should cover this.

Closes #32962 from HyukjinKwon/SPARK-35498-SPARK-35303.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2021-06-20 11:48:38 +09:00
..
cloudpickle [SPARK-33983][PYTHON] Update cloudpickle to v1.6.0 2021-01-04 10:36:31 -08:00
ml [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase 2021-06-20 11:48:38 +09:00
mllib [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
pandas [SPARK-35472][PYTHON] Fix disallow_untyped_defs mypy checks for pyspark.pandas.generic 2021-06-20 11:48:01 +09:00
resource [SPARK-32320][PYSPARK] Remove mutable default arguments 2020-12-08 09:35:36 +08:00
sql [SPARK-35643][PYTHON] Fix ambiguous reference in functions.py column() 2021-06-05 12:40:39 +09:00
streaming [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
testing [SPARK-35599][PYTHON] Adjust check_exact parameter for older pd.testing 2021-06-07 11:12:49 +09:00
tests [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
__init__.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
__init__.pyi [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
_typing.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
accumulators.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
accumulators.pyi [SPARK-33002][PYTHON] Remove non-API annotations 2020-10-07 19:53:59 +09:00
broadcast.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
broadcast.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
conf.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
conf.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
context.py [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase 2021-06-20 11:48:38 +09:00
context.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
files.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
find_spark_home.py [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option 2021-01-05 17:21:32 +09:00
install.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
java_gateway.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark.*, pyspark.resource.*, etc.) 2020-11-16 10:21:50 +09:00
profiler.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
py.typed [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
rdd.py [SPARK-35512][PYTHON] Fix OverflowError(cannot convert float infinity to integer) in partitionBy function 2021-06-09 10:57:27 +09:00
rdd.pyi [SPARK-33457][PYTHON] Adjust mypy configuration 2020-11-25 09:27:04 +09:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
resultiterable.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
serializers.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
shell.py [SPARK-33363] Add prompt information related to the current task when pyspark/sparkR starts 2020-11-10 11:12:19 +09:00
shuffle.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
statcounter.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
statcounter.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
status.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
storagelevel.py [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py 2020-09-15 08:41:22 -05:00
storagelevel.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
taskcontext.py [SPARK-32194][PYTHON] Use proper exception classes instead of plain Exception 2021-05-26 11:54:40 +09:00
taskcontext.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-35303][SPARK-35498][PYTHON][FOLLOW-UP] Copy local properties when starting the thread, and use inheritable thread in the current codebase 2021-06-20 11:48:38 +09:00
util.pyi [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00
version.py [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT 2020-12-04 14:10:42 -08:00
version.pyi [SPARK-32714][PYTHON] Initial pyspark-stubs port 2020-09-24 14:15:36 +09:00
worker.py [SPARK-35303][PYTHON] Enable pinned thread mode by default 2021-06-18 12:02:29 +09:00