spark-instrumented-optimizer

Alex Favaro 96c1a4401d [SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted	2020-02-20 12:21:24 +09:00

### What changes were proposed in this pull request?

As discussed on the Jira ticket, this change clears the SQLContext._instantiatedContext class attribute when the SparkSession is stopped. That way, the attribute will be reset with a new, usable SQLContext when a new SparkSession is started.

### Why are the changes needed?

When the underlying SQLContext is instantiated for a SparkSession, the instance is saved as a class attribute and returned from subsequent calls to SQLContext.getOrCreate(). If the SparkContext is stopped and a new one started, the SQLContext class attribute is never cleared so any code which calls SQLContext.getOrCreate() will get a SQLContext with a reference to the old, unusable SparkContext.

A similar issue was identified and fixed for SparkSession in [SPARK-19055](https://issues.apache.org/jira/browse/SPARK-19055), but the fix did not change SQLContext as well. I ran into this because mllib still [uses](https://github.com/apache/spark/blob/master/python/pyspark/mllib/common.py#L105) SQLContext.getOrCreate() under the hood.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

A new test was added. I verified that the test fails without the included change.

Closes #27610 from afavaro/restart-sqlcontext.

Authored-by: Alex Favaro <alex.favaro@affirm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
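
The stale-cache problem described in the commit message above can be illustrated with a small, self-contained sketch. It does not use PySpark and is not the actual patch: `ToySparkContext`, `ToySQLContext`, `ToySparkSession`, the `clear_cache_on_stop` flag, and `context_after_restart` are hypothetical stand-ins that only mimic the pattern of a class-level cached instance (analogous to `SQLContext._instantiatedContext`) being cleared, or not, when the session stops.

```python
class ToySparkContext:
    """Hypothetical stand-in for SparkContext (not the real PySpark class)."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class ToySQLContext:
    """Hypothetical stand-in that caches its first instance on the class."""

    # Class-level cache, analogous to SQLContext._instantiatedContext.
    _instantiated_context = None

    def __init__(self, sc):
        self.sc = sc
        if ToySQLContext._instantiated_context is None:
            ToySQLContext._instantiated_context = self

    @classmethod
    def get_or_create(cls, sc):
        # Return the cached instance if there is one, else build and cache one.
        if cls._instantiated_context is None:
            cls(sc)
        return cls._instantiated_context


class ToySparkSession:
    """Hypothetical session; the flag toggles the behaviour the commit describes."""

    def __init__(self, sc, clear_cache_on_stop=True):
        self.sc = sc
        self.clear_cache_on_stop = clear_cache_on_stop

    def stop(self):
        self.sc.stop()
        if self.clear_cache_on_stop:
            # Reset the cache so the next get_or_create() builds a fresh context.
            ToySQLContext._instantiated_context = None


def context_after_restart(clear_cache_on_stop):
    """Stop one context, start another, and return what get_or_create() yields."""
    ToySQLContext._instantiated_context = None  # start from a clean state
    first_sc = ToySparkContext()
    session = ToySparkSession(first_sc, clear_cache_on_stop)
    ToySQLContext.get_or_create(first_sc)
    session.stop()
    second_sc = ToySparkContext()
    return ToySQLContext.get_or_create(second_sc), second_sc
```

With `clear_cache_on_stop=False`, the returned context still points at the first, stopped `ToySparkContext`, which mirrors the reported bug. With `clear_cache_on_stop=True`, the cache is empty after `stop()`, so the returned context wraps the new one.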
avro	[SPARK-27506][SQL][FOLLOWUP] Use option `avroSchema` to specify an evolved schema in `from_avro`	2019-12-30 18:14:21 +09:00
pandas	[SPARK-30812][SQL][CORE] Revise boolean config name to comply with new config naming policy	2020-02-18 20:39:50 +08:00
tests	[SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted	2020-02-20 12:21:24 +09:00
__init__.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
catalog.py	[SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3	2019-09-09 10:19:40 -05:00
column.py	[SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation	2020-02-18 16:46:45 +09:00
conf.py	[SPARK-23698][PYTHON] Resolve undefined names in Python 3	2018-08-22 10:06:59 -07:00
context.py	[SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted	2020-02-20 12:21:24 +09:00
dataframe.py	[SPARK-30791][SQL][PYTHON] Add 'sameSemantics' and 'semanticHash' methods in Dataset	2020-02-18 09:22:26 +08:00
functions.py	[SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation	2020-02-18 16:46:45 +09:00
group.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
readwriter.py	[SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation	2020-02-18 16:46:45 +09:00
session.py	[SPARK-30856][SQL][PYSPARK] Fix SQLContext.getOrCreate() when SparkContext is restarted	2020-02-20 12:21:24 +09:00
streaming.py	[SPARK-30859][PYSPARK][DOCS][MINOR] Fixed docstring syntax issues preventing proper compilation of documentation	2020-02-18 16:46:45 +09:00
types.py	[MINOR][DOCS] Fix typos at python/pyspark/sql/types.py	2020-02-07 18:42:16 +09:00
udf.py	[SPARK-30722][PYTHON][DOCS] Update documentation for Pandas UDF with Python type hints	2020-02-12 10:49:46 +09:00
utils.py	[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package	2020-01-09 10:22:50 +09:00
window.py	[SPARK-30188][SQL] Resolve the failed unit tests when enable AQE	2020-01-13 22:55:19 +08:00