### What changes were proposed in this pull request?
This PR aims to drop Python 2.7, 3.4 and 3.5.
Roughly speaking, it removes all the widely known Python 2 compatibility workarounds such as `sys.version` comparison, `__future__`. Also, it removes the Python 2 dedicated codes such as `ArrayConstructor` in Spark.
### Why are the changes needed?
1. Unsupport EOL Python versions
2. Reduce maintenance overhead and remove a bit of legacy codes and hacks for Python 2.
3. PyPy2 has a critical bug that causes a flaky test, SPARK-28358 given my testing and investigation.
4. Users can use Python type hints with Pandas UDFs without thinking about Python version
5. Users can leverage one latest cloudpickle, https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also leverage C pickle.
### Does this PR introduce _any_ user-facing change?
Yes, users cannot use Python 2.7, 3.4 and 3.5 in the upcoming Spark version.
### How was this patch tested?
Manually tested and also tested in Jenkins.
Closes#28957 from HyukjinKwon/SPARK-32138.
Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
## What changes were proposed in this pull request?
Fix issues arising from the fact that builtins __file__, __long__, __raw_input()__, __unicode__, __xrange()__, etc. were all removed from Python 3. __Undefined names__ have the potential to raise [NameError](https://docs.python.org/3/library/exceptions.html#NameError) at runtime.
## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source --statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source --statistics__
holdenk
flake8 testing of https://github.com/apache/spark on Python 3.6.3
$ __python3 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__
```
./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
result = raw_input("\n%s (y/n): " % prompt)
^
./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
primary_author = raw_input(
^
./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
^
./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
^
./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions)
^
./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
raw_assignee = raw_input(
^
./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
pr_num = raw_input("Which pull request would you like to merge? (e.g. 34): ")
^
./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
result = raw_input("Would you like to use the modified title? (y/n): ")
^
./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
^
./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
response = raw_input("%s [y/n]: " % msg)
^
./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
^
./python/setup.py:37:11: F821 undefined name '__version__'
VERSION = __version__
^
./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
dispatch[buffer] = save_buffer
^
./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
dispatch[file] = save_file
^
./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
if not isinstance(obj, str) and not isinstance(obj, unicode):
^
./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
intlike = (int, long)
^
./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
return self._sc._jvm.Time(long(timestamp * 1000))
^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 undefined name 'xrange'
for i in xrange(50):
^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 undefined name 'xrange'
for j in xrange(5):
^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 undefined name 'xrange'
for k in xrange(20022):
^
20 F821 undefined name 'raw_input'
20
```
Closes#20838 from cclauss/fix-undefined-names.
Authored-by: cclauss <cclauss@bluewin.ch>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
## What changes were proposed in this pull request?
In the PR, I propose to extend `RuntimeConfig` by new method `isModifiable()` which returns `true` if a config parameter can be modified at runtime (for current session state). For static SQL and core parameters, the method returns `false`.
## How was this patch tested?
Added new test to `RuntimeConfigSuite` for checking Spark core and SQL parameters.
Author: Maxim Gekk <maxim.gekk@databricks.com>
Closes#21730 from MaxGekk/is-modifiable.
## What changes were proposed in this pull request?
This cleans up unused imports, mainly from pyspark.sql module. Added a note in function.py that imports `UserDefinedFunction` only to maintain backwards compatibility for using `from pyspark.sql.function import UserDefinedFunction`.
## How was this patch tested?
Existing tests and built docs.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes#20892 from BryanCutler/pyspark-cleanup-imports-SPARK-23700.
The exit() builtin is only for interactive use. applications should use sys.exit().
## What changes were proposed in this pull request?
All usage of the builtin `exit()` function is replaced by `sys.exit()`.
## How was this patch tested?
I ran `python/run-tests`.
Please review http://spark.apache.org/contributing.html before opening a pull request.
Author: Benjamin Peterson <benjamin@python.org>
Closes#20682 from benjaminp/sys-exit.
## What changes were proposed in this pull request?
Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code.
## How was this patch tested?
Existing test.
Author: WeichenXu <WeichenXu123@outlook.com>
Closes#13242 from WeichenXu123/python_doctest_update_sparksession.
## What changes were proposed in this pull request?
Currently we return RuntimeConfig itself to facilitate chaining. However, it makes the output in interactive environments (e.g. notebooks, scala repl) weird because it'd show the response of calling set as a RuntimeConfig itself.
## How was this patch tested?
Updated unit tests.
Author: Reynold Xin <rxin@databricks.com>
Closes#12902 from rxin/SPARK-15126.
## What changes were proposed in this pull request?
Addresses comments in #12765.
## How was this patch tested?
Python tests.
Author: Andrew Or <andrew@databricks.com>
Closes#12784 from andrewor14/python-followup.
## What changes were proposed in this pull request?
The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API.
## How was this patch tested?
Python tests.
Author: Andrew Or <andrew@databricks.com>
Closes#12765 from andrewor14/python-spark-session-more.