spark-instrumented-optimizer/python/pyspark
cclauss 71f38ac242 [SPARK-23698][PYTHON] Resolve undefined names in Python 3
## What changes were proposed in this pull request?

Fix issues arising from the fact that builtins __file__, __long__, __raw_input()__, __unicode__, __xrange()__, etc. were all removed from Python 3.  __Undefined names__ have the potential to raise [NameError](https://docs.python.org/3/library/exceptions.html#NameError) at runtime.

## How was this patch tested?
* $ __python2 -m flake8 . --count --select=E9,F82 --show-source --statistics__
* $ __python3 -m flake8 . --count --select=E9,F82 --show-source --statistics__

holdenk

flake8 testing of https://github.com/apache/spark on Python 3.6.3

$ __python3 -m flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__
```
./dev/merge_spark_pr.py:98:14: F821 undefined name 'raw_input'
    result = raw_input("\n%s (y/n): " % prompt)
             ^
./dev/merge_spark_pr.py:136:22: F821 undefined name 'raw_input'
    primary_author = raw_input(
                     ^
./dev/merge_spark_pr.py:186:16: F821 undefined name 'raw_input'
    pick_ref = raw_input("Enter a branch name [%s]: " % default_branch)
               ^
./dev/merge_spark_pr.py:233:15: F821 undefined name 'raw_input'
    jira_id = raw_input("Enter a JIRA id [%s]: " % default_jira_id)
              ^
./dev/merge_spark_pr.py:278:20: F821 undefined name 'raw_input'
    fix_versions = raw_input("Enter comma-separated fix version(s) [%s]: " % default_fix_versions)
                   ^
./dev/merge_spark_pr.py:317:28: F821 undefined name 'raw_input'
            raw_assignee = raw_input(
                           ^
./dev/merge_spark_pr.py:430:14: F821 undefined name 'raw_input'
    pr_num = raw_input("Which pull request would you like to merge? (e.g. 34): ")
             ^
./dev/merge_spark_pr.py:442:18: F821 undefined name 'raw_input'
        result = raw_input("Would you like to use the modified title? (y/n): ")
                 ^
./dev/merge_spark_pr.py:493:11: F821 undefined name 'raw_input'
    while raw_input("\n%s (y/n): " % pick_prompt).lower() == "y":
          ^
./dev/create-release/releaseutils.py:58:16: F821 undefined name 'raw_input'
    response = raw_input("%s [y/n]: " % msg)
               ^
./dev/create-release/releaseutils.py:152:38: F821 undefined name 'unicode'
        author = unidecode.unidecode(unicode(author, "UTF-8")).strip()
                                     ^
./python/setup.py:37:11: F821 undefined name '__version__'
VERSION = __version__
          ^
./python/pyspark/cloudpickle.py:275:18: F821 undefined name 'buffer'
        dispatch[buffer] = save_buffer
                 ^
./python/pyspark/cloudpickle.py:807:18: F821 undefined name 'file'
        dispatch[file] = save_file
                 ^
./python/pyspark/sql/conf.py:61:61: F821 undefined name 'unicode'
        if not isinstance(obj, str) and not isinstance(obj, unicode):
                                                            ^
./python/pyspark/sql/streaming.py:25:21: F821 undefined name 'long'
    intlike = (int, long)
                    ^
./python/pyspark/streaming/dstream.py:405:35: F821 undefined name 'long'
        return self._sc._jvm.Time(long(timestamp * 1000))
                                  ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:21:10: F821 undefined name 'xrange'
for i in xrange(50):
         ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:22:14: F821 undefined name 'xrange'
    for j in xrange(5):
             ^
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py:23:18: F821 undefined name 'xrange'
        for k in xrange(20022):
                 ^
20    F821 undefined name 'raw_input'
20
```

Closes #20838 from cclauss/fix-undefined-names.

Authored-by: cclauss <cclauss@bluewin.ch>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
2018-08-22 10:06:59 -07:00
..
ml [SPARK-25090][ML] Enforce implicit type coercion in ParamGridBuilder 2018-08-13 09:11:37 +08:00
mllib Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
sql [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
streaming [SPARK-23698][PYTHON] Resolve undefined names in Python 3 2018-08-22 10:06:59 -07:00
__init__.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [BUILD] Fix lint-python. 2018-08-03 03:18:46 +09:00
broadcast.py [SPARK-23522][PYTHON] always use sys.exit over builtin exit 2018-03-08 20:38:34 +09:00
cloudpickle.py [SPARK-24303][PYTHON] Update cloudpickle to v0.4.4 2018-05-18 09:53:24 -07:00
conf.py [SPARK-23522][PYTHON] always use sys.exit over builtin exit 2018-03-08 20:38:34 +09:00
context.py Fix typos 2018-08-12 08:13:09 -05:00
daemon.py [PYSPARK] Update py4j to version 0.10.7. 2018-05-09 10:47:35 -07:00
files.py [SPARK-3309] [PySpark] Put all public API in __all__ 2014-09-03 11:49:45 -07:00
find_spark_home.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
heapq3.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
java_gateway.py [SPARK-24565][SS] Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame 2018-06-19 13:56:51 -07:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-23522][PYTHON] always use sys.exit over builtin exit 2018-03-08 20:38:34 +09:00
rdd.py [SPARK-25142][PYSPARK] Add error messages when Python worker could not open socket in _load_from_socket. 2018-08-18 17:24:06 +08:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-3074] [PySpark] support groupByKey() with single huge key 2015-04-09 17:07:23 -07:00
serializers.py [SPARK-25095][PYSPARK] Python support for BarrierTaskContext 2018-08-21 15:54:30 -07:00
shell.py [SPARK-16451][REPL] Fail shell if SparkSession fails to start. 2018-06-05 08:29:29 +07:00
shuffle.py [SPARK-23754][PYTHON] Re-raising StopIteration in client code 2018-05-30 18:11:33 +08:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api 2016-04-12 23:06:55 -07:00
taskcontext.py [SPARK-25095][PYSPARK] Python support for BarrierTaskContext 2018-08-21 15:54:30 -07:00
tests.py [SPARK-25095][PYSPARK] Python support for BarrierTaskContext 2018-08-21 15:54:30 -07:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor 2018-06-11 10:15:42 +08:00
version.py [SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT 2018-01-13 00:37:59 +08:00
worker.py [SPARK-25095][PYSPARK] Python support for BarrierTaskContext 2018-08-21 15:54:30 -07:00