Fokko Driesprong 9fcf0ea718 [SPARK-32319][PYSPARK] Disallow the use of unused imports
Disallow the use of unused imports:

- Unused imports unnecessarily increase the memory footprint of the application
- Moves the imports that are needed only by the docstring examples from file scope into the examples themselves. This keeps the files clean and makes each example more complete, since it also includes its imports :)
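
As a rough illustration of the second point, here is a minimal sketch of a docstring example that carries its own import. The function `dense_norm` is hypothetical and not taken from PySpark; it only shows the pattern.

```python
import doctest


def dense_norm(values):
    """Return the Euclidean norm of a sequence of numbers.

    The import used by the example lives inside the example itself,
    so the module needs no otherwise unused file-scope import.

    >>> from math import isclose
    >>> isclose(dense_norm([3.0, 4.0]), 5.0)
    True
    """
    return sum(v * v for v in values) ** 0.5


# Run the docstring example; this prints nothing when the example passes.
doctest.run_docstring_examples(dense_norm, {"dense_norm": dense_norm})
```

Because the example imports `isclose` itself, a reader can paste it into a fresh interpreter without guessing which file-scope imports it relied on.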

fokkodriesprongFan spark % flake8 python | grep -i "imported but unused"
python/pyspark/ F401 'functools.partial' imported but unused
python/pyspark/ F401 'traceback' imported but unused
python/pyspark/ F401 '_heapq.*' imported but unused
python/pyspark/ F401 'pyspark.version.__version__' imported but unused
python/pyspark/ F401 'pyspark._globals._NoValue' imported but unused
python/pyspark/ F401 'pyspark.sql.SQLContext' imported but unused
python/pyspark/ F401 'pyspark.sql.HiveContext' imported but unused
python/pyspark/ F401 'pyspark.sql.Row' imported but unused
python/pyspark/ F401 're' imported but unused
python/pyspark/ F401 'tempfile.NamedTemporaryFile' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/ F401 'pyspark.mllib.regression.LabeledPoint' imported but unused
python/pyspark/mllib/tests/ F401 'sys' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_linalg.*' imported but unused
python/pyspark/mllib/tests/ F401 'numpy.random' imported but unused
python/pyspark/mllib/tests/ F401 'numpy.exp' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.Vector' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_feature.*' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_util.*' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.Vector' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg._convert_to_vector' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.DenseMatrix' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.SparseMatrix' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.linalg.MatrixUDT' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_stat.*' imported but unused
python/pyspark/mllib/tests/ F401 'time.time' imported but unused
python/pyspark/mllib/tests/ F401 'time.sleep' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_streaming_algorithms.*' imported but unused
python/pyspark/mllib/tests/ F401 'pyspark.mllib.tests.test_algorithms.*' imported but unused
python/pyspark/tests/ F401 'xmlrunner' imported but unused
python/pyspark/tests/ F401 'sys' imported but unused
python/pyspark/tests/ F401 'pyspark.resource.ResourceProfile' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_rdd.*' imported but unused
python/pyspark/tests/ F401 'sys' imported but unused
python/pyspark/tests/ F401 'array.array' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_readwrite.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_join.*' imported but unused
python/pyspark/tests/ F401 'shutil' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_taskcontext.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_conf.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_broadcast.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_daemon.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_util.*' imported but unused
python/pyspark/tests/ F401 'random' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_pin_thread.*' imported but unused
python/pyspark/tests/ F401 'sys' imported but unused
python/pyspark/tests/ F401 'resource' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_worker.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_profiler.*' imported but unused
python/pyspark/tests/ F401 'sys' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_shuffle.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_rddbarrier.*' imported but unused
python/pyspark/tests/ F401 'userlibrary.UserClass' imported but unused
python/pyspark/tests/ F401 'userlib.UserClass' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_context.*' imported but unused
python/pyspark/tests/ F401 'pyspark.tests.test_appsubmit.*' imported but unused
python/pyspark/streaming/ F401 'sys' imported but unused
python/pyspark/streaming/tests/ F401 'pyspark.RDD' imported but unused
python/pyspark/streaming/tests/ F401 'pyspark.streaming.tests.test_dstream.*' imported but unused
python/pyspark/streaming/tests/ F401 'pyspark.streaming.tests.test_kinesis.*' imported but unused
python/pyspark/streaming/tests/ F401 'pyspark.streaming.tests.test_listener.*' imported but unused
python/pyspark/streaming/tests/ F401 'pyspark.streaming.tests.test_context.*' imported but unused
python/pyspark/testing/ F401 'scipy.sparse' imported but unused
python/pyspark/testing/ F401 'numpy as np' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/ F401 'sys' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/ F401 'sys' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/ F401 '' imported but unused
python/pyspark/ml/tests/ F401 'sys' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 'pyspark.sql.functions as F' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 'sys' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 'py4j' imported but unused
python/pyspark/ml/tests/ F401 'pyspark.testing.mlutils.PySparkTestCase' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 'sys' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/tests/ F401 '*' imported but unused
python/pyspark/ml/param/ F401 'sys' imported but unused
python/pyspark/resource/tests/ F401 'random' imported but unused
python/pyspark/resource/tests/ F401 'pyspark.resource.ResourceProfile' imported but unused
python/pyspark/resource/tests/ F401 'pyspark.resource.tests.test_resources.*' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.udf.UserDefinedFunction' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.pandas.functions.pandas_udf' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.types.Row' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.types.StringType' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.types.IntegerType' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.types.Row' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.types.StringType' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.udf.UDFRegistration' imported but unused
python/pyspark/sql/ F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_utils.*' imported but unused
python/pyspark/sql/tests/ F401 'sys' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.functions.pandas_udf' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.functions.PandasUDFType' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_pandas_map.*' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_catalog.*' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_group.*' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_session.*' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_conf.*' imported but unused
python/pyspark/sql/tests/ F401 'sys' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.functions.sum' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.functions.PandasUDFType' imported but unused
python/pyspark/sql/tests/ F401 'pandas.util.testing.assert_series_equal' imported but unused
python/pyspark/sql/tests/ F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_pandas_cogrouped_map.*' imported but unused
python/pyspark/sql/tests/ F401 'py4j' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_pandas_udf_typehints.*' imported but unused
python/pyspark/sql/tests/ F401 'sys' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.functions.exists' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_functions.*' imported but unused
python/pyspark/sql/tests/ F401 'sys' imported but unused
python/pyspark/sql/tests/ F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.tests.test_pandas_udf_window.*' imported but unused
python/pyspark/sql/tests/ F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/ F401 'sys' imported but unused
python/pyspark/sql/tests/ F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/ F401 'pyspark.sql.DataFrame' imported but unused
python/pyspark/sql/avro/ F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/pandas/ F401 'sys' imported but unused

fokkodriesprongFan spark % flake8 python | grep -i "imported but unused"
fokkodriesprongFan spark %
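
The F401 entries above flag names that an import binds but that the file never reads. The idea can be sketched with the standard `ast` module as below. This is a simplified illustration and not how Flake8 or pyflakes is implemented: it ignores scoping, `__all__`, and `# noqa` comments, and `unused_imports` is a hypothetical helper name.

```python
import ast


def unused_imports(source):
    """Return the imported names that are never referenced in `source`."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            # `re.escape(...)` contains a Name node for `re`.
            used.add(node.id)
    return sorted(imported - used)


sample = (
    "import sys\n"
    "import re\n"
    "from functools import partial\n"
    "print(re.escape('a.b'))\n"
)
print(unused_imports(sample))  # ['partial', 'sys']
```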

### What changes were proposed in this pull request?

Removing unused imports from the Python files to keep everything nice and tidy.

### Why are the changes needed?

Cleans up the imports that aren't used, and suppresses the warning for imports that are kept only as references for other modules, which preserves backward compatibility.

### Does this PR introduce _any_ user-facing change?


### How was this patch tested?

Adding the rule to the existing Flake8 checks.

Closes #29121 from Fokko/SPARK-32319.

Authored-by: Fokko Driesprong <>
Signed-off-by: Dongjoon Hyun <>
2020-08-08 08:51:57 -07:00
create-release [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
deps [SPARK-32490][BUILD] Upgrade netty-all to 4.1.51.Final 2020-08-02 16:46:11 -07:00
sparktestsupport [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature 2020-07-15 11:40:55 -05:00
tests [MINOR] Fix typos in dev/* scripts. 2018-01-31 07:37:25 +09:00
.gitignore [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*.py to .gitignore file. 2018-01-31 00:51:00 +09:00
.rat-excludes [SPARK-23431][CORE] Expose stage level peak executor metrics via REST API 2020-08-04 21:11:00 +08:00
.scalafmt.conf [SPARK-26177] Config change followup to [] Automated formatting for Scala code 2018-12-03 10:03:51 -06:00
[SPARK-26918][DOCS] All .md should have ASF license header 2019-03-30 19:49:45 -05:00
appveyor-install-dependencies.ps1 [SPARK-32231][R][INFRA] Use Hadoop 3.2 winutils in AppVeyor build 2020-07-09 17:18:39 +09:00
[SPARK-30012][CORE][SQL] Change classes extending scala collection classes to work with 2.13 2019-12-03 08:59:43 -08:00
check-license [MINOR][BUILD] Upgrade apache-rat to 0.13 2019-04-01 16:44:42 +09:00
checkstyle-suppressions.xml [SPARK-29674][CORE] Update dropwizard metrics to 4.1.x for JDK 9+ 2019-11-03 15:13:06 -08:00
checkstyle.xml [MINOR] Fix google style guide address 2019-12-12 11:04:01 -06:00
[SPARK-29802][BUILD] Use python3 in build scripts 2020-07-19 11:02:37 +09:00
lint-java [SPARK-23063][K8S] K8s changes for publishing scripts (and a couple of other misses) 2018-01-13 21:34:28 -08:00
lint-python [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
lint-r [SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors 2019-11-17 10:09:46 -08:00
lint-r.R [MINOR][R] small tidying of sh scripts for R 2020-04-30 16:58:05 -07:00
lint-scala [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
[SPARK-31041][BUILD] Show Maven errors from within 2020-03-11 08:22:02 -05:00
[SPARK-29802][BUILD] Use python3 in build scripts 2020-07-19 11:02:37 +09:00
mima [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
[SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
Merge pull request #565 from pwendell/dev-scripts. Closes #565. 2014-02-08 23:13:34 -08:00
requirements.txt [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base 2020-07-27 17:49:21 +09:00
run-pip-tests [SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)activation in pip packaging test 2020-07-25 13:09:23 +09:00
run-tests [SPARK-29672][PYSPARK] update spark testing framework to use python3 2019-11-14 10:18:55 -08:00
run-tests-jenkins [SPARK-29672][PYSPARK] update spark testing framework to use python3 2019-11-14 10:18:55 -08:00
[SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
[SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
sbt-checkstyle [SPARK-27158][BUILD] dev/mima and dev/scalastyle support dynamic profiles 2019-03-15 08:20:42 +09:00
scalafmt [SPARK-30570][BUILD] Update scalafmt plugin to 1.0.3 with onlyChangedFiles feature 2020-01-23 12:44:43 -08:00
scalastyle Revert "[SPARK-30534][INFRA] Use mvn in dev/scalastyle" 2020-01-21 18:23:03 +09:00
[SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_PROFILES 2020-07-17 11:59:19 -05:00
tox.ini [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00

Spark Developer Scripts

This directory contains scripts useful to developers when packaging, testing, or committing to Spark.

Many of these scripts require Apache credentials to work correctly.