spark-instrumented-optimizer


itholic 8bd3770552 [SPARK-32798][PYTHON] Make unionByName optionally fill missing columns with nulls in PySpark		2020-09-08 09:41:02 +09:00

### What changes were proposed in this pull request?

This PR proposes to add a new argument, `allowMissingColumns`, to `unionByName` so that users can specify whether missing columns are allowed.

### Why are the changes needed?

To expose the `allowMissingColumns` argument in the Python API as well. Currently it is only exposed in the Scala/Java APIs.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new examples that use the new argument to the docstring.

### How was this patch tested?

Doctest added and manually tested:

```
$ python/run-tests --testnames pyspark.sql.dataframe
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['/.../python3', 'python3.8']
Will test the following Python tests: ['pyspark.sql.dataframe']
/.../python3 python_implementation is CPython
/.../python3 version is: Python 3.8.5
python3.8 python_implementation is CPython
python3.8 version is: Python 3.8.5
Starting test(/.../python3): pyspark.sql.dataframe
Starting test(python3.8): pyspark.sql.dataframe
Finished test(python3.8): pyspark.sql.dataframe (35s)
Finished test(/.../python3): pyspark.sql.dataframe (35s)
Tests passed in 35 seconds
```

Closes #29657 from itholic/SPARK-32798.

Authored-by: itholic <haejoon309@naver.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
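
For readers unfamiliar with the option, the behaviour the commit above describes (a call such as `df1.unionByName(df2, allowMissingColumns=True)`) can be illustrated without a Spark session. The following is only a rough pure-Python sketch of the semantics, not the PySpark implementation: `union_by_name`, the list-of-dicts row representation, and the column ordering (left columns first, then columns found only on the right) are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Assumption: a "DataFrame" is modelled as a list of dicts, one dict per row,
# and every row of a given input has the same keys.
Row = Dict[str, Optional[Any]]


def union_by_name(
    left: List[Row], right: List[Row], allow_missing_columns: bool = False
) -> List[Row]:
    """Hypothetical helper that mimics union-by-name on lists of dict rows."""
    left_cols = list(left[0].keys()) if left else []
    right_cols = list(right[0].keys()) if right else []

    if allow_missing_columns:
        # Keep the left-hand columns, then append columns only the right has.
        columns = left_cols + [c for c in right_cols if c not in left_cols]
    else:
        # Without the flag, both sides must expose the same set of columns.
        if set(left_cols) != set(right_cols):
            raise ValueError("column names differ between the two inputs")
        columns = left_cols

    # Columns are matched by name; a column absent from a row becomes None.
    return [{c: row.get(c) for c in columns} for row in left + right]


df1 = [{"col0": 1, "col1": 2, "col2": 3}]
df2 = [{"col1": 4, "col2": 5, "col3": 6}]
result = union_by_name(df1, df2, allow_missing_columns=True)
for row in result:
    print(row)
```

Here `col3` is filled with `None` for the row from `df1` and `col0` is filled with `None` for the row from `df2`; calling the helper without `allow_missing_columns=True` raises `ValueError` for these inputs.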
cloudpickle	[SPARK-32094][PYTHON] Update cloudpickle to v1.5.0	2020-07-17 11:49:18 +09:00
ml	[SPARK-32719][PYTHON] Add Flake8 check missing imports	2020-08-31 11:23:31 +09:00
mllib	[SPARK-32719][PYTHON] Add Flake8 check missing imports	2020-08-31 11:23:31 +09:00
resource	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
sql	[SPARK-32798][PYTHON] Make unionByName optionally fill missing columns with nulls in PySpark	2020-09-08 09:41:02 +09:00
streaming	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
testing	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
tests	[SPARK-32138][FOLLOW-UP] Drop obsolete StringIO import branching	2020-08-31 16:56:50 +09:00
__init__.py	[SPARK-32719][PYTHON] Add Flake8 check missing imports	2020-08-31 11:23:31 +09:00
_globals.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
accumulators.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
broadcast.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
conf.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
context.py	[SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors	2020-08-04 12:45:06 +09:00
daemon.py	[SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon	2019-07-31 09:10:24 +09:00
files.py	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
find_spark_home.py	[SPARK-29802][BUILD] Use python3 in build scripts	2020-07-19 11:02:37 +09:00
java_gateway.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis	2019-01-17 19:40:39 -06:00
rdd.py	[SPARK-32319][PYSPARK] Disallow the use of unused imports	2020-08-08 08:51:57 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
serializers.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
shell.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
shuffle.py	[SPARK-32435][PYTHON] Remove heapq3 port from Python 3	2020-07-27 20:10:13 +09:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3	2020-08-10 07:33:06 -07:00
taskcontext.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode	2020-07-30 10:15:25 +09:00
version.py	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00
worker.py	[MINOR][PYTHON] Fix spacing in error message	2020-07-28 11:22:18 +09:00