spark-instrumented-optimizer/python/pyspark
itholic 8bd3770552 [SPARK-32798][PYTHON] Make unionByName optionally fill missing columns with nulls in PySpark
### What changes were proposed in this pull request?

This PR proposes to add new argument `allowMissingColumns` to `unionByName` for allowing users to specify whether to allow missing columns or not.

### Why are the changes needed?

To expose `allowMissingColumns` argument in Python API also. Currently this is only exposed in Scala/Java APIs.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new examples with new argument in the docstring.

### How was this patch tested?

Doctest added and manually tested

```
$ python/run-tests --testnames pyspark.sql.dataframe
Running PySpark tests. Output is in /.../spark/python/unit-tests.log
Will test against the following Python executables: ['/.../python3', 'python3.8']
Will test the following Python tests: ['pyspark.sql.dataframe']
/.../python3 python_implementation is CPython
/.../python3 version is: Python 3.8.5
python3.8 python_implementation is CPython
python3.8 version is: Python 3.8.5
Starting test(/.../python3): pyspark.sql.dataframe
Starting test(python3.8): pyspark.sql.dataframe
Finished test(python3.8): pyspark.sql.dataframe (35s)
Finished test(/.../python3): pyspark.sql.dataframe (35s)
Tests passed in 35 seconds
```

Closes #29657 from itholic/SPARK-32798.

Authored-by: itholic <haejoon309@naver.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-09-08 09:41:02 +09:00
..
cloudpickle [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0 2020-07-17 11:49:18 +09:00
ml [SPARK-32719][PYTHON] Add Flake8 check missing imports 2020-08-31 11:23:31 +09:00
mllib [SPARK-32719][PYTHON] Add Flake8 check missing imports 2020-08-31 11:23:31 +09:00
resource [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
sql [SPARK-32798][PYTHON] Make unionByName optionally fill missing columns with nulls in PySpark 2020-09-08 09:41:02 +09:00
streaming [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
testing [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
tests [SPARK-32138][FOLLOW-UP] Drop obsolete StringIO import branching 2020-08-31 16:56:50 +09:00
__init__.py [SPARK-32719][PYTHON] Add Flake8 check missing imports 2020-08-31 11:23:31 +09:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
broadcast.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
conf.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
context.py [SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to switch allow/disallow SparkContext in executors 2020-08-04 12:45:06 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
find_spark_home.py [SPARK-29802][BUILD] Use python3 in build scripts 2020-07-19 11:02:37 +09:00
java_gateway.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-32319][PYSPARK] Disallow the use of unused imports 2020-08-08 08:51:57 -07:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resultiterable.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
serializers.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
shell.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
shuffle.py [SPARK-32435][PYTHON] Remove heapq3 port from Python 3 2020-07-27 20:10:13 +09:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3 2020-08-10 07:33:06 -07:00
taskcontext.py [SPARK-32138] Drop Python 2.7, 3.4 and 3.5 2020-07-14 11:22:44 +09:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-32010][PYTHON][CORE] Add InheritableThread for local properties and fixing a thread leak issue in pinned thread mode 2020-07-30 10:15:25 +09:00
version.py [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00
worker.py [MINOR][PYTHON] Fix spacing in error message 2020-07-28 11:22:18 +09:00