spark-instrumented-optimizer

Bruce Robbins 034913b62b [SPARK-23936][SQL] Implement map_concat	2018-07-09 21:21:38 +09:00

## What changes were proposed in this pull request?

Implement map_concat high order function.

This implementation does not pick a winner when the specified maps have overlapping keys. Therefore, this implementation preserves existing duplicate keys in the maps and potentially introduces new duplicates. (After discussion with ueshin, we settled on option 1 from [here](https://issues.apache.org/jira/browse/SPARK-23936?focusedCommentId=16464245&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16464245).)

## How was this patch tested?

- New tests
- Manual tests
- Run all sbt SQL tests
- Run all pyspark sql tests

Author: Bruce Robbins <bersprockets@gmail.com>

Closes #21073 from bersprockets/SPARK-23936.
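
The duplicate-key behavior described in the commit message above can be illustrated with a small sketch. It is not Spark's implementation: it models a map as a plain Python list of (key, value) pairs, and the helper name `concat_entries` is hypothetical. It only shows what "does not pick a winner" means for overlapping keys.

```python
# Illustrative model only: a "map" here is a list of (key, value) pairs,
# not Spark's MapType. The helper name is an assumption for this sketch.
from typing import Any, List, Tuple

Entry = Tuple[Any, Any]


def concat_entries(*maps: List[Entry]) -> List[Entry]:
    """Concatenate entry lists in argument order without deduplicating keys."""
    result: List[Entry] = []
    for entries in maps:
        # Every entry is kept, so a key present in two inputs appears twice.
        result.extend(entries)
    return result


left = [("a", 1), ("b", 2)]
right = [("b", 3), ("c", 4)]
merged = concat_entries(left, right)
print(merged)  # [('a', 1), ('b', 2), ('b', 3), ('c', 4)]
```

In this model the key "b" survives twice, once from each input, which mirrors the commit's statement that concatenation may introduce new duplicates rather than resolve them.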
ml	[SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy 1.14+	2018-07-07 11:39:29 +08:00
mllib	[SPARK-24740][PYTHON][ML] Make PySpark's tests compatible with NumPy 1.14+	2018-07-07 11:39:29 +08:00
sql	[SPARK-23936][SQL] Implement map_concat	2018-07-09 21:21:38 +09:00
streaming	[SPARK-24565][SS] Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame	2018-06-19 13:56:51 -07:00
__init__.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
_globals.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
accumulators.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
broadcast.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
cloudpickle.py	[SPARK-24303][PYTHON] Update cloudpickle to v0.4.4	2018-05-18 09:53:24 -07:00
conf.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
context.py	[SPARK-21945][YARN][PYTHON] Make --py-files work with PySpark shell in Yarn client mode	2018-05-17 12:07:58 +08:00
daemon.py	[PYSPARK] Update py4j to version 0.10.7.	2018-05-09 10:47:35 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
find_spark_home.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
heapq3.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
java_gateway.py	[SPARK-24565][SS] Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame	2018-06-19 13:56:51 -07:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-23522][PYTHON] always use sys.exit over builtin exit	2018-03-08 20:38:34 +09:00
rdd.py	[SPARK-24739][PYTHON] Make PySpark compatible with Python 3.7	2018-07-07 11:37:41 +08:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[PYTHON] Fix typo in serializer exception	2018-06-15 16:59:00 +08:00
shell.py	[SPARK-16451][REPL] Fail shell if SparkSession fails to start.	2018-06-05 08:29:29 +07:00
shuffle.py	[SPARK-23754][PYTHON] Re-raising StopIteration in client code	2018-05-30 18:11:33 +08:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api	2016-04-12 23:06:55 -07:00
taskcontext.py	[SPARK-24397][PYSPARK] Added TaskContext.getLocalProperty(key) in Python	2018-05-31 11:23:57 -07:00
tests.py	[SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for python	2018-06-15 12:56:39 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor	2018-06-11 10:15:42 +08:00
version.py	[SPARK-23028] Bump master branch version to 2.4.0-SNAPSHOT	2018-01-13 00:37:59 +08:00
worker.py	[SPARK-24324][PYTHON] Pandas Grouped Map UDF should assign result columns by name	2018-06-24 09:28:46 +08:00