spark-instrumented-optimizer

History

hyukjinkwon 07fd68a29f [SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R	2017-09-03 21:03:21 +09:00

## What changes were proposed in this pull request?

This PR proposes to add a wrapper for `unionByName` API to R and Python as well.

Python

```python
df1 = spark.createDataFrame([[1, 2, 3]], ["col0", "col1", "col2"])
df2 = spark.createDataFrame([[4, 5, 6]], ["col1", "col2", "col0"])
df1.unionByName(df2).show()
```

```
+----+----+----+
|col0|col1|col2|
+----+----+----+
|   1|   2|   3|
|   6|   4|   5|
+----+----+----+
```

R

```R
df1 <- select(createDataFrame(mtcars), "carb", "am", "gear")
df2 <- select(createDataFrame(mtcars), "am", "gear", "carb")
head(unionByName(limit(df1, 2), limit(df2, 2)))
```

```
  carb am gear
1    4  1    4
2    4  1    4
3    4  1    4
4    4  1    4
```

## How was this patch tested?

Doctests for Python and unit test added in `test_sparkSQL.R` for R.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #19105 from HyukjinKwon/unionByName-r-python.
..
ml	[SPARK-12664][ML] Expose probability in mlp model	2017-08-22 21:16:34 -07:00
mllib	[SPARK-20862][MLLIB][PYTHON] Avoid passing float to ndarray.reshape in LogisticRegressionModel	2017-05-24 22:55:38 +08:00
sql	[SPARK-21897][PYTHON][R] Add unionByName API to DataFrame in Python and R	2017-09-03 21:03:21 +09:00
streaming	[SPARK-20285][TESTS] Increase the pyspark streaming test timeout to 30 seconds	2017-04-10 14:06:49 -07:00
__init__.py	[MINOR] Fix some typo of the document	2017-06-19 20:35:58 +01:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
cloudpickle.py	[SPARK-21070][PYSPARK] Attempt to update cloudpickle again	2017-08-22 11:17:53 +09:00
conf.py	[SPARK-18447][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note that` across Python API documentation	2016-11-22 11:40:18 +00:00
context.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
find_spark_home.py	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed	2016-11-16 14:22:15 -08:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed	2016-11-16 14:22:15 -08:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-21551][PYTHON] Increase timeout for PythonRDD.serveIterator	2017-08-09 14:03:18 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-13534][PYSPARK] Using Apache Arrow to increase performance of DataFrame.toPandas	2017-07-10 15:21:03 -07:00
shell.py	[SPARK-19570][PYSPARK] Allow to disable hive in pyspark shell	2017-04-12 10:54:50 -07:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api	2016-04-12 23:06:55 -07:00
taskcontext.py	[SPARK-18576][PYTHON] Add basic TaskContext information to PySpark	2016-12-20 15:51:21 -08:00
tests.py	[SPARK-12717][PYTHON] Adding thread-safe broadcast pickle registry	2017-08-02 07:12:23 +09:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-19505][PYTHON] AttributeError on Exception.message in Python3	2017-04-11 12:18:31 -07:00
version.py	[MINOR] Bump SparkR and PySpark version to 2.3.0.	2017-06-19 11:13:03 +01:00
worker.py	[SPARK-20685] Fix BatchPythonEvaluation bug in case of single UDF w/ repeated arg.	2017-05-10 16:50:57 -07:00