spark-instrumented-optimizer

History

Andrew Or 89addd40ab [SPARK-14945][PYTHON] SparkSession Python API ## What changes were proposed in this pull request? ``` Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.0.0-SNAPSHOT /_/ Using Python version 2.7.5 (default, Mar 9 2014 22:15:05) SparkSession available as 'spark'. >>> spark <pyspark.sql.session.SparkSession object at 0x101f3bfd0> >>> spark.sql("SHOW TABLES").show() ... +---------+-----------+ \|tableName\|isTemporary\| +---------+-----------+ \| src\| false\| +---------+-----------+ >>> spark.range(1, 10, 2).show() +---+ \| id\| +---+ \| 1\| \| 3\| \| 5\| \| 7\| \| 9\| +---+ ``` Note: This API is NOT complete in its current state. In particular, for now I left out the `conf` and `catalog` APIs, which were added later in Scala. These will be added later before 2.0. ## How was this patch tested? Python tests. Author: Andrew Or <andrew@databricks.com> Closes #12746 from andrewor14/python-spark-session.		2016-04-28 10:55:48 -07:00
..
ml	[SPARK-14899][ML][PYSPARK] Remove spark.ml HashingTF hashingAlg option	2016-04-27 14:08:26 -07:00
mllib	[SPARK-9656][MLLIB][PYTHON] Add missing methods to PySpark's Distributed Linear Algebra Classes	2016-04-27 19:48:05 +02:00
sql	[SPARK-14945][PYTHON] SparkSession Python API	2016-04-28 10:55:48 -07:00
streaming	[SPARK-13579][BUILD] Stop building the main Spark assembly.	2016-04-04 16:52:22 -07:00
__init__.py	[SPARK-14555] First cut of Python API for Structured Streaming	2016-04-20 10:32:01 -07:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-14418][PYSPARK] fix unpersist of Broadcast in Python	2016-04-06 10:46:34 -07:00
cloudpickle.py	[SPARK-13697] [PYSPARK] Fix the missing module name of TransformFunctionSerializer.loads	2016-03-06 08:57:01 -08:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[SPARK-13687][PYTHON] Cleanup PySpark parallelize temporary files	2016-04-10 02:34:54 +01:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-9700] Pick default page size more intelligently.	2015-08-06 23:18:29 -07:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[SPARK-14368][PYSPARK] Support python.spark.worker.memory with upper-case unit.	2016-04-05 12:19:20 +09:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
shell.py	[SPARK-14945][PYTHON] SparkSession Python API	2016-04-28 10:55:48 -07:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api	2016-04-12 23:06:55 -07:00
tests.py	[SPARK-13687][PYTHON] Cleanup PySpark parallelize temporary files	2016-04-10 02:34:54 +01:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-14267] [SQL] [PYSPARK] execute multiple Python UDFs within single batch	2016-03-31 16:40:20 -07:00