spark-instrumented-optimizer


Adrian Petrescu 4a426ff8ae [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext	2016-09-20 10:49:02 +01:00

## What changes were proposed in this pull request?

The Scala version of `SparkContext` has a handy field called `uiWebUrl` that tells you which URL the SparkUI spawned by that instance lives at. This is often very useful because the value for `spark.ui.port` in the config is only a suggestion; if that port number is taken by another Spark instance on the same machine, Spark will just keep incrementing the port until it finds a free one. So, on a machine with a lot of running PySpark instances, you often have to start trying all of them one-by-one until you find your application name.

Scala users have a way around this with `uiWebUrl` but Java and Python users do not. This pull request fixes this in the most straightforward way possible, simply propagating this field through the `JavaSparkContext` and into pyspark through the Java gateway.

Please let me know if any additional documentation/testing is needed.

## How was this patch tested?

Existing tests were run to make sure there were no regressions, and a binary distribution was created and tested manually for the correct value of `sc.uiWebPort` in a variety of circumstances.

Author: Adrian Petrescu <apetresc@gmail.com>

Closes #15000 from apetresc/pyspark-uiweburl.
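The commit message above explains why the configured `spark.ui.port` is only a starting point: if it is busy, the next port is tried, then the next, until one binds. The sketch below illustrates that "increment until free" idea with plain sockets. It is not Spark's implementation; `bind_first_free_port` and its `max_retries` cap are hypothetical names chosen for illustration. It shows why a PySpark user needs `sc.uiWebUrl` to learn the port that was actually chosen.

```python
import socket


def bind_first_free_port(host: str, start_port: int, max_retries: int = 16) -> socket.socket:
    """Hypothetical helper: bind to start_port, or the first free port after it.

    Illustrates the retry-by-incrementing behaviour described in the commit
    message; it is a sketch, not Spark's actual port-selection code.
    """
    for offset in range(max_retries + 1):
        port = start_port + offset
        if port > 65535:  # stop instead of passing an invalid port to bind()
            break
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock  # first port that was free
        except OSError:
            sock.close()  # port already taken; try the next one
    raise RuntimeError(f"no free port found starting at {start_port}")


# Demo: occupy one port, then "suggest" that same port to the helper.
busy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
busy.bind(("127.0.0.1", 0))  # let the OS pick any free port
busy.listen()
suggested = busy.getsockname()[1]

ui = bind_first_free_port("127.0.0.1", suggested)
actual = ui.getsockname()[1]
print(f"suggested port {suggested}, actually bound {actual}")
```

Because the bound port can differ from the suggested one, the only reliable way to find a given application's UI is to ask the context itself, which is what exposing `uiWebUrl` to Java and Python provides.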
ml	[SPARK-17389][FOLLOW-UP][ML] Change KMeans k-means|| default init steps from 5 to 2.	2016-09-11 13:47:13 +01:00
mllib	[SPARK-17548][MLLIB] Word2VecModel.findSynonyms no longer spuriously rejects the best match when invoked with a vector	2016-09-17 12:49:58 +01:00
sql	[SPARK-17100] [SQL] fix Python udf in filter on top of outer join	2016-09-19 13:24:16 -07:00
streaming	[SPARK-16950] [PYSPARK] fromOffsets parameter support in KafkaUtils.createDirectStream for python3	2016-08-09 09:44:43 -07:00
__init__.py	[SPARK-14555] First cut of Python API for Structured Streaming	2016-04-20 10:32:01 -07:00
accumulators.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
broadcast.py	[SPARK-17472] [PYSPARK] Better error message for serialization failures of large objects in Python	2016-09-14 13:37:35 -07:00
cloudpickle.py	[SPARK-17472] [PYSPARK] Better error message for serialization failures of large objects in Python	2016-09-14 13:37:35 -07:00
conf.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
context.py	[SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspark.SparkContext	2016-09-20 10:49:02 +01:00
daemon.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
files.py	[SPARK-3309] [PySpark] Put all public API in __all__	2014-09-03 11:49:45 -07:00
heapq3.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
java_gateway.py	[SPARK-15364][ML][PYSPARK] Implement PySpark picklers for ml.Vector and ml.Matrix under spark.ml.python	2016-06-13 19:59:53 -07:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-8652] [PYSPARK] Check return value for all uses of doctest.testmod()	2015-06-26 08:12:22 -07:00
rdd.py	[DOC] improve python doc for rdd.histogram and dataframe.join	2016-07-18 23:49:47 -07:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-3074] [PySpark] support groupByKey() with single huge key	2015-04-09 17:07:23 -07:00
serializers.py	[SPARK-10542] [PYSPARK] fix serialize namedtuple	2015-09-14 19:46:34 -07:00
shell.py	[SPARK-16536][SQL][PYSPARK][MINOR] Expose `sql` in PySpark Shell	2016-07-13 22:24:26 -07:00
shuffle.py	[SPARK-10710] Remove ability to disable spilling in core and SQL	2015-09-19 21:40:21 -07:00
statcounter.py	[SPARK-6919] [PYSPARK] Add asDict method to StatCounter	2015-09-29 13:38:15 -07:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
storagelevel.py	[SPARK-13992][CORE][PYSPARK][FOLLOWUP] Update OFF_HEAP semantics for Java api and Python api	2016-04-12 23:06:55 -07:00
tests.py	[SPARK-16224] [SQL] [PYSPARK] SparkSession builder's configs need to be set to the existing Scala SparkContext's SparkConf	2016-06-28 07:54:44 -07:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
worker.py	[SPARK-14267] [SQL] [PYSPARK] execute multiple Python UDFs within single batch	2016-03-31 16:40:20 -07:00