spark-instrumented-optimizer


Gabor Somogyi bd711863fd [SPARK-33629][PYTHON] Make spark.buffer.size configuration visible on driver side

### What changes were proposed in this pull request?
`spark.buffer.size` was not applied on the driver side in PySpark. This PR fixes that.

### Why are the changes needed?
To apply the mentioned configuration on the driver side.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests, plus manual testing. The following code was added temporarily:
```
def local_connect_and_auth(port, auth_secret):
    ...
        sock.connect(sa)
        print("SPARK_BUFFER_SIZE: %d" % int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))  # <- this is the addition
        sockfile = sock.makefile("rwb", int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))
    ...
```
Test:
```
# Compile Spark
echo "spark.buffer.size 10000" >> conf/spark-defaults.conf
$ ./bin/pyspark
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/12/03 13:38:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/12/03 13:38:14 WARN SparkEnv: I/O encryption enabled without RPC encryption: keys will be visible on the wire.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Python version 3.8.5 (default, Jul 21 2020 10:48:26)
Spark context Web UI available at http://192.168.0.189:4040
Spark context available as 'sc' (master = local[*], app id = local-1606999094506).
SparkSession available as 'spark'.
>>> sc.setLogLevel("TRACE")
>>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
...
SPARK_BUFFER_SIZE: 10000
...
[[0], [2], [3], [4], [6]]
>>>
```

Closes #30592 from gaborgsomogyi/SPARK-33629.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>		2020-12-04 01:37:44 +09:00
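The temporary debug code in the commit message shows how the Python side consumes this setting: it reads the `SPARK_BUFFER_SIZE` environment variable (falling back to 65536) and passes it as the buffer size to `sock.makefile("rwb", ...)`. Below is a minimal standalone sketch of that pattern using only the standard library. The helper names `buffer_size_from_env` and `make_sockfile` are hypothetical and not part of PySpark, and a local `socket.socketpair()` stands in for the real driver/JVM connection.

```python
import os
import socket

# Default mirrored from the debug snippet in the commit message.
DEFAULT_BUFFER_SIZE = 65536


def buffer_size_from_env(env=None):
    """Hypothetical helper: read SPARK_BUFFER_SIZE as an int, else 65536."""
    env = os.environ if env is None else env
    return int(env.get("SPARK_BUFFER_SIZE", DEFAULT_BUFFER_SIZE))


def make_sockfile(sock, env=None):
    """Hypothetical helper: wrap a socket in a buffered read/write binary file."""
    # Same call shape as the snippet: mode "rwb", buffer size taken from the env.
    return sock.makefile("rwb", buffer_size_from_env(env))


if __name__ == "__main__":
    env = {"SPARK_BUFFER_SIZE": "10000"}  # as if spark.buffer.size were 10000
    print("SPARK_BUFFER_SIZE: %d" % buffer_size_from_env(env))
    left, right = socket.socketpair()  # stands in for the driver/JVM socket
    with make_sockfile(left, env) as writer, make_sockfile(right, env) as reader:
        writer.write(b"ping")
        writer.flush()  # small writes stay in the buffer until flushed
        print(reader.read(4))
    left.close()
    right.close()
```

Passing the environment as a plain mapping keeps the lookup testable without mutating `os.environ`; the bug described in the commit is precisely that the driver-side process never saw the configured value, so the default of 65536 was used there.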
cloudpickle	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
ml	[SPARK-33636][PYTHON][ML][FOLLOWUP] Update since tag of labelsArray in StringIndexer	2020-12-03 14:34:44 +09:00
mllib	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
resource	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
sql	[SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests	2020-12-01 10:34:40 +09:00
streaming	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
testing	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
tests	[SPARK-33613][PYTHON][TESTS] Replace deprecated APIs in pyspark tests	2020-12-01 10:34:40 +09:00
__init__.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
__init__.pyi	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
_globals.py	[SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary	2018-02-09 14:21:10 +08:00
_typing.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
accumulators.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
accumulators.pyi	[SPARK-33002][PYTHON] Remove non-API annotations	2020-10-07 19:53:59 +09:00
broadcast.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
broadcast.pyi	[SPARK-33457][PYTHON] Adjust mypy configuration	2020-11-25 09:27:04 +09:00
conf.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
conf.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
context.py	[SPARK-33629][PYTHON] Make spark.buffer.size configuration visible on driver side	2020-12-04 01:37:44 +09:00
context.pyi	[SPARK-33457][PYTHON] Adjust mypy configuration	2020-11-25 09:27:04 +09:00
daemon.py	[SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon	2019-07-31 09:10:24 +09:00
files.py	[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation	2019-07-05 10:08:22 -07:00
files.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
find_spark_home.py	[SPARK-32017][PYTHON][BUILD] Make Pyspark Hadoop 3.2+ Variant available in PyPI	2020-09-23 09:30:51 +09:00
install.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
java_gateway.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
join.py	[SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo…	2016-03-28 14:51:36 -07:00
profiler.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
profiler.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
py.typed	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
rdd.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
rdd.pyi	[SPARK-33457][PYTHON] Adjust mypy configuration	2020-11-25 09:27:04 +09:00
rddsampler.py	[SPARK-4897] [PySpark] Python 3 support	2015-04-16 16:20:57 -07:00
resultiterable.py	[SPARK-32138] Drop Python 2.7, 3.4 and 3.5	2020-07-14 11:22:44 +09:00
resultiterable.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
serializers.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
shell.py	[SPARK-33363] Add prompt information related to the current task when pyspark/sparkR starts	2020-11-10 11:12:19 +09:00
shuffle.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00
statcounter.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
statcounter.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
status.py	[SPARK-4172] [PySpark] Progress API in Python	2015-02-17 13:36:43 -08:00
status.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
storagelevel.py	[SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py	2020-09-15 08:41:22 -05:00
storagelevel.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
taskcontext.py	[SPARK-33254][PYTHON][DOCS] Migration to NumPy documentation style in Core (pyspark., pyspark.resource., etc.)	2020-11-16 10:21:50 +09:00
taskcontext.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
traceback_utils.py	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	2014-09-15 19:28:17 -07:00
util.py	[SPARK-33407][PYTHON] Simplify the exception message from Python UDFs (disabled by default)	2020-11-17 14:15:31 +09:00
version.py	[SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT	2020-02-25 19:44:31 -08:00
version.pyi	[SPARK-32714][PYTHON] Initial pyspark-stubs port	2020-09-24 14:15:36 +09:00
worker.py	Spelling r common dev mlib external project streaming resource managers python	2020-11-27 10:22:45 -06:00