spark-instrumented-optimizer/python/pyspark
Thomas Graves 95aec091e4 [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
### What changes were proposed in this pull request?

As part of the Stage level scheduling features, add the Python api's to set resource profiles.
This also adds the functionality to properly apply the pyspark memory configuration when specified in the ResourceProfile. The pyspark memory configuration is being passed in the task local properties. This was an easy way to get it to the PythonRunner that needs it. I modeled this off how the barrier task scheduling is passing the addresses. As part of this I added in the JavaRDD api's because those are needed by python.

### Why are the changes needed?

python api for this feature

### Does this PR introduce any user-facing change?

Yes adds the java and python apis for user to specify a ResourceProfile to use stage level scheduling.

### How was this patch tested?

unit tests and manually tested on yarn. Tests also run to verify it errors properly on standalone and local mode where its not yet supported.

Closes #28085 from tgravescs/SPARK-29641-pr-base.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-23 10:20:39 +09:00
..
ml [SPARK-31243][ML][PYSPARK] Add ANOVATest and FValueTest to PySpark 2020-03-27 14:05:49 +08:00
mllib [SPARK-30930][ML] Remove ML/MLLIB DeveloperApi annotations 2020-03-16 12:41:22 -05:00
resource [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
sql [SPARK-31441] Support duplicated column names for toPandas with arrow execution 2020-04-14 14:08:56 +09:00
streaming [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3 2019-09-09 10:19:40 -05:00
testing [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
tests [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
__init__.py [SPARK-31088][SQL] Add back HiveContext and createExternalTable 2020-03-26 23:51:15 -07:00
_globals.py [SPARK-23328][PYTHON] Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary 2018-02-09 14:21:10 +08:00
accumulators.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
broadcast.py [SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0 2019-10-03 19:20:51 +09:00
cloudpickle.py [SPARK-29536][PYTHON] Upgrade cloudpickle to 1.1.1 to support Python 3.8 2019-10-22 16:18:34 +09:00
conf.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
context.py [SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode 2019-11-21 10:54:01 +09:00
daemon.py [SPARK-26175][PYTHON] Redirect the standard input of the forked child to devnull in daemon 2019-07-31 09:10:24 +09:00
files.py [SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation 2019-07-05 10:08:22 -07:00
find_spark_home.py [SPARK-31382][BUILD] Show a better error message for different python and pip installation mistake 2020-04-09 11:04:35 +09:00
heapq3.py Fix typos detected by github.com/client9/misspell 2018-08-11 21:23:36 -05:00
java_gateway.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
join.py [SPARK-14202] [PYTHON] Use generator expression instead of list comp in python_full_outer_jo… 2016-03-28 14:51:36 -07:00
profiler.py [SPARK-26640][CORE][ML][SQL][STREAMING][PYSPARK] Code cleanup from lgtm.com analysis 2019-01-17 19:40:39 -06:00
rdd.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
rddsampler.py [SPARK-4897] [PySpark] Python 3 support 2015-04-16 16:20:57 -07:00
resourceinformation.py [SPARK-28234][CORE][PYTHON] Add python and JavaSparkContext support to get resources 2019-07-11 09:32:58 +09:00
resultiterable.py [SPARK-30205][PYSPARK] Import ABCs from collections.abc to remove deprecation warnings 2019-12-10 11:08:13 -08:00
serializers.py [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package 2020-01-09 10:22:50 +09:00
shell.py [SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4 2018-09-13 11:19:43 +08:00
shuffle.py [SPARK-25696] The storage memory displayed on spark Application UI is… 2018-12-10 18:27:01 -06:00
statcounter.py [SPARK-6919] [PYSPARK] Add asDict method to StatCounter 2015-09-29 13:38:15 -07:00
status.py [SPARK-4172] [PySpark] Progress API in Python 2015-02-17 13:36:43 -08:00
storagelevel.py [SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3 2018-11-07 22:48:50 -06:00
taskcontext.py [SPARK-31344][CORE] Polish implementation of barrier() and allGather() 2020-04-16 21:23:32 -07:00
traceback_utils.py [SPARK-1087] Move python traceback utilities into new traceback_utils.py file. 2014-09-15 19:28:17 -07:00
util.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00
version.py [SPARK-30950][BUILD] Setting version to 3.1.0-SNAPSHOT 2020-02-25 19:44:31 -08:00
worker.py [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests 2020-04-23 10:20:39 +09:00