Commit graph

HyukjinKwon 6fb22aa42d
[SPARK-31748][PYTHON] Document resource module in PySpark doc and rename/move classes
### What changes were proposed in this pull request?

This PR is a follow-up to SPARK-29641 and SPARK-28234. It proposes to:

1. Document the new `pyspark.resource` module, introduced in 95aec091e4, in the PySpark API docs.

2. Move the classes into fewer, simpler modules.

Before:

```
pyspark
├── resource
│   ├── executorrequests.py
│   │   ├── class ExecutorResourceRequest
│   │   └── class ExecutorResourceRequests
│   ├── taskrequests.py
│   │   ├── class TaskResourceRequest
│   │   └── class TaskResourceRequests
│   ├── resourceprofilebuilder.py
│   │   └── class ResourceProfileBuilder
│   └── resourceprofile.py
│       └── class ResourceProfile
└── resourceinformation.py
    └── class ResourceInformation
```

After:

```
pyspark
└── resource
    ├── requests.py
    │   ├── class ExecutorResourceRequest
    │   ├── class ExecutorResourceRequests
    │   ├── class TaskResourceRequest
    │   └── class TaskResourceRequests
    ├── profile.py
    │   ├── class ResourceProfileBuilder
    │   └── class ResourceProfile
    └── information.py
        └── class ResourceInformation
```
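The move amounts to a mapping from each class to its old and new module. The sketch below is a hypothetical helper, not part of Spark, that rewrites single-line `from ... import ...` statements according to the Before/After trees; the old module paths (including `pyspark.resourceinformation`) are read off the Before tree and are assumptions to that extent.

```python
import re

# Class name -> (old module, new module), read off the Before/After trees.
MOVES = {
    "ExecutorResourceRequest": ("pyspark.resource.executorrequests", "pyspark.resource.requests"),
    "ExecutorResourceRequests": ("pyspark.resource.executorrequests", "pyspark.resource.requests"),
    "TaskResourceRequest": ("pyspark.resource.taskrequests", "pyspark.resource.requests"),
    "TaskResourceRequests": ("pyspark.resource.taskrequests", "pyspark.resource.requests"),
    "ResourceProfileBuilder": ("pyspark.resource.resourceprofilebuilder", "pyspark.resource.profile"),
    "ResourceProfile": ("pyspark.resource.resourceprofile", "pyspark.resource.profile"),
    "ResourceInformation": ("pyspark.resourceinformation", "pyspark.resource.information"),
}

IMPORT_RE = re.compile(r"^from\s+(\S+)\s+import\s+(.+)$")


def rewrite_import(line: str) -> str:
    """Rewrite a single-line 'from X import A, B' statement to the new layout."""
    match = IMPORT_RE.match(line.strip())
    if not match:
        return line  # not a from-import; leave untouched
    module = match.group(1)
    names = [n.strip() for n in match.group(2).split(",")]
    new_modules = set()
    for name in names:
        old, new = MOVES.get(name, (None, None))
        if old != module:
            return line  # some name did not come from a moved module
        new_modules.add(new)
    if len(new_modules) != 1:
        return line  # names now live in different modules; split by hand
    return f"from {new_modules.pop()} import {', '.join(names)}"
```

Anything the helper does not recognize (aliases, unrelated imports) is returned unchanged, so it errs on the side of leaving code alone.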

3. Apply minor docstring fixes, e.g.:

```diff
-     param name the name of the resource
-     param addresses an array of strings describing the addresses of the resource
+     :param name: the name of the resource
+     :param addresses: an array of strings describing the addresses of the resource
+
+     .. versionadded:: 3.0.0
```
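For illustration, here is a standalone stand-in class (hypothetical, not the actual PySpark source) whose docstring follows the corrected Sphinx field-list style from the diff above: `:param <name>:` fields plus a `.. versionadded::` directive. The two helper functions are likewise only an illustration of how the documented parameters could be checked against the constructor signature.

```python
import inspect
import re
from typing import List


class ResourceInformationSketch:
    """
    Class to hold information about a type of resource.

    :param name: the name of the resource
    :param addresses: an array of strings describing the addresses of the resource

    .. versionadded:: 3.0.0
    """

    def __init__(self, name: str, addresses: List[str]) -> None:
        self._name = name
        self._addresses = addresses

    @property
    def name(self) -> str:
        return self._name

    @property
    def addresses(self) -> List[str]:
        return self._addresses


def documented_params(cls) -> List[str]:
    """Return the names declared via ':param <name>:' in a class docstring."""
    # inspect.getdoc strips the common indentation, so fields start at column 0.
    doc = inspect.getdoc(cls) or ""
    return re.findall(r"^:param (\w+):", doc, flags=re.MULTILINE)


def constructor_params(cls) -> List[str]:
    """Return the constructor's parameter names, excluding 'self'."""
    return [p for p in inspect.signature(cls.__init__).parameters if p != "self"]
```

The colon after the parameter name is what lets Sphinx render these lines as a parameter list; the old `param name the name ...` form would appear as plain text.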

### Why are the changes needed?

To document the APIs, and to consolidate the Python classes into fewer, simpler modules.

### Does this PR introduce _any_ user-facing change?

No, the changes are only in unreleased branches.

### How was this patch tested?

Manually tested via:

```bash
cd python
./run-tests --python-executables=python3 --modules=pyspark-core
./run-tests --python-executables=python3 --modules=pyspark-resource
```

Closes #28569 from HyukjinKwon/SPARK-28234-SPARK-29641-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2020-05-19 17:09:37 -07:00
Thomas Graves 95aec091e4 [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
### What changes were proposed in this pull request?

As part of the stage level scheduling feature, add the Python APIs to set resource profiles.
This also adds the functionality to correctly apply the PySpark memory configuration when it is specified in the ResourceProfile. The PySpark memory configuration is passed in the task local properties, which was an easy way to get it to the `PythonRunner` that needs it; this is modeled on how barrier task scheduling passes the addresses. As part of this, I also added the `JavaRDD` APIs, because Python needs them.
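Passing the PySpark memory setting through task local properties can be modeled in a few lines. This is a simplified, hypothetical model, not Spark's code: the property key `PYSPARK_MEMORY_KEY`, the `TaskContextSketch` class, and both functions are invented for illustration, and falling back to an application-wide default when the profile sets nothing is an assumption of the model. The driver side attaches the per-profile value to a task's local properties, and the runner side reads it back.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical key; it is not the property name Spark actually uses.
PYSPARK_MEMORY_KEY = "hypothetical.pyspark.memory.mib"


@dataclass
class TaskContextSketch:
    """Stand-in for a task's context carrying string-valued local properties."""
    local_properties: Dict[str, str] = field(default_factory=dict)


def launch_task(profile_pyspark_memory_mib: Optional[int]) -> TaskContextSketch:
    """Driver side: copy the profile's PySpark memory into the task's local properties."""
    ctx = TaskContextSketch()
    if profile_pyspark_memory_mib is not None:
        ctx.local_properties[PYSPARK_MEMORY_KEY] = str(profile_pyspark_memory_mib)
    return ctx


def python_worker_memory_mib(ctx: TaskContextSketch, default_mib: Optional[int]) -> Optional[int]:
    """Runner side: prefer the per-task value, else the application-wide default."""
    value = ctx.local_properties.get(PYSPARK_MEMORY_KEY)
    return int(value) if value is not None else default_mib
```

Spark local properties are string-to-string, which is why the model serializes the value with `str` and parses it back with `int`.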

### Why are the changes needed?

To provide a Python API for this feature.

### Does this PR introduce any user-facing change?

Yes. It adds the Java and Python APIs that let users specify a ResourceProfile for stage level scheduling.
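Conceptually, a ResourceProfile pairs executor-level requests with per-task requirements, and together they bound how many tasks can run at once on one executor. The following is a simplified, hypothetical model (the `ProfileSketch` class, its fields, and the "cpus are drawn from cores" mapping are illustrative assumptions, not the PySpark API): the scarcest resource decides, so it takes the minimum, over every resource a task requires, of the executor amount divided by the per-task amount.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ProfileSketch:
    """Illustrative pairing of executor resources with per-task requirements."""
    executor_resources: Dict[str, float]  # e.g. {"cores": 4, "gpu": 2}
    task_resources: Dict[str, float]      # e.g. {"cpus": 1, "gpu": 0.5}

    def tasks_per_executor(self) -> int:
        """Number of tasks that fit on one executor, set by the scarcest resource."""
        if not self.task_resources:
            raise ValueError("at least one task resource is required")
        slots = []
        for name, per_task in self.task_resources.items():
            if per_task <= 0:
                raise ValueError(f"task amount for {name!r} must be positive")
            # In this model, task "cpus" are drawn from executor "cores".
            executor_name = "cores" if name == "cpus" else name
            available = self.executor_resources.get(executor_name, 0)
            slots.append(int(available // per_task))
        return min(slots)
```

For example, 4 cores and 2 GPUs with tasks needing 1 CPU and 1 GPU yields 2 concurrent tasks: the GPUs, not the cores, are the limit.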

### How was this patch tested?

Unit tests, and manually tested on YARN. Tests were also run to verify that it fails with a proper error in standalone and local mode, where it is not yet supported.

Closes #28085 from tgravescs/SPARK-29641-pr-base.

Lead-authored-by: Thomas Graves <tgraves@nvidia.com>
Co-authored-by: Thomas Graves <tgraves@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-04-23 10:20:39 +09:00