[SPARK-25270] lint-python: Add flake8 to find syntax errors and undefined names
## What changes were proposed in this pull request?
Add [flake8](http://flake8.pycqa.org) tests to find Python syntax errors and undefined names.
__E901,E999,F821,F822,F823__ are the "_showstopper_" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. Most other flake8 issues are merely "style violations" -- useful for readability but they do not effect runtime safety.
* F821: undefined name `name`
* F822: undefined name `name` in `__all__`
* F823: local variable name referenced before assignment
* E901: SyntaxError or IndentationError
* E999: SyntaxError -- failed to compile a file into an Abstract Syntax Tree
## How was this patch tested?
$ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__
$ __flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics__
Please review http://spark.apache.org/contributing.html before opening a pull request.
Closes #22266 from cclauss/patch-3.
Authored-by: cclauss <cclauss@bluewin.ch>
Signed-off-by: Holden Karau <holden@pigscanfly.ca>
2018-09-07 12:35:25 -04:00
|
|
|
flake8==3.5.0
|
[MINOR][INFRA] Upgrade Jira client to 2.0.0
### What changes were proposed in this pull request?
SPARK-10498 added the initial Jira client requirement with 1.0.3 five year ago (2016 January). As of today, it causes `dev/merge_spark_pr.py` failure with `Python 3.9.4` due to this old dependency. This PR aims to upgrade it to the latest version, 2.0.0. The latest version is also a little old (2018 July).
- https://pypi.org/project/jira/#history
### Why are the changes needed?
`Jira==2.0.0` works well with both Python 3.8/3.9 while `Jira==1.0.3` fails with Python 3.9.
**BEFORE**
```
$ pyenv global 3.9.4
$ pip freeze | grep jira
jira==1.0.3
$ dev/merge_spark_pr.py
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/dev/merge_spark_pr.py", line 39, in <module>
import jira.client
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/__init__.py", line 5, in <module>
from .config import get_jira
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/config.py", line 17, in <module>
from .client import JIRA
File "/Users/dongjoon/.pyenv/versions/3.9.4/lib/python3.9/site-packages/jira/client.py", line 165
validate=False, get_server_info=True, async=False, logging=True, max_retries=3):
^
SyntaxError: invalid syntax
```
**AFTER**
```
$ pip install jira==2.0.0
$ dev/merge_spark_pr.py
git rev-parse --abbrev-ref HEAD
Which pull request would you like to merge? (e.g. 34):
```
### Does this PR introduce _any_ user-facing change?
No. This is a committer-only script.
### How was this patch tested?
Manually.
Closes #32215 from dongjoon-hyun/jira.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2021-04-17 22:34:24 -04:00
|
|
|
jira==2.0.0
|
2016-01-24 14:48:28 -05:00
|
|
|
PyGithub==1.26.0
|
2018-07-10 01:54:04 -04:00
|
|
|
sphinx
|
2020-07-27 04:49:21 -04:00
|
|
|
pydata_sphinx_theme
|
[SPARK-32204][SPARK-32182][DOCS] Add a quickstart page with Binder integration in PySpark documentation
### What changes were proposed in this pull request?
This PR proposes to:
- add a notebook with a Binder integration which allows users to try PySpark in a live notebook. Please [try this here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb).
- reuse this notebook as a quickstart guide in PySpark documentation.
Note that Binder turns a Git repo into a collection of interactive notebooks. It works based on Docker image. Once somebody builds, other people can reuse the image against a specific commit.
Therefore, if we run Binder with the images based on released tags in Spark, virtually all users can instantly launch the Jupyter notebooks.
<br/>
I made a simple demo to make it easier to review. Please see:
- [Main page](https://hyukjin-spark.readthedocs.io/en/stable/). Note that the link ("Live Notebook") in the main page wouldn't work since this PR is not merged yet.
- [Quickstart page](https://hyukjin-spark.readthedocs.io/en/stable/getting_started/quickstart.html)
<br/>
When reviewing the notebook file itself, please give my direct feedback which I will appreciate and address.
Another way might be:
- open [here](https://mybinder.org/v2/gh/HyukjinKwon/spark/SPARK-32204?filepath=python%2Fdocs%2Fsource%2Fgetting_started%2Fquickstart.ipynb).
- edit / change / update the notebook. Please feel free to change as whatever you want. I can apply as are or slightly update more when I apply to this PR.
- download it as a `.ipynb` file:
![Screen Shot 2020-08-20 at 10 12 19 PM](https://user-images.githubusercontent.com/6477701/90774311-3e38c800-e332-11ea-8476-699a653984db.png)
- upload the `.ipynb` file here in a GitHub comment. Then, I will push a commit with that file with crediting correctly, of course.
- alternatively, push a commit into this PR right away if that's easier for you (if you're a committer).
References:
- https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
- https://databricks.com/jp/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html - my own blog post .. :-) and https://koalas.readthedocs.io/en/latest/getting_started/10min.html
### Why are the changes needed?
To improve PySpark's usability. The current quickstart for Python users are very friendly.
### Does this PR introduce _any_ user-facing change?
Yes, it will add a documentation page, and expose a live notebook to PySpark users.
### How was this patch tested?
Manually tested, and GitHub Actions builds will test.
Closes #29491 from HyukjinKwon/SPARK-32204.
Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-08-25 23:23:24 -04:00
|
|
|
ipython
|
|
|
|
nbsphinx
|
2020-10-27 01:03:57 -04:00
|
|
|
numpydoc
|