Commit graph

8 commits

Author SHA1 Message Date
hyukjinkwon 46b2126024
[SPARK-19002][BUILD][PYTHON] Check pep8 against all Python scripts
## What changes were proposed in this pull request?

This PR proposes to check pep8 against all other Python scripts and fix the errors as below:

```bash
./dev/create-release/generate-contributors.py
./dev/create-release/releaseutils.py
./dev/create-release/translate-contributors.py
./dev/lint-python
./python/docs/epytext.py
./examples/src/main/python/mllib/decision_tree_classification_example.py
./examples/src/main/python/mllib/decision_tree_regression_example.py
./examples/src/main/python/mllib/gradient_boosting_classification_example.py
./examples/src/main/python/mllib/gradient_boosting_regression_example.py
./examples/src/main/python/mllib/linear_regression_with_sgd_example.py
./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
./examples/src/main/python/mllib/naive_bayes_example.py
./examples/src/main/python/mllib/random_forest_classification_example.py
./examples/src/main/python/mllib/random_forest_regression_example.py
./examples/src/main/python/mllib/svm_with_sgd_example.py
./examples/src/main/python/streaming/network_wordjoinsentiments.py
./sql/hive/src/test/resources/data/scripts/cat.py
./sql/hive/src/test/resources/data/scripts/cat_error.py
./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py
./sql/hive/src/test/resources/data/scripts/dumpdata_script.py
./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py
./sql/hive/src/test/resources/data/scripts/escapednewline.py
./sql/hive/src/test/resources/data/scripts/escapedtab.py
./sql/hive/src/test/resources/data/scripts/input20_script.py
./sql/hive/src/test/resources/data/scripts/newline.py
```

## How was this patch tested?

- `./python/docs/epytext.py`

  ```bash
  cd ./python/docs $$ make html
  ```

- pep8 check (Python 2.7 / Python 3.3.6)

  ```
  ./dev/lint-python
  ```

- `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working)

  ```bash
  python -m doctest -v ./dev/merge_spark_pr.py
  ```

- `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working)

  ```bash
  python generate-contributors.py
  python translate-contributors.py
  ```

- Examples (Python 2.7 / Python 3.3.6)

  ```bash
  ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py
  ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.p
  ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py
  ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py
  ```

- Examples (Python 2.7 only / Python 3.3.6 not working)
  ```
  ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py
  ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py
  ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py
  ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py
  ```

- `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes)

  Manually tested only changed ones.

- `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working)

  Manually tested this after disabling actually adding comments and links.

And also via Jenkins tests.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #16405 from HyukjinKwon/minor-pep8.
2017-01-02 15:23:19 +00:00
Reynold Xin ae74c3fa84 [RELEASE] Add more contributors & only show names in release notes.
Author: Reynold Xin <rxin@databricks.com>

Closes #8660 from rxin/contrib.
2015-09-08 17:36:00 -07:00
Andrew Or 4e1112e7b0 [Release] Update contributors list format and sort it
Additionally, we now warn the user when a duplicate author name
arises, in which case he/she needs to resolve it manually.
2014-12-16 22:14:18 -08:00
Andrew Or b85044ecfa [Release] Cache known author translations locally
This bypasses unnecessary calls to the Github and JIRA API.
Additionally, having a local cache allows us to remember names
that we had to manually discover ourselves.
2014-12-16 19:28:43 -08:00
Andrew Or 6f80b749e0 [Release] Major improvements to generate contributors script
This commit introduces several major improvements to the script
that generates the contributors list for release notes, notably:

(1) Use release tags instead of a range of commits. Across branches,
commits are not actually strictly two-dimensional, and so it is not
sufficient to specify a start hash and an end hash. Otherwise, we
end up counting commits that were already merged in an older branch.

(2) Match PR numbers in addition to commit hashes. This is related
to the first point in that if a PR is already merged in an older
minor release tag, it should be filtered out here. This requires us
to do some intelligent regex parsing on the commit description in
addition to just relying on the GitHub API.

(3) Relax author validity check. The old code fails on a name that
has many middle names, for instance. The test was just too strict.

(4) Use GitHub authentication. This allows us to make far more
requests through the GitHub API than before (5000 as opposed to 60
per hour).

(5) Translate from Github username, not commit author name. This is
important because the commit author name is not always configured
correctly by the user. For instance, the username "falaki" used to
resolve to just "Hossein", which was treated as a github username
and translated to something else that is completely arbitrary.

(6) Add an option to use the untranslated name. If there is not
a satisfactory candidate to replace the untranslated name with,
at least allow the user to not translate it.
2014-12-16 17:55:27 -08:00
Andrew Or a4dfb4efef [Release] Correctly translate contributors name in release notes
This commit involves three main changes:

(1) It separates the translation of contributor names from the
generation of the contributors list. This is largely motivated
by the Github API limit; even if we exceed this limit, we should
at least be able to proceed manually as before. This is why the
translation logic is abstracted into its own script
translate-contributors.py.

(2) When we look for candidate replacements for invalid author
names, we should look for the assignees of the associated JIRAs
too. As a result, the intermediate file must keep track of these.

(3) This provides an interactive mode with which the user can
sit at the terminal and manually pick the candidate replacement
that he/she thinks makes the most sense. As before, there is a
non-interactive mode that picks the first candidate that the
script considers "valid."

TODO: We should have a known_contributors file that stores
known mappings so we don't have to go through all of this
translation every time. This is also valuable because some
contributors simply cannot be automatically translated.
2014-12-03 19:10:07 -08:00
Andrew Or 5da21f07d8 [Release] Translate unknown author names automatically 2014-12-02 16:36:12 -08:00
Andrew Or c86e9bc4fd [Release] Automate generation of contributors list
This commit provides a script that computes the contributors list
by linking the github commits with JIRA issues. Automatically
translating github usernames remains a TODO at this point.
2014-11-26 23:16:23 -08:00