[SPARK-33003][PYTHON][DOCS] Add type hints guidelines to the documentation
### What changes were proposed in this pull request?

Add type hints guidelines to the developer docs.

### Why are the changes needed?

Since type hints are a new and still somewhat evolving feature, we should provide clear guidelines for potential contributors.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Closes #30094 from zero323/SPARK-33003.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@ -77,6 +77,50 @@ There are a couple of additional notes to keep in mind when contributing to code
* Be Pythonic.
* APIs are matched with the Scala and Java sides in general.
* PySpark-specific APIs can still be considered as long as they are Pythonic and do not conflict with other existing APIs, for example, decorator usage of UDFs (a short sketch follows this list).
* If you extend or modify the public API, please adjust the corresponding type hints. See `Contributing and Maintaining Type Hints`_ for details.
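As an illustration of the decorator usage mentioned above, here is a minimal sketch (the function name and return-type string are chosen for this example; ``udf`` accepts either a ``DataType`` or a DDL-formatted type string):

.. code-block:: python

    from pyspark.sql.functions import udf

    # Pythonic, PySpark-specific decorator form of a UDF.
    @udf("string")
    def to_upper(s):
        return s.upper() if s is not None else None

    # Usage: df.select(to_upper("name"))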
Contributing and Maintaining Type Hints
----------------------------------------
PySpark type hints are provided using stub files, placed in the same directory as the annotated module, with the exception of ``# type: ignore`` comments in modules that don't have their own stubs (tests, examples, and non-public API).
As a rule of thumb, only the public API is annotated.
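For example, a hypothetical module ``python/pyspark/mymodule.py`` would get its annotations from a sibling stub file (the module and function names here are invented purely to illustrate the layout):

.. code-block:: python

    # Hypothetical stub: python/pyspark/mymodule.pyi, placed next to mymodule.py.
    # Stub files contain only signatures; bodies are replaced with "...".
    from typing import List, Optional

    def collect_metrics(names: List[str], prefix: Optional[str] = ...) -> List[float]: ...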
Annotations should, when possible:
* Reflect the expectations of the underlying JVM API, to help avoid type-related failures outside the Python interpreter.
* In case of conflict between a too-broad (``Any``) and a too-narrow argument annotation, prefer the latter, as long as it covers most of the typical use cases.
* Indicate nonsensical combinations of arguments using ``@overload`` annotations. For example, to indicate that ``*Col`` and ``*Cols`` arguments are mutually exclusive (a caller-side sketch follows this list):

  .. code-block:: python

      # In a stub file, with ``from typing import List, Optional, overload`` in scope:
      @overload
      def __init__(
          self,
          *,
          threshold: float = ...,
          inputCol: Optional[str] = ...,
          outputCol: Optional[str] = ...
      ) -> None: ...
      @overload
      def __init__(
          self,
          *,
          thresholds: Optional[List[float]] = ...,
          inputCols: Optional[List[str]] = ...,
          outputCols: Optional[List[str]] = ...
      ) -> None: ...
* Be compatible with the current stable MyPy release.
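With overloads like the ones above in place, mypy rejects calls that mix the two argument groups. A caller-side sketch, using ``pyspark.ml.feature.Binarizer`` (which follows this single-column/multi-column pattern), assuming the stubs are in place:

.. code-block:: python

    from pyspark.ml.feature import Binarizer

    # OK: the single-column overload matches.
    single = Binarizer(threshold=0.5, inputCol="values", outputCol="features")

    # OK: the multi-column overload matches.
    multi = Binarizer(thresholds=[0.5, 1.0], inputCols=["a", "b"], outputCols=["x", "y"])

    # Rejected by mypy ("No overload variant ... matches argument types"),
    # since it mixes single-column and multi-column arguments:
    # bad = Binarizer(threshold=0.5, inputCols=["a", "b"])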
Complex supporting type definitions should be placed in dedicated ``_typing.pyi`` stubs. See for example `pyspark.sql._typing.pyi <https://github.com/apache/spark/blob/master/python/pyspark/sql/_typing.pyi>`_.
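A minimal sketch of the kind of definition that belongs in such a stub (the alias shown is modeled on what ``pyspark.sql._typing.pyi`` provides, reproduced here from memory rather than verbatim):

.. code-block:: python

    # Excerpt in the spirit of python/pyspark/sql/_typing.pyi:
    # reusable aliases that several stub files can import.
    from typing import Union

    from pyspark.sql.column import Column

    # Many pyspark.sql functions accept either a Column or a column name.
    ColumnOrName = Union[Column, str]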
Annotations can be validated using the ``dev/lint-python`` script or by invoking mypy directly:

.. code-block:: bash

    mypy --config python/mypy.ini python/pyspark
Code Style Guide
----------------
@ -90,4 +134,3 @@ the APIs were inspired by Java. PySpark also follows `camelCase` for exposed API
There is an exception: ``functions.py`` uses `snake_case`. This was done to make the APIs SQL- (and Python-) friendly (a short sketch of the two conventions follows).
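For illustration only, assuming ``df`` is an existing DataFrame with ``name`` and ``ts`` columns:

.. code-block:: python

    from pyspark.sql import functions as F

    # Exposed DataFrame APIs follow camelCase, matching the Scala/Java sides:
    df2 = df.withColumn("upper_name", F.upper(df["name"])).dropDuplicates()

    # ...while functions.py exposes snake_case, SQL-style names:
    df3 = df.select(F.regexp_replace("name", "-", " "), F.to_date("ts"))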
PySpark leverages linters such as `pycodestyle <https://pycodestyle.pycqa.org/en/latest/>`_ and `flake8 <https://flake8.pycqa.org/en/latest/>`_, which ``dev/lint-python`` runs. Therefore, make sure to run that script to double-check.