[MINOR][DOCS] Use ASCII characters when possible in PySpark documentation

### What changes were proposed in this pull request?

This PR replaces non-ASCII characters with ASCII characters, where possible, in the PySpark documentation.

### Why are the changes needed?

To avoid unnecessarily using non-ASCII characters, which can lead to issues such as https://github.com/apache/spark/pull/32047 or https://github.com/apache/spark/pull/22782.

### Does this PR introduce _any_ user-facing change?

Virtually no.

### How was this patch tested?

The non-ASCII characters were found via (on macOS):

```bash
# In Spark root directory
cd python
pcregrep --color='auto' -n "[\x80-\xFF]" `git ls-files .`
```
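
If `pcregrep` is not available, a roughly equivalent check can be sketched in plain Python (a minimal sketch assuming it is run from the Spark root directory; this is not the command used for this patch):

```python
# Minimal sketch: print lines containing non-ASCII bytes in files tracked by git
# under python/. Roughly equivalent in spirit to the pcregrep command above.
import subprocess

tracked = subprocess.run(
    ["git", "ls-files", "python"], capture_output=True, text=True, check=True
).stdout.splitlines()

for path in tracked:
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            if any(b > 0x7F for b in raw):
                text = raw.decode("utf-8", errors="replace").rstrip()
                print(f"{path}:{lineno}: {text}")
```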

Closes #32048 from HyukjinKwon/minor-fix.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Commit 2ca76a57be (parent 571acc87fe), authored by HyukjinKwon on 2021-04-04 09:49:36 +03:00 and committed by Max Gekk.
5 changed files with 6 additions and 6 deletions.


```diff
@@ -42,7 +42,7 @@ SQL query engine.
 Running on top of Spark, the streaming feature in Apache Spark enables powerful
 interactive and analytical applications across both streaming and historical data,
-while inheriting Spark’s ease of use and fault tolerance characteristics.
+while inheriting Spark's ease of use and fault tolerance characteristics.
 **MLlib**
```


```diff
@@ -22,7 +22,7 @@ Upgrading from PySpark 2.4 to 3.0
 * In Spark 3.0, PySpark requires a pandas version of 0.23.2 or higher to use pandas related functionality, such as ``toPandas``, ``createDataFrame`` from pandas DataFrame, and so on.
-* In Spark 3.0, PySpark requires a PyArrow version of 0.12.1 or higher to use PyArrow related functionality, such as ``pandas_udf``, ``toPandas`` and ``createDataFrame`` with “spark.sql.execution.arrow.enabled=true”, etc.
+* In Spark 3.0, PySpark requires a PyArrow version of 0.12.1 or higher to use PyArrow related functionality, such as ``pandas_udf``, ``toPandas`` and ``createDataFrame`` with "spark.sql.execution.arrow.enabled=true", etc.
 * In PySpark, when creating a ``SparkSession`` with ``SparkSession.builder.getOrCreate()``, if there is an existing ``SparkContext``, the builder was trying to update the ``SparkConf`` of the existing ``SparkContext`` with configurations specified to the builder, but the ``SparkContext`` is shared by all ``SparkSession`` s, so we should not update them. In 3.0, the builder comes to not update the configurations. This is the same behavior as Java/Scala API in 2.3 and above. If you want to update them, you need to update them prior to creating a ``SparkSession``.
```
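
As an aside on the last migration entry above (an illustrative sketch, not part of this diff; the master URL, app name, and configuration values are arbitrary), configurations should be set on the builder before the first `SparkSession` is created:

```python
from pyspark.sql import SparkSession

# Set configurations up front: since Spark 3.0 the builder no longer updates the
# SparkConf of an already-running SparkContext.
spark = (
    SparkSession.builder
    .master("local[2]")                 # arbitrary master URL for illustration
    .appName("builder-config-example")  # arbitrary application name
    .config("spark.ui.showConsoleProgress", "false")
    .getOrCreate()
)

# A later getOrCreate() returns the existing session; configurations passed here
# do not modify the SparkConf of the SparkContext created above.
same_session = SparkSession.builder.config("spark.executor.memory", "2g").getOrCreate()
assert same_session is spark
```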


```diff
@@ -107,7 +107,7 @@ In the case of a ``spark-submit`` script, you can use it as follows:
 Note that ``PYSPARK_DRIVER_PYTHON`` above should not be set for cluster modes in YARN or Kubernetes.
-If you’re on a regular Python shell or notebook, you can try it as shown below:
+If you're on a regular Python shell or notebook, you can try it as shown below:
 .. code-block:: python
```
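
For context on the environment variable touched in this hunk (an illustrative sketch only, not part of this diff; the interpreter path and script name are hypothetical), `PYSPARK_DRIVER_PYTHON` selects the Python interpreter used for the driver when `spark-submit` runs in client mode:

```python
# Hypothetical example: launch spark-submit with PYSPARK_DRIVER_PYTHON set for the
# child process. The interpreter path and application script are placeholders.
import os
import subprocess

env = dict(os.environ, PYSPARK_DRIVER_PYTHON="/usr/bin/python3")
subprocess.run(["spark-submit", "my_app.py"], env=env, check=True)
```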


```diff
@@ -161,11 +161,11 @@ class FPGrowth(JavaEstimator, _FPGrowthParams, JavaMLWritable, JavaMLReadable):
 .. [1] Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang, and Edward Y. Chang. 2008.
     Pfp: parallel fp-growth for query recommendation.
     In Proceedings of the 2008 ACM conference on Recommender systems (RecSys '08).
-    Association for Computing Machinery, New York, NY, USA, 107–114.
+    Association for Computing Machinery, New York, NY, USA, 107-114.
     DOI: https://doi.org/10.1145/1454008.1454027
 .. [2] Jiawei Han, Jian Pei, and Yiwen Yin. 2000.
     Mining frequent patterns without candidate generation.
-    SIGMOD Rec. 29, 2 (June 2000), 1–12.
+    SIGMOD Rec. 29, 2 (June 2000), 1-12.
     DOI: https://doi.org/10.1145/335191.335372
```


```diff
@@ -143,7 +143,7 @@ class BisectingKMeans(object):
 -----
 See the original paper [1]_
-.. [1] Steinbach, M. et al. “A Comparison of Document Clustering Techniques.” (2000).
+.. [1] Steinbach, M. et al. "A Comparison of Document Clustering Techniques." (2000).
     KDD Workshop on Text Mining, 2000
     http://glaros.dtc.umn.edu/gkhome/fetch/papers/docclusterKDDTMW00.pdf
 """
```