Commit graph

8 commits

Author SHA1 Message Date
wankunde 60e324aa9f [SPARK-34688][PYTHON] Upgrade to Py4J 0.10.9.2
### What changes were proposed in this pull request?
This PR upgrades Py4J from 0.10.9.1 to 0.10.9.2, which contains some bug fixes and improvements.

* Expose the `shell` parameter in `Popen` inside `launch_gateway` (bartdag/py4j@220efc3716); a minimal usage sketch follows this list.
* Fixed Flake8 errors (bartdag/py4j@6c6ee9aedc).
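
For context only (not part of this upgrade), here is a minimal, hedged sketch of how `launch_gateway` is typically driven from Python; it is a generic Py4J usage example, and any parameters beyond the basics shown here are assumptions that may differ between Py4J versions:

```python
from py4j.java_gateway import JavaGateway, GatewayParameters, launch_gateway

# launch_gateway starts a JVM hosting a Py4J GatewayServer and returns the
# port it is listening on (it locates the bundled py4j JAR by itself).
port = launch_gateway(die_on_exit=True)

# Connect a Python-side gateway to that port and call into the JVM.
gateway = JavaGateway(gateway_parameters=GatewayParameters(port=port))
print(gateway.jvm.java.lang.Math.abs(-7))  # prints 7

gateway.shutdown()
```

The upstream fix in the first bullet concerns how `launch_gateway` invokes `Popen` internally (its shell behaviour), not the public call pattern sketched above.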

### Why are the changes needed?
To leverage fixes from the upstream in Py4J.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins build and GitHub Actions will test it out.

Closes #31796 from wankunde/py4j.

Authored-by: wankunde <wankunde@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
2021-03-11 09:51:41 -06:00
HyukjinKwon 6b86aa0b52 [SPARK-33984][PYTHON] Upgrade to Py4J 0.10.9.1
### What changes were proposed in this pull request?

This PR upgrades Py4J from 0.10.9 to 0.10.9.1, which contains some bug fixes and improvements.
It contains one bug fix (4152353ac1).

### Why are the changes needed?

To leverage fixes from the upstream in Py4J.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Jenkins build and GitHub Actions will test it out.

Closes #31009 from HyukjinKwon/SPARK-33984.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2021-01-04 10:23:38 -08:00
HyukjinKwon 6ab29b37cf [SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the documentation base
### What changes were proposed in this pull request?

This PR proposes to redesign the PySpark documentation.

I made a demo site to make it easier to review: https://hyukjin-spark.readthedocs.io/en/stable/reference/index.html.

Here is the initial draft for the final PySpark docs shape: https://hyukjin-spark.readthedocs.io/en/latest/index.html.

In more details, this PR proposes:
1. Use the [pydata_sphinx_theme](https://github.com/pandas-dev/pydata-sphinx-theme) theme, which [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/) also use. The CSS overrides are ported from Koalas, and the colours in the CSS were chosen by designers for use in Spark. (A minimal configuration sketch follows this list.)
2. Use the Sphinx option to separate `source` and `build` directories as the documentation pages will likely grow.
3. Port the current API documentation into the new style, mimicking Koalas and pandas to use the theme most effectively.

    One disadvantage of this approach is that APIs or classes have to be listed manually; however, I think this isn't a big issue in PySpark since we're conservative about adding APIs. I also intentionally listed only classes, rather than functions, in ML and MLlib to keep the lists easier to manage.
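
Purely as an illustration of items 1 and 2 above, a minimal sketch of how such a Sphinx `conf.py` might look; the extension list, file names and option values are assumptions, not the actual Spark configuration:

```python
# python/docs/source/conf.py -- illustrative sketch only.
# With the source/build split of item 2, the HTML is built with e.g.:
#   sphinx-build -b html source build/html

project = "PySpark"

# autodoc/autosummary generate the API reference pages from docstrings.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
]
autosummary_generate = True

# The pydata-sphinx-theme used by pandas and Koalas, plus Spark's CSS overrides.
html_theme = "pydata_sphinx_theme"
html_static_path = ["_static"]
html_css_files = ["css/pyspark.css"]  # hypothetical file name for the ported overrides
```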

### Why are the changes needed?

I often hear complaints from users that the current PySpark documentation (https://spark.apache.org/docs/latest/api/python/index.html) is pretty messy to read compared to other projects such as [pandas](https://pandas.pydata.org/docs/) and [Koalas](https://koalas.readthedocs.io/en/latest/).

It would be nicer if we could organise it better, instead of just listing all classes, methods and attributes, so that it is easier to navigate.

Also, the documentation has been there from almost the very first version of PySpark. Maybe it's time to update it.

### Does this PR introduce _any_ user-facing change?

Yes, PySpark API documentation will be redesigned.

### How was this patch tested?

Manually tested, and the demo site was made to show.

Closes #29188 from HyukjinKwon/SPARK-32179.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-07-27 17:49:21 +09:00
Nicholas Chammas 4d5166f840 [SPARK-30880][DOCS] Delete Sphinx Makefile cruft
Split off from #27534.

### What changes were proposed in this pull request?

This PR deletes some unused cruft from the Sphinx Makefiles.

### Why are the changes needed?

To remove dead code and the associated maintenance burden.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tested locally by building the Python docs and reviewing them in my browser.

Closes #27625 from nchammas/SPARK-30731-makefile-cleanup.

Lead-authored-by: Nicholas Chammas <nicholas.chammas@liveramp.com>
Co-authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-21 14:53:40 +09:00
Dongjoon Hyun fc4e56a54c [SPARK-30884][PYSPARK] Upgrade to Py4J 0.10.9
### What changes were proposed in this pull request?

This PR aims to upgrade Py4J to `0.10.9` for better Python 3.7 support in Apache Spark 3.0.0 (master/branch-3.0). This is not for `branch-2.4`.

### Why are the changes needed?

- Apache Spark 3.0.0 is using `Py4J 0.10.8.1` (released on 2018-10-21) because `0.10.8.1` was the first official release to support Python 3.7.
    - https://www.py4j.org/changelog.html#py4j-0-10-8-and-py4j-0-10-8-1
- `Py4J 0.10.9` was released on January 25th, 2020 with better Python 3.7 support and a `magic_member` bug fix.
    - https://github.com/bartdag/py4j/releases/tag/0.10.9
    - https://www.py4j.org/changelog.html#py4j-0-10-9

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Pass the Jenkins build with the existing tests.

Closes #27641 from dongjoon-hyun/SPARK-30884.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
2020-02-20 09:09:30 -08:00
David Toneian b2134ee73c [SPARK-30823][PYTHON][DOCS] Set %PYTHONPATH% when building PySpark documentation on Windows
This commit is published into the public domain.

### What changes were proposed in this pull request?
In analogy to `python/docs/Makefile`, which has
> export PYTHONPATH=$(realpath ..):$(realpath ../lib/py4j-0.10.8.1-src.zip)

on line 10, this PR adds
> set PYTHONPATH=..;..\lib\py4j-0.10.8.1-src.zip

to `make2.bat`.

Since there is no `realpath` in default installations of Windows, I left the relative paths unresolved. Per the instructions on how to build docs, `make.bat` is supposed to be run from `python/docs` as the working directory, so this should probably not cause issues (`%BUILDDIR%` is a relative path as well.)

### Why are the changes needed?
When building the PySpark documentation on Windows, by changing directory to `python/docs` and running `make.bat` (which runs `make2.bat`), the majority of the documentation may not be built if pyspark is not in the default `%PYTHONPATH%`. Sphinx then reports that `pyspark` (and possibly dependencies) cannot be imported.

If `pyspark` is in the default `%PYTHONPATH%`, I suppose it is that version of `pyspark` – as opposed to the version found above the `python/docs` directory – that is considered when building the documentation, which may result in documentation that does not correspond to the development version one is trying to build.
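
As a hedged illustration (not part of the change itself), a small check one could run from `python/docs` to see which `pyspark` the documentation build will import once those two path entries are in effect; the script is an assumption for illustration, not something this PR adds:

```python
import os
import sys

# Mirror the two entries that make2.bat puts on PYTHONPATH: the python/ source
# tree and the bundled Py4J zip (the version in the zip's name may differ).
sys.path.insert(0, os.path.abspath(os.path.join("..", "lib", "py4j-0.10.8.1-src.zip")))
sys.path.insert(0, os.path.abspath(".."))

import pyspark  # should now resolve to the development tree, not site-packages

print(pyspark.__file__)
```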

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Manual tests on my Windows 10 machine. Additional tests with other environments very welcome!

Closes #27569 from DavidToneian/SPARK-30823.

Authored-by: David Toneian <david@toneian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
2020-02-14 13:49:11 +09:00
Kousuke Saruta 55ab777078 [SPARK-3870] EOL character enforcement
We have shell scripts and Windows batch files, so we should enforce proper EOL characters.
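
For illustration only: the commit itself enforces this through a `.gitattributes` file (see the squash log below), but a rough Python check for stray LF-only line endings in batch files might look like the following; the file patterns and message are assumptions:

```python
from pathlib import Path

# Flag any .cmd or .bat file containing a bare LF (a newline not preceded by CR),
# since Windows batch files should use CRLF line endings throughout.
for pattern in ("*.cmd", "*.bat"):
    for path in Path(".").rglob(pattern):
        data = path.read_bytes()
        if b"\n" in data.replace(b"\r\n", b""):
            print(f"{path}: contains LF-only line endings")
```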

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #2726 from sarutak/eol-enforcement and squashes the following commits:

9748c3f [Kousuke Saruta] Fixed make.bat
252de89 [Kousuke Saruta] Removed extra characters from make.bat
5b81c00 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
8633ed2 [Kousuke Saruta] merge branch 'master' of git://git.apache.org/spark into eol-enforcement
5d630d8 [Kousuke Saruta] Merged
ba10797 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
7407515 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into eol-enforcement
772fd4e [Kousuke Saruta] Normized EOL character in make.bat and compute-classpath.cmd
ac7f873 [Kousuke Saruta] Added an entry for .gitattributes to .rat-excludes
1570e77 [Kousuke Saruta] Added .gitattributes
2014-10-31 12:39:52 -07:00
Masayoshi TSUZUKI 66af8e2508 [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows
Modified the scripts so they do not pollute environment variables.
The main logic is simply moved from `XXX.cmd` into `XXX2.cmd`, and `XXX.cmd` invokes `XXX2.cmd` via the `cmd` command.
`pyspark.cmd` and `spark-class.cmd` already use this approach, but `spark-shell.cmd`, `spark-submit.cmd` and `/python/docs/make.bat` do not.

Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>

Closes #2797 from tsudukim/feature/SPARK-3943 and squashes the following commits:

b397a7d [Masayoshi TSUZUKI] [SPARK-3943] Some scripts bin\*.cmd pollutes environment variables in Windows
2014-10-14 18:50:14 -07:00