Fixed pandoc dependency issue in python/setup.py

## Problem Description

When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.

The following is a sample failure:
```
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
  Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
    100% |████████████████████████████████| 188.3MB 16.8MB/s
    Complete output from command python setup.py egg_info:
    Maybe try:

        sudo apt-get install pandoc
    See http://johnmacfarlane.net/pandoc/installing.html
    for installation options
    ---------------------------------------------------------------

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
        long_description = pypandoc.convert('README.md', 'rst')
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
        outputfile=outputfile, filters=filters)
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
        _ensure_pandoc_path()
      File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
        raise OSError("No pandoc was found: either install pandoc and add it\n"
    OSError: No pandoc was found: either install pandoc and add it
    to your PATH or or call pypandoc.download_pandoc(...) or
    install pypandoc wheels with included pandoc.

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
```

## What changes were proposed in this pull request?

This change simply adds an additional exception handler for the OSError
that is raised. This allows pyspark to be installed client-side without requiring pandoc to be installed.

## How was this patch tested?

I tested this by building a wheel package of pyspark with the change applied. Then, in a clean virtual environment with pypandoc installed but pandoc not available on the system, I installed pyspark from the wheel.

Here is the output

```
$ pip freeze | grep pypandoc
pypandoc==1.4
$ which pandoc
$ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
Installing collected packages: pyspark
Successfully installed pyspark-2.3.0.dev0
```

Author: Tucker Beck <tucker.beck@rentrakmail.com>

Closes #18981 from dusktreader/dusktreader/fix-pandoc-dependency-issue-in-setup_py.
This commit is contained in:
Tucker Beck 2017-09-07 09:38:00 +09:00 committed by hyukjinkwon
parent fa0092bddf
commit aad2125475

View file

@ -151,6 +151,8 @@ try:
long_description = pypandoc.convert('README.md', 'rst')
except ImportError:
print("Could not import pypandoc - required to package PySpark", file=sys.stderr)
except OSError:
print("Could not convert - pandoc is not installed", file=sys.stderr)
setup(
name='pyspark',