-
Notifications
You must be signed in to change notification settings - Fork 29k
Fixed pandoc dependency issue in python/setup.py #18981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed pandoc dependency issue in python/setup.py #18981
Conversation
When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.
This change simply adds an additional exception handler for the OSError
that is raised.
The following is a sample failure:
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
100% |████████████████████████████████| 188.3MB 16.8MB/s
Complete output from command python setup.py egg_info:
Maybe try:
sudo apt-get install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
long_description = pypandoc.convert('README.md', 'rst')
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
outputfile=outputfile, filters=filters)
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
_ensure_pandoc_path()
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
|
Is this the only thing that needs to change, or does some other script/doc need to change to make sure pandoc is installed? does OSError occur in other situations or mostly only when pandoc isn't installed here? |
|
@srowen you could call download_pandoc(). However, there's no utility for having pandoc installed for a pyspark user. It's possible that the convert() function could raise an OSError under other circumstances, but most likely that this condition is the only source for that exception |
|
Thanks for fixing this @dusktreader :) OSError seems to mostly occur when pandoc is not installed. I think automatically attempting to install pandoc is more likely to lead us into difficulty rather than skipping it (since otherwise in part pypandoc would just do this automatically). Jenkins, OK to Test. |
|
Also to be clear pandoc is pretty optional, we want to use for packaging but in local mode it doesn't really matter. So LGTM pending jenkins. |
|
looks something gone wrong with Jenkins command.. |
|
ok to test |
|
Test build #80904 has finished for PR 18981 at commit
|
|
LGTM too. |
|
retest this please |
|
Test build #81459 has finished for PR 18981 at commit
|
## Problem Description
When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.
The following is a sample failure:
```
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
100% |████████████████████████████████| 188.3MB 16.8MB/s
Complete output from command python setup.py egg_info:
Maybe try:
sudo apt-get install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
long_description = pypandoc.convert('README.md', 'rst')
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
outputfile=outputfile, filters=filters)
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
_ensure_pandoc_path()
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
```
## What changes were proposed in this pull request?
This change simply adds an additional exception handler for the OSError
that is raised. This allows pyspark to be installed client-side without requiring pandoc to be installed.
## How was this patch tested?
I tested this by building a wheel package of pyspark with the change applied. Then, in a clean virtual environment with pypandoc installed but pandoc not available on the system, I installed pyspark from the wheel.
Here is the output
```
$ pip freeze | grep pypandoc
pypandoc==1.4
$ which pandoc
$ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
Installing collected packages: pyspark
Successfully installed pyspark-2.3.0.dev0
```
Author: Tucker Beck <[email protected]>
Closes #18981 from dusktreader/dusktreader/fix-pandoc-dependency-issue-in-setup_py.
(cherry picked from commit aad2125)
Signed-off-by: hyukjinkwon <[email protected]>
|
Merged to master and branch-2.2 |
## Problem Description
When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.
The following is a sample failure:
```
$ which pandoc
$ pip freeze | grep pypandoc
pypandoc==1.4
$ pip install pyspark
Collecting pyspark
Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
100% |████████████████████████████████| 188.3MB 16.8MB/s
Complete output from command python setup.py egg_info:
Maybe try:
sudo apt-get install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
long_description = pypandoc.convert('README.md', 'rst')
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
outputfile=outputfile, filters=filters)
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
_ensure_pandoc_path()
File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
```
## What changes were proposed in this pull request?
This change simply adds an additional exception handler for the OSError
that is raised. This allows pyspark to be installed client-side without requiring pandoc to be installed.
## How was this patch tested?
I tested this by building a wheel package of pyspark with the change applied. Then, in a clean virtual environment with pypandoc installed but pandoc not available on the system, I installed pyspark from the wheel.
Here is the output
```
$ pip freeze | grep pypandoc
pypandoc==1.4
$ which pandoc
$ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
Installing collected packages: pyspark
Successfully installed pyspark-2.3.0.dev0
```
Author: Tucker Beck <[email protected]>
Closes apache#18981 from dusktreader/dusktreader/fix-pandoc-dependency-issue-in-setup_py.
(cherry picked from commit aad2125)
Signed-off-by: hyukjinkwon <[email protected]>
### What changes were proposed in this pull request? This PR removes any dependencies on pypandoc. It also makes related tweaks to the docs README to clarify the dependency on pandoc (not pypandoc). ### Why are the changes needed? We are using pypandoc to convert the Spark README from Markdown to ReST for PyPI. PyPI now natively supports Markdown, so we don't need pypandoc anymore. The dependency on pypandoc also sometimes causes issues when installing Python packages that depend on PySpark, as described in #18981. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Manually: ```sh python -m venv venv source venv/bin/activate pip install -U pip cd python/ python setup.py sdist pip install dist/pyspark-3.0.0.dev0.tar.gz pyspark --version ``` I also built the PySpark and R API docs with `jekyll` and reviewed them locally. It would be good if a maintainer could also test this by creating a PySpark distribution and uploading it to [Test PyPI](https://test.pypi.org) to confirm the README looks as it should. Closes #27376 from nchammas/SPARK-30665-pypandoc. Authored-by: Nicholas Chammas <[email protected]> Signed-off-by: HyukjinKwon <[email protected]>
Problem Description
When pyspark is listed as a dependency of another package, installing
the other package will cause an install failure in pyspark. When the
other package is being installed, pyspark's setup_requires requirements
are installed including pypandoc. Thus, the exception handling on
setup.py:152 does not work because the pypandoc module is indeed
available. However, the pypandoc.convert() function fails if pandoc
itself is not installed (in our use cases it is not). This raises an
OSError that is not handled, and setup fails.
The following is a sample failure:
What changes were proposed in this pull request?
This change simply adds an additional exception handler for the OSError
that is raised. This allows pyspark to be installed client-side without requiring pandoc to be installed.
How was this patch tested?
I tested this by building a wheel package of pyspark with the change applied. Then, in a clean virtual environment with pypandoc installed but pandoc not available on the system, I installed pyspark from the wheel.
Here is the output