Skip to content
This repository was archived by the owner on May 21, 2024. It is now read-only.

Switch to pip download instead of pip wheel #210

Closed

Conversation

lukemarsden
Copy link

@lukemarsden lukemarsden commented Aug 11, 2020

I've been wrestling for a while with the fact that the following fairly typical requirements.txt fails with create_python_pipeline:

scikit-learn == 0.20.3
numpy >= 1.8.2
joblib >= 0.13.0
pandas >= 1.0.1
PyYAML >= 5.3

The problematic package is scikit-learn, which fails to wheel with the build error below.
This PR fixes it in my testing, by switching pip wheel to pip download, and updating pip install to tolerate the fact that sometimes pip download downloads tarballs, not whl files (PyYAML in particular was like this).

I'd appreciate a careful review as this may break other requirements.txt files in subtle and unexpected ways. In particular, does pip download ever download things other than whl or tar.gz?

Also for longer term consideration, explicit support for auto-building Docker images may be less painful to implement than trying to make python packaging save/restore work reliably in all cases.


Build error

Building wheels for collected packages: scikit-learn, PyYAML
  Building wheel for scikit-learn (setup.py): started
  Building wheel for scikit-learn (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-wheel-w3p3415d/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-wheel-w3p3415d/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-mwq7pa_z
       cwd: /tmp/pip-wheel-w3p3415d/scikit-learn/
  Complete output (15 lines):
  Partial import of sklearn during the build process.
  Traceback (most recent call last):
    File "/tmp/pip-wheel-w3p3415d/scikit-learn/setup.py", line 153, in get_numpy_status
      import numpy
  ModuleNotFoundError: No module named 'numpy'
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-wheel-w3p3415d/scikit-learn/setup.py", line 250, in <module>
      setup_package()
    File "/tmp/pip-wheel-w3p3415d/scikit-learn/setup.py", line 238, in setup_package
      raise ImportError("Numerical Python (NumPy) is not "
  ImportError: Numerical Python (NumPy) is not installed.
  scikit-learn requires NumPy >= 1.8.2.
  Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html
  
  ----------------------------------------
  ERROR: Failed building wheel for scikit-learn
  Running setup.py clean for scikit-learn
  Building wheel for PyYAML (setup.py): started
  Building wheel for PyYAML (setup.py): finished with status 'done'
  Created wheel for PyYAML: filename=PyYAML-5.3.1-cp38-cp38-linux_x86_64.whl size=572466 sha256=a24187ba8af63390e58ad4622a76bd88f95cae7de244d59328500fc2ac74def0
  Stored in directory: /root/.cache/pip/wheels/13/90/db/290ab3a34f2ef0b5a0f89235dc2d40fea83e77de84ed2dc05c
Successfully built PyYAML
Failed to build scikit-learn
ERROR: Failed to build one or more wheels

… some downloads are tarballs. Fixes scikit-learn (at least, 0.20.3) in requirements.txt
@CLAassistant
Copy link

CLAassistant commented Aug 11, 2020

CLA assistant check
All committers have signed the CLA.

@lukemarsden
Copy link
Author

lukemarsden commented Aug 11, 2020

Oh! On a whim, I tried a newer version of scikit-learn, and 0.23.2 does not have the wheel build failure that we saw in 0.20.3. So, we could just document to use newer versions of scikit-learn, as an alternative to this change?

@ysimonson
Copy link
Contributor

I suspect this may just simply be the order of requirements.txt entries - does the issue resolve if you put numpy before scikit-learn?

@lukemarsden
Copy link
Author

lukemarsden commented Aug 11, 2020 via email

@ysimonson
Copy link
Contributor

I don't think we should merge this because it'll be obviated by #213. That said, there is the question of whether we should do this in the python language builder.

I don't have a good sense of the trade-offs of pip wheel vs pip download, but my understanding is wheels are a bit better because they avoid re-compiling dependencies on every run, and they appear to work for non-wheeled packages anyway (as verified by our tests.) I suspect too (but am not certain) that the issue with scikit-learn isn't being triggered by wheels per se, but something to do with that version's setup.py. All of this to say I don't have a strong opinion on either methodology, but am slightly inclined toward wheels.

@lukemarsden
Copy link
Author

I agree. I wonder, though if we can get the best of both worlds: try to build wheels from source packages but fallback to just downloading them if the wheel fails. If you think this is worth trying, I'll move the discussion over to pachyderm/pachyderm.

@ysimonson
Copy link
Contributor

Assuming there's places where pip wheel won't work, that makes sense as a strategy. I'm not sure if/when that's the case though.

@ysimonson ysimonson closed this Aug 17, 2020
@echohack echohack deleted the make-sklearn-work-use-pip-download-install branch January 4, 2021 19:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants