-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit numpy\h5py max versions due to tensorflow 2.3.1 max supported versions #990
Conversation
Numpy 1.19.4 has an issue on windows which prevents it from running correctly (numpy package refers to this URL https://tinyurl.com/y3dm3h86). I just tested with 1.19.3 and the issue is not present yet
Actually, even TF 2.3.1 specifies numpy<1.19.0 (and h5py<2.11.0), so I'm aligning the maximum versions here also.
requirements.txt
Outdated
@@ -1,6 +1,8 @@ | |||
Cython>=0.25 | |||
h5py>=2.6 | |||
numpy>=1.15 | |||
h5py>=2.6 ; sys_platform != "Windows" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the platform check needed for h5py
? Is this related to h5py/h5py#1732?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually added it because of the numpy dependency, but maybe it's overzealous of me (pip should resolve the h5py version that better suits the windows numpy constraint). Wasn't aware of that issue. I believe that shouldn't be an issue as we are saving preprocessed data and not model weights in h5, but the bug behind the reason why weights saving is broken may pop up somewhere in Ludwig too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually those two constraints should be not platform specific: tensorflow 2.3.1 requires thaose constraints independently of platform (and I don't know why pip doesn't know that from the beginning and only tells that in the end)
I found out about this with a pip install ludwig --upgrade --upgrade-strategy eager
Might be a bug / missing feature on pip, which should correctly map the support/requirements map of each package, but it upgrades both numpy and h5py beyond what's supported by tensorflow 2.3.1, giving a warning in the end.
I thought the quickest solution was to limit the version in requirements, but if you have a cleaner solution, it's of course way better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The platform constraint should be set only for neuropod, since it's a package that's not available under windows, and makes a pip install ludwig[full]
fail, also ludwig[serve] fails for the same issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird though, because without those constraints it looks like on unix and mac machines there is no issue, so i guess pip does it's job at figuring out the correct versions for those platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is clear. In that case you shouldn't need to edit anything (apart from neuropod), since dependencies are correctly computed during install (will have to double-check that, but I think they are ok), and you can let pip figure out the ceiling version.
The numpy bug on windows that initially throw me into this is past the maximum supported version in tensorflow (due to the eager update), so if we're not taking eager update into account we can just forget this PR and only fix the neuropod on windows requirement.
Let me check for a correct pip install ludwig
behavior on requirements under windows, just to be sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok nevermind, pip install ludwig
installs the most up to date version of numpy, regardless of TF requirements, and results in this:
tensorflow 2.3.1 requires h5py<2.11.0,>=2.10.0, but you'll have h5py 3.1.0 which is incompatible.
tensorflow 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.
At this point, your proposal on constraining the version only on windows only works if this issue is only present on windows. If linux\other correctly expand the dependencies and figure out that one of the dependent packages (tensorflow) has a more strict requirement than the base package (ludwig), it's ok. If this issue is also present on linux\others, I stand with my current code edit, limiting the uppermost version directly in ludwig's own requirements.txt
Another way would be to remove the dependency on numpy and h5py alltogether, and rely on the chained dependency of tensorflow, making it choose the most appropriate version for itself.
For example, TF 2.3.1's numpy must be >=1.16.0, whilist Ludwig's requirement is >=1.15.0 (and we could potentially have the same issue in the lower end of the spectrum, having pip thinking it's all ok because numpy's version is 1.15.0, while tensorflow 2.3.1 requirement would be >= 1.16.0).
I know it's a little bit tangled of an explanation, but I hope it's clear :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tested on Mac:
Collecting h5py>=2.6
Downloading h5py-3.1.0-cp38-cp38-macosx_10_9_x86_64.whl (2.9 MB)
|████████████████████████████████| 2.9 MB 26.0 MB/s
Collecting numpy>=1.15
Downloading numpy-1.19.4-cp38-cp38-macosx_10_9_x86_64.whl (15.3 MB)
|████████████████████████████████| 15.3 MB 9.6 MB/s
tensorflow 2.3.1 requires h5py<2.11.0,>=2.10.0, but you'll have h5py 3.1.0 which is incompatible.
tensorflow 2.3.1 requires numpy<1.19.0,>=1.16.0, but you'll have numpy 1.19.4 which is incompatible.
With additional suggestion:
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default
I tried following the suggestion of using --use-feature=2020-resolver
and I got:
Collecting h5py>=2.6
Downloading h5py-2.10.0-cp38-cp38-macosx_10_9_x86_64.whl (3.0 MB)
|████████████████████████████████| 3.0 MB 2.6 MB/s
Collecting numpy>=1.15
Downloading numpy-1.18.5-cp38-cp38-macosx_10_9_x86_64.whl (15.1 MB)
|████████████████████████████████| 15.1 MB 3.9 MB/s
Which is indeed the behavior one would expect from the dependency resolution. I can test on linux too, and on windows, but I believe it's not a platform issue.
Now te question is what do we want to do about this? My take is currently the following: for the time being I would be ok with putting the same constraints of TF 2.3.1 on both numpy and h5py, although this is not very sustainable moving forward, as we would have to keep track of them every single minor release of TF. So what I would do is to monitor when pip 20.3 is released and remove the constraints as soon as that happens. Should have been release in October, so I believeit won't be long.
What do you think @carlogrisetti and @tgaddair ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good for someone that keeps its system maintained... The user that had an open issue recently on #1000 (heh, what a milestone), was using pip 8.x for example, and although I hope it will be a sporadic aituation, many current and slightly previous pip versions will be around for a lot of time.
The question is: do we want to ease the life of older pips, or reply to anyone having this issue to "enlarge your pip"?
Why did you dismiss the "remove numpy/h5py requirement and just let TF choose its own versions" option I proposed? (It's an actual genuine question) Do we have some narrower requirements for Ludwig itself? (I don't think so)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, people using older pips may keep on posting issues. On the other hand the reson for avoiding this is maintainability. But you are probably right, we can just follow whatever restrictions tensorflow has and work with that, but that would make being compatible with multiple version of tensorflow like we do now (2.2 2.3 and nightly if you look at the tests) more difficult.
PIP package not available for windows
Let's try to recap: The issue is not the windows bug anymore, since tensorflow 2.3.1 requires a much lower version of numpy, and we can use that limit to fix both issues. In order to do that we should also modify the requirements as per my last commit. Is there something i'm missing? PS: one more thing: @tgaddair i see you added the sys_platform constraint as "Windows", but everywhere I look I see that written as "win32". Is that equivalent or a typo on your side? |
@carlogrisetti, not sure about "win32", my suggestion was taken directly from the Python standard: https://www.python.org/dev/peps/pep-0508/#environment-markers Note that this is for Looking back at my comment, I see that I did use |
platform_system is way better, since it generalizes based on architecture. Fixed that, thanks! |
Numpy 1.19.4 has an issue on windows which prevents it from running correctly (numpy package refers to this URL https://tinyurl.com/y3dm3h86).
I just tested with 1.19.3 and the issue is not present yet