Building wheels #70

Closed
jakirkham opened this issue Mar 3, 2018 · 47 comments · Fixed by #224

Comments

@jakirkham
Member

As numcodecs includes source that needs to be compiled, users can run into technical hurdles they may not be aware of. While we do solve this in some sense by supplying conda-forge packages with prebuilt binaries, pip remains the de facto way for Python users to get packages. However, with pip the user gets the sdist, which needs compilation. While this does work, there is definitely some appeal to providing a pip solution that does not require compilation (i.e. prebuilt wheels). This would help users avoid compatibility problems like this one ( #69 ).

One solution would be to piggyback off of whatever conda-forge ends up doing to also supply wheels ( conda-forge/conda-smithy#608 ). This would work well for Windows. On macOS, though, conda-forge uses the 10.9 SDK (instead of the 10.6 SDK that Python tries to support). Also, on Linux conda-forge uses CentOS 6 with glibc 2.12 (instead of CentOS 5 with glibc 2.5 used by manylinux1). In practice, it will be hard for Python to keep requiring the older baselines in these two cases much longer, and some packages already don't comply (happy to go into details if this is of interest). This approach would certainly enjoy the benefit of the architecture and community conda-forge has in place to solve these issues. It also seems that proponents of the wheel format are interested in collaborating, which definitely should help.
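
To make the glibc mismatch concrete, here is a minimal sketch of the version comparison involved. The 2.5 and 2.12 baselines come from the comment above; the helper names are hypothetical, and pip applies its own compatibility logic, so this only illustrates the idea.

```python
import platform


def parse_version(text):
    """Turn a version string such as '2.12' into a comparable tuple (2, 12)."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(system_version, required_version):
    """Return True if the system glibc is at least the version a binary was built against."""
    return parse_version(system_version) >= parse_version(required_version)


if __name__ == "__main__":
    # platform.libc_ver() returns a (lib, version) pair; both are '' when undetected.
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        baselines = [("manylinux1 (CentOS 5)", "2.5"), ("conda-forge (CentOS 6)", "2.12")]
        for label, baseline in baselines:
            print(label, "baseline OK:", glibc_satisfies(version, baseline))
    else:
        print("glibc version not detected on this platform")
```

Comparing tuples rather than strings matters here: as plain strings, "2.5" would sort after "2.12".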

If that doesn't work, the alternative would be to build wheels here. For Linux, there is the manylinux1 Docker image, which could be used fairly easily to build these. For macOS, we could try to build them here or reach out to MacPython for help. For Windows, it would probably be best to reuse the conda-forge solution as much as possible, as that already fits the requirements well. This would probably require something like issue ( conda/conda-build#2490 ) to be solved, though I don't think anyone has had time to do that. Alternatively, one can build a wheel with conda-build.

@jakirkham
Member Author

Have a couple of examples of wheel builds in this comment. For Numcodecs, in particular, please take a look at PR ( conda-forge/numcodecs-feedstock#17 ), which would build both conda and wheel packages from the same build.

@alimanfoo
Member

alimanfoo commented Mar 5, 2018 via email

@jakirkham
Member Author

jakirkham commented Mar 5, 2018

Thanks @alimanfoo.

Right now my feeling is that reusing our work at conda-forge to do this is the best path. We will have more resources generally (compute, manpower, storage, etc.) to push this forward. Not to mention we can easily store the wheels on anaconda.org right alongside our conda packages.

Expect the wheel folks will be interested in smoothing out the compatibility issues once they see some demonstrated success of wheels in conda-forge. Perhaps by making manylinux2 for instance or broadening the manylinux spec to include arbitrary glibc. There's already some pressure on them to do this as tensorflow and other packages are breaking the manylinux spec anyways ( tensorflow/tensorflow#8802 ). Also CentOS 5 is an unsupported OS as of last year. So they will have to update and they know this ( pypa/manylinux#96 ).

The technical issues of getting wheels in there are minimal IMHO. Will have to see how others in conda-forge feel about having wheels included.

@jakirkham
Member Author

cc @rabernat

@jeromekelleher
Member

Just wanted to say that binary wheels on PyPI would be useful for me. I like to test my code using both the pip and conda infrastructure. Compiling numcodecs for each CI build takes quite a bit of time and a bdist_wheel would be very handy.

I'm loving zarr/numcodecs by the way; thanks for all the work!

@alimanfoo
Member

alimanfoo commented Apr 26, 2018 via email

@jakirkham
Member Author

Well we are in the midst of migrating to conda-build 3. So that is pretty much zapping all developer time. Not to mention other work activities outside of conda-forge. IOW it is stalled due to lack of time.

That said, we are able to build wheels using the conda-forge recipe using PR ( conda-forge/numcodecs-feedstock#17 ). There are of course the compatibility issues we already discussed mainly on Linux, but it looks like PyPA is working on a new manylinux2 spec ( pypa/manylinux#152 ), which would resolve this. The main issue is uploading the wheels actually. The current upload script would need to be reworked for wheels ( conda-forge/conda-forge-ci-setup-feedstock#6 ). Not sure what that entails yet.

@alimanfoo
Member

Thanks @jakirkham. Am I right in understanding that conda-forge/numcodecs-feedstock#17 is now successfully building wheels on all platforms?

@jakirkham
Member Author

It does appear that way. Hadn't had time to check since that comment. :)

The main challenge is we don't have a way to upload them from the CIs yet. Though one could certainly build those locally and upload them to PyPI or wherever one wishes.

@alimanfoo
Member

alimanfoo commented May 1, 2018 via email

@jakirkham
Member Author

So these days there is conda-press; we could try using it to make wheels from the conda-forge packages.

ref: https://regro.github.io/conda-press-docs/

@jakirkham
Member Author

So I tried creating some wheels from the conda-forge packages using conda-press and uploaded them to my Anaconda channel. If some people could please try these out and let us know if this works, that would be very helpful. 🙂

pip install -i https://pypi.anaconda.org/jakirkham/label/test/simple numcodecs

ref: https://anaconda.org/jakirkham/numcodecs

@jrbourbeau
Member

I'm getting this error:

(numcodecs-test-2) ➜  ~ pip install -i https://pypi.anaconda.org/jakirkham/label/test/simple numcodecs
Looking in indexes: https://pypi.anaconda.org/jakirkham/label/test/simple
Collecting numcodecs
  Downloading https://pypi.anaconda.org/jakirkham/label/test/simple/numcodecs/0.6.4/numcodecs-0.6.4-0_py37h4a8c4bd-cp37-cp37m-macosx_10_9_x86_64.whl (1.2MB)
     |████████████████████████████████| 1.2MB 724kB/s
ERROR: Could not find a version that satisfies the requirement numpy (from numcodecs) (from versions: none)
ERROR: No matching distribution found for numpy (from numcodecs)

@jakirkham
Member Author

Yeah I think that is because pip is assuming this is a full mirror. We probably have to tell it to supplement it with PyPI or manually install the missing pieces.

That said, it appears some content was missing from the wheels. Am following up in issue ( conda-incubator/conda-press#52 ).
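
One way to "supplement it with PyPI" is pip's `--extra-index-url` option, which adds a second index alongside the primary one. A minimal sketch, assuming the test channel URL from the earlier comment and PyPI's standard simple index; the helper names are hypothetical:

```python
import subprocess
import sys

TEST_INDEX = "https://pypi.anaconda.org/jakirkham/label/test/simple"
PYPI_INDEX = "https://pypi.org/simple"


def pip_install_command(package, index_url, extra_index_url):
    """Build a pip invocation that searches both index_url and extra_index_url."""
    return [
        sys.executable, "-m", "pip", "install",
        "--index-url", index_url,
        "--extra-index-url", extra_index_url,
        package,
    ]


def install(package, index_url, extra_index_url):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package, index_url, extra_index_url), check=True)


if __name__ == "__main__":
    # Only print the command here; call install(...) to actually run it.
    print(" ".join(pip_install_command("numcodecs", TEST_INDEX, PYPI_INDEX)))
```

pip does not prioritise one index over the other, so pinning the version may be needed to make sure the test wheel is the one selected.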

@jni

jni commented Nov 9, 2019

@jakirkham I thought the manylinux problems had been fixed upstream recently? Something about manylinux2010? Anyway, I'm generally OK with requiring Linux users to have a compiler, so even just providing Windows and Mac wheels would make me happy...

@thewtex

thewtex commented Nov 9, 2019

I think conda-press is looking very promising.

That said, for something simple like numcodecs, it may be simpler to just create the wheels directly. We resolved this for the python-blosc package:

https://dev.azure.com/blosc/python-blosc/_build/results?buildId=223

It is fairly simple and elegant to set up continuous deployment on Azure Pipelines, too.

@jakirkham
Member Author

Where are Blosc’s wheels published?

@thewtex

thewtex commented Nov 9, 2019

They are not published to PyPI yet since we just generated them on Monday and @FrancescAlted and @aleix11alcacer are enjoying travels. But, they can be downloaded from the Artifacts button in the Azure interface:

image

@jakirkham
Member Author

Oh great! Well congrats on getting them built. Also thanks for working on this.

The reason I ask is Numcodecs is primarily building bindings to Blosc internally. One solution here might be we rely on the new Blosc wheels all of you have made to do that for us.

I don’t know if that will be able to completely eliminate compilation in Numcodecs, but it is a start.

@thewtex

thewtex commented Nov 9, 2019

Cool, please let me know if I can be of help.

@FrancescAlted

It would be nice if you can use the excellent work from @thewtex to generate builds for numcodecs. However, you should not expect the Blosc team to maintain an official build for python-blosc. The reason is that generating wheels has been a significant source of headaches for the projects where we decided to produce them (see the issues that the PyTables team has had in producing wheels in my talk about what's new in Blosc and PyTables at the recent PyData NYC 2019). In short, right now Blosc is a very modest project and we don't have the resources for maintaining binary wheels.

Having said this, we would be grateful if somebody would commit as a release manager for Blosc and take responsibility for producing a curated selection of wheels. Another possibility, and probably more long term solution, would be to receive some donation to Blosc via NumFOCUS, allowing to put some people for taking care of releasing binary wheels through the different releases of both Blosc and Python (but being realistic, we should not expect this to happen anytime soon, as the only donation that Blosc has received since becoming a NumFOCUS sponsored project has been a small development grant for getting Blosc2 out of alpha stage).

@jakirkham
Member Author

Yep, completely understand that, @FrancescAlted. I think we are in the same boat.

Have been personally quite happy with Conda for this kind of thing, but I get that different people use different package managers (including pip) depending on their needs and use cases. I don't know how we scale out building across these ecosystems. It's a very hard problem.

I'm hopeful that something like conda-press could answer this for pip wheels. Though it's still a bit young.

I'm not sure how we answer the other packaging use cases.

@Czaki
Contributor

Czaki commented Mar 30, 2020

Hi. What is the progress on building wheels for numcodecs? I can contribute by building wheels using cibuildwheel.

@nbren12

nbren12 commented Mar 31, 2020

I would really like this. I spend approx 10 mins a day staring at numcodecs installing.

@jakirkham
Member Author

I'm not aware of anyone working on this. No objections to someone building this for Numcodecs as long as they are willing to keep up maintenance of it.

@nbren12

nbren12 commented Mar 31, 2020

I would definitely be open to helping out, but I don't have much experience with conda-forge etc. Any guidance on how to do this? My first thought would be to use Travis CI + Docker, and then use the manylinux Docker image to build the wheel and push it to PyPI. Or maybe @Czaki has some more experience.
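
The manylinux route usually comes down to two steps inside the container: build a wheel with one of the image's bundled interpreters, then run `auditwheel repair` to bundle external shared libraries and apply the manylinux platform tag. A rough sketch that only assembles the `docker` command follows. The image name `quay.io/pypa/manylinux1_x86_64` and the `/opt/python/cp37-cp37m` interpreter path follow the manylinux project's conventions; the `/io` mount point, output directories, and helper name are assumptions.

```python
import os
import shlex

MANYLINUX_IMAGE = "quay.io/pypa/manylinux1_x86_64"


def manylinux_build_command(source_dir, python_tag="cp37-cp37m"):
    """Return a docker argv that builds and repairs a wheel inside the manylinux container."""
    pip = f"/opt/python/{python_tag}/bin/pip"
    script = " && ".join([
        # Build a wheel for the project mounted at /io, without building its dependencies.
        f"{pip} wheel /io --no-deps -w /tmp/wheels",
        # Bundle external shared libraries and retag the wheel as manylinux.
        "auditwheel repair /tmp/wheels/*.whl -w /io/wheelhouse",
    ])
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(source_dir)}:/io",
        MANYLINUX_IMAGE,
        "bash", "-c", script,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to subprocess.run to execute.
    print(shlex.join(manylinux_build_command(".")))
```

Build requirements may need to be installed in the container first, and the repaired wheels in `wheelhouse` could then be uploaded with a tool such as twine.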

@hammer

hammer commented Aug 31, 2020

Another possibility, and probably more long term solution, would be to receive some donation to Blosc via NumFOCUS, allowing to put some people for taking care of releasing binary wheels through the different releases of both Blosc and Python

@FrancescAlted what amount would you need to make this happen?

@FrancescAlted

@hammer Well, a reasonable amount for taking care of things like this would be similar to a small grant from NumFOCUS (about $5,000 USD). A small grant not only shows appreciation but also motivates us to commit resources to these tasks beyond the short term (well above 1 year). Half of a small grant could also be a possibility, but our mid-term commitment would not be as high.

@hammer

hammer commented Sep 4, 2020

@FrancescAlted great, I have sent you an email to discuss the logistics of the donation.

@rabernat
Contributor

rabernat commented Sep 4, 2020

I feel it would be extremely easy to obtain $5000 from any number of sources to support this work. Some ideas:

  • NumFOCUS
  • USGS (cc @rsignell-usgs), who is using Zarr extensively
  • AWS / GCP / Azure, who all actively support cloud-native data format development

@rabernat
Contributor

rabernat commented Sep 4, 2020

Furthermore, if we could frame this as a service (rather than a donation), any number of universities or federal labs could potentially support it.

@hammer

hammer commented Sep 4, 2020

@rabernat I appreciate your optimism but I am skeptical that obtaining funding for improving the build and release process of an infrastructure open source library is possible from other sources. I'd be happy to share my experiences asking for money for this kind of work in the past!

I'm personally offering to make this donation once I get the EIN from @FrancescAlted, so luckily in this case it will not be necessary to seek funding from those alternative sources.

@rabernat
Contributor

rabernat commented Sep 4, 2020

I'm personally offering to make this donation once I get the EIN

🥇 Well this is of course much easier! Thanks for your amazing generosity!

@FrancescAlted

FrancescAlted commented Sep 4, 2020

@hammer For the record, I have replied to your private message. In the name of the Blosc team, thank you so much!

@joshmoore
Member

Sorry, closing was automatically triggered via #224. Happy to (have) re-open(-ed), or a new issue can be started for blosc build integration.

@FrancescAlted

FrancescAlted commented Nov 19, 2020

Now that @hammer's donation is complete (thanks Jeff!), we are working towards producing wheels for Blosc. Right now our focus is python-blosc, but perhaps you guys are more interested in C-Blosc itself. We have two possibilities:

  1. Provide both static and dynamic libraries in the python-blosc wheel. That means that after installing it via pip, the libraries are accessible to other projects that just need C-Blosc libraries (like zarr/numcodecs).

  2. Do another wheel for just the C-Blosc library. AFAIK, this is possible (people like the CMake folks or Intel have been doing this for pure C/C++ libraries for a long time), but we are not sure how to proceed in this case, so we would be grateful to get some advice here.

Indeed 1) is easier for us, but we would like to keep the zarr team as happy as possible ;-). Thoughts?
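
For option 1, a downstream project such as numcodecs would need to find the libraries that the wheel placed on disk. A minimal sketch of that lookup, assuming the libraries sit somewhere under the installed package directory; the actual layout of the python-blosc wheel is not specified in this thread, and the function names are hypothetical:

```python
import importlib.util
from pathlib import Path

LIBRARY_SUFFIXES = {".so", ".dylib", ".dll", ".a", ".lib"}


def find_bundled_libraries(package_dir):
    """Return sorted paths of static/dynamic libraries found under a package directory."""
    root = Path(package_dir)
    # Checking all suffixes also catches versioned names such as libfoo.so.1.
    # Note that compiled Python extension modules match as well.
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and LIBRARY_SUFFIXES.intersection(path.suffixes)
    )


def installed_package_dir(package_name):
    """Return the directory of an installed package, or None if it cannot be found."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    package_dir = installed_package_dir("blosc")
    if package_dir is None:
        print("blosc is not installed")
    else:
        for library in find_bundled_libraries(package_dir):
            print(library)
```

A build script could pass the directories found this way to the compiler and linker as include and library search paths.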

@jakirkham
Member Author

Thanks Jeff! 😄

Originally was also thinking 1 would make the most sense. So agree with that as well. There are some parts of Numcodecs that link directly to things like LZ4, Zstd, etc. as included by Blosc. That said, there are also other Python libraries that supply wheels and Conda packages for those. So we could handle them that way. Alternatively we could do the same thing we are doing now just with the sources of those libraries. IOW I think we have enough other options to handle C/C++ without issues.

If anyone else here has thoughts, feel free to share 🙂

@Czaki
Contributor

Czaki commented Nov 19, 2020

I have also come across the idea of writing a small wrapper that loads the C/C++ library at runtime rather than linking against it at compile time. This allows using the external library without going through Python code (a speedup) and without duplicating a binary blob.

But I have not implemented such a thing.
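
The runtime-loading idea can be illustrated with `ctypes` from the standard library, which opens a shared library when the program runs instead of linking against it at build time. A minimal sketch using the C library's `strlen` as a stand-in, assuming a POSIX system; a real wrapper would load a compression library instead, and the helper names are hypothetical:

```python
import ctypes
import ctypes.util


def load_libc():
    """Load the C library at runtime rather than linking to it at build time."""
    path = ctypes.util.find_library("c")
    # On POSIX, CDLL(None) exposes symbols already loaded into the process,
    # which covers libc when find_library cannot locate it.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


def c_strlen(data):
    """Call the C strlen function on a bytes object through the loaded library."""
    libc = load_libc()
    # Declare the C signature so ctypes converts arguments and results correctly.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(data)


if __name__ == "__main__":
    print(c_strlen(b"numcodecs"))
```

Calls made through `ctypes` still carry Python-level overhead; a compiled extension could apply the same idea with `dlopen` to avoid it.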

@FrancescAlted

FrancescAlted commented Nov 20, 2020

Thanks for the feedback. For now, we will work towards providing wheels for python-blosc (option 1), as suggested. However, we will still try to pursue option 2) so as to build wheels for the pure C-Blosc library (and perhaps make it a dependency for python-blosc). Again, any info on how to create a wheel for a pure C project is welcome.

Will keep you informed.

@jakirkham
Member Author

Thanks Francesc!

@joshmoore mentioned this issue Nov 24, 2020

@FrancescAlted

Hi there!

We have been hard at work generating wheels for python-blosc and we have come up with a pretty comprehensive set at: https://pypi.org/manage/project/blosc/release/1.9.3.dev0/ (this release is yanked, so install it via pip install blosc==1.9.3.dev0).

We are including not only the python-blosc extension inside the wheels, but also the static and dynamic libraries of C-Blosc (I think this is what you need for numcodecs). This is our first attempt at building wheels, but we have been successful in testing the libraries on Mac and Linux already (Windows is still being explored, but the libraries are there already for you to test).

Could the Zarr team please give this a try and tell us how it goes?

BTW, the guys behind this effort have mainly been @oscargm98 and @aleixalcacer, and they will be glad to hear and help you.

@FrancescAlted

Hi. We have manually (and not only via CI) tested the wheels on Windows, Linux and Mac, and they look good to us. We have made some detailed instructions on how to compile some C-Blosc examples by using the python-blosc wheels at: https://github.com/Blosc/c-blosc/blob/master/COMPILING_WITH_WHEELS.rst.

Our intention is to soon release a new version of C-Blosc (1.21.0) with updated codecs (especially the new zstd 1.4.8), and shortly afterwards an updated version of python-blosc (1.10.0) that uses the latest C-Blosc sources and ships with the (official) wheels. It would be nice if you could provide feedback for Zarr (please note that, until we do the new release, you still need to use pip install blosc==1.9.3.dev0). Thanks!

@oscargm98

Hi there! We have just released a new version of python-blosc with binary wheels fully implemented: https://github.com/Blosc/python-blosc/releases/tag/v1.10.0

Note that support for AVX2 runtime discovery has been activated. That means that, if you use the libraries in these wheels, you can leverage the full capabilities of Intel CPUs. Please give it a spin and report back any possible flaws. Enjoy!
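
For anyone wanting to check whether a given Linux machine advertises AVX2 before testing these wheels, a small sketch follows. It reads the `flags` line of `/proc/cpuinfo`, which is Linux-specific, and it is not C-Blosc's own detection code; the function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_text):
    """Return True if the 'avx2' flag appears among the reported CPU flags."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("AVX2 available:", has_avx2(handle.read()))
    except OSError:
        print("/proc/cpuinfo not readable on this platform")
```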

@hammer

hammer commented Dec 23, 2020

Thanks @oscargm98! I've filed #262 to track using these wheels in the numcodecs build.
