Project Limit Request: duckdb - 100 GB #1129

Closed
2 tasks done
hannes opened this issue Jun 1, 2021 · 26 comments

Comments

@hannes

hannes commented Jun 1, 2021

Project

https://pypi.org/p/duckdb

Does this project already exist?

  • Yes

Size of release/project

100 GB

Which indexes

PyPI

Reasons for the request

We upload dev versions to PyPI as well, and since we sometimes go through many of them between releases, the project size balloons until the release happens and we clean up the dev versions. This cleanup is currently done by a script, but perhaps it is also something PyPI could do? Anyway, to stop our CI from breaking towards the end of a dev cycle, it would be great if you could increase our limit. Thanks!

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@di
Member

di commented Jun 3, 2021

Can you explain what this project is, and why it's so large? Why publish dev releases to PyPI if they are later removed?

di added the "status: awaiting response" label on Jun 3, 2021
@hannes
Author

hannes commented Jun 4, 2021

Well, I would not remove them were it not for the limits. The project is an in-process database engine similar to SQLite but tailored towards analytical workloads. We ship the entire database kernel in the package to avoid external dependencies, and we do so for many platforms. So while each individual package is rather small, the combinations of Python version, OS, architecture, etc. create this large size.

@pradyunsg
Contributor

pradyunsg commented Jun 4, 2021

https://github.com/duckdb/duckdb/blob/master@%7B2021-06-04T09:13:17Z%7D/.github/workflows/main.yml#L832

The project seems to be uploading a release on every commit to their main git branch (i.e. every PR is released as a dev release).

pradyunsg changed the title from "Limit Request: duckdb - 100 GB" to "Limit Request: duckdb - 10 GB" on Jun 4, 2021
pradyunsg removed the "status: awaiting response" label on Sep 15, 2021
@pradyunsg
Contributor

pradyunsg commented Sep 15, 2021

IMO it is not reasonable to use PyPI as a dump for artifacts that are generated on each commit to the default branch of the project. I'm inclined to say that we should deny this limit increase request.

Could you provide context for why you are publishing each commit of a project to PyPI?

pradyunsg added the "status: awaiting response" label on Sep 15, 2021
@pradyunsg
Contributor

A gentle nudge for @hannesmuehleisen, since this ticket is currently waiting on a response.

@hannes
Author

hannes commented Dec 6, 2021

We do not push each commit, but each PR merge. These are typically larger features that we want to allow people to install using their current workflow. Regardless of whether we push PRs or not, we are going to run out of PyPI space in about 20 versions at the current rate (which is going to increase because of more supported Python versions and upcoming M1 support).

pradyunsg removed the "status: awaiting response" label on Jan 15, 2022
@pradyunsg
Copy link
Contributor

I'm gonna defer to @pypa/warehouse-admins on this one.

@Alex-Monahan

Alex-Monahan commented Feb 3, 2022

Hello pypa folks!

Thank you for the work you do to enable the magic of Python! I wanted to add a few details here about why a size increase for this repo would be a great help to the Python community.

DuckDB is designed to lower the barrier to entry for analysts to use fast SQL queries in their workloads. Our target audience typically does not have compiler experience. By hosting dev versions (generated per PR not per commit), our Python users can test out bug fixes and new features without needing to learn how to compile the project. This allows us to bring Python folks (like myself - I don't know how to compile C++!) into the DuckDB community to help test it out prior to releases.

DuckDB is also considered a popular and active project on Snyk (downloaded 75,000 times per week), so this expanded PyPI space would be useful to thousands of people!

We already do regular cleanup of these dev versions after each release, so we do try to be good stewards of PyPI's resources. However, as Hannes stated above, even without these dev versions, we will run out of space in well under 20 releases, and DuckDB releases every 1-2 months. That doesn't give us a lot of time until we are in a similar position asking for your help again.

Thank you for your consideration - we do really appreciate it!

@Alex-Monahan
I also reviewed your guidelines in the readme, and I believe that we fit within the guidelines that you outlined. I do recognize that they are guidelines and you have the ability to decide, but we really do believe we are following the spirit of your requirements. We recognize and appreciate the work you do for Python!

We do not release nightly, for example, and the main reason for our large bundle size is compatibility across many platforms. We also do not ship another language within the package or any pre-trained ML models. The DuckDB organization is also well established, and we have a number of happy PyPI users!

@pradyunsg
Contributor

The current project size limit for duckdb is 10 GB. It's not clear to me what's being requested here.

@Alex-Monahan
Hello!
The title of the issue was initially set to 100GB, so we are requesting a total of 100 GB of space. Could you grant us a 100GB allocation please? We appreciate it!

@hannes
Author

hannes commented Feb 4, 2022

We realise things cost money. To offset the increased cost of hosting our package, I've just set up a recurring yearly donation of $99 to the Python Software Foundation. I hope this can be resolved.

[Screenshot, 2022-02-04 06:56:54]

@ewdurbin
Member

ewdurbin commented Feb 4, 2022

Current usage for the project is 3.3 GiB, with a 100 MiB file size limit, and a 10.0 GiB total project size limit.

Are you anticipating reaching the 10.0 GiB limit in the near future? I'm also confused.

@Alex-Monahan
We just manually deleted some development releases in order to make room for our deployments to start working again. We had encountered errors due to running out of space and had to disable our PyPI uploads.

Would you mind granting us the space so we can avoid these issues? We do clean up our development releases periodically, but the current space is very limiting (and will cause issues even for production releases in the next few months as we continue roughly monthly releases).

Are there any other details that you would like me to provide?

pradyunsg changed the title from "Limit Request: duckdb - 10 GB" to "Limit Request: duckdb - 100 GB" on Feb 15, 2022
pradyunsg changed the title from "Limit Request: duckdb - 100 GB" to "Project Limit Request: duckdb - 100 GB" on Feb 15, 2022
@pradyunsg
Contributor

Updated the title and body to reflect the updated total size request. I'll defer to the admins on bumping the limit.

@hannes
Author

hannes commented Feb 21, 2022

We just ran into the limit again.

@hannes
Author

hannes commented Mar 21, 2022

And again. Any update on this?

@hannes
Author

hannes commented Mar 21, 2022

We are now ca. 15 versions away from running out of space, without considering any pre-releases.

@di
Member

di commented Apr 7, 2022

Hi folks, as previously mentioned, the frequency of dev releases here is the issue. I'm not seeing a reason explained here why this project a) needs to publish so many dev releases so frequently and b) why these dev releases need to exist indefinitely.

Currently the project sits at 3.4GB without these releases, which is well below the 10GB limit. The current practice of culling or removing dev releases is acceptable from our perspective, so you're welcome to continue doing that.

Unless we can get some explanation why this frequency of releases is necessary, or the project takes steps to reduce the frequency of releases, we're unlikely to increase the limit here.

di added the "status: awaiting response" label on Apr 7, 2022
@gforsyth

gforsyth commented May 4, 2022

Hi all! Happy PyPI and DuckDB user here! I've also been collaborating with the DuckDB team and make frequent use of their dev releases, so I'm far from a neutral participant here, but I wanted to offer up a proposal.

DuckDB is trying to make it easy for people to install without requiring a bunch of compilation on the user side, something I imagine we all appreciate. The combinatorial explosion of OS, architecture, and Python version means that while a given wheel is on the order of 10 MB, a release (dev or otherwise) is closer to 500 MB.
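For a rough sense of how the per-wheel and per-release figures relate, the sketch below multiplies out a build matrix. The platform tags, Python versions, and the 10 MB per-wheel size are illustrative assumptions picked to match the order of magnitude quoted above; they are not DuckDB's actual CI matrix.

```python
from itertools import product

# Hypothetical build matrix: these tags are illustrative assumptions,
# not DuckDB's real CI configuration.
PLATFORMS = [
    "manylinux_x86_64", "manylinux_i686", "manylinux_aarch64",
    "musllinux_x86_64", "musllinux_aarch64",
    "macosx_x86_64", "macosx_arm64", "macosx_universal2",
    "win_amd64", "win32",
]
PYTHON_VERSIONS = ["3.6", "3.7", "3.8", "3.9", "3.10"]
WHEEL_SIZE_MB = 10  # "on the order of 10 MB" per wheel

# One wheel per (platform, Python version) combination.
wheels = list(product(PLATFORMS, PYTHON_VERSIONS))
release_size_mb = len(wheels) * WHEEL_SIZE_MB

print(f"{len(wheels)} wheels, ~{release_size_mb} MB per release")
# 50 wheels, ~500 MB per release
```

Adding one more Python version or platform grows every release by a whole row or column of wheels, which is why the total climbs so quickly even when each wheel stays small.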

I've also greatly appreciated those pre-release wheels as features or bug fixes make it in, especially with downstream packages that make use of duckdb, because compiling duckdb on GitHub Actions can take a long time.

So, proposal:

DuckDB changes the frequency of dev releases from per-PR to "nightly at most" -- cut a release at some time each day, assuming that commits have landed in the previous 24 hours. DuckDB also continues their existing practice of culling those dev releases following a "proper" release.

PyPI bumps the project size to something > 10GB to allow for those nightlies to fit in between culls.
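A minimal sketch of the "nightly at most" rule in the first point, assuming a scheduled CI job runs once a day and can look up the timestamp of the latest commit on the default branch. The function `should_cut_nightly` and its parameters are hypothetical, not part of DuckDB's workflow.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_cut_nightly(
    last_commit_at: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> bool:
    """Decide whether a once-a-day scheduled job should publish a dev release.

    Hypothetical helper: publish only if at least one commit landed on the
    default branch within `window` before `now`. Because the job itself runs
    at most once a day, this yields at most one dev release per day.
    """
    if last_commit_at is None:
        # No commit known on the branch: nothing new to publish.
        return False
    return now - window <= last_commit_at <= now


if __name__ == "__main__":
    now = datetime(2022, 5, 4, 2, 0, tzinfo=timezone.utc)
    print(should_cut_nightly(datetime(2022, 5, 3, 15, 30, tzinfo=timezone.utc), now))  # True
    print(should_cut_nightly(datetime(2022, 5, 1, 9, 0, tzinfo=timezone.utc), now))  # False
```

Comparing timestamps keeps the job stateless; a variant could instead compare the current head commit against the commit recorded for the previous dev release.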

Thoughts?

Thanks to all the PyPI maintainers for the huge amount of work y'all put in to keep things humming along smoothly.

@hannes
Author

hannes commented Jun 5, 2022

We have automated the removal of outdated dev releases.
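One way such a cleanup policy could decide what to remove, sketched under assumptions: versions are listed oldest first in upload order, dev releases are recognized by a PEP 440 ".devN" segment, and only the newest `keep` dev releases since the last final release survive (mirroring the "2~3 active at a time" practice described later in this thread). The function and the version numbers in the example are hypothetical; the actual deletion step is left out because this sketch assumes no particular deletion tool.

```python
from typing import List


def dev_releases_to_delete(versions: List[str], keep: int = 3) -> List[str]:
    """Pick dev releases to remove (hypothetical policy, not DuckDB's script).

    `versions` must be in upload order, oldest first. Dev releases are those
    with a ".devN" segment; everything else counts as a final release.
    Other pre-release kinds (e.g. "rc") are not handled here.
    """
    def is_dev(version: str) -> bool:
        return ".dev" in version

    # Index of the most recent final release (-1 if there is none yet).
    last_final = max(
        (i for i, v in enumerate(versions) if not is_dev(v)), default=-1
    )
    # Dev releases uploaded before the latest final release are superseded.
    superseded = [v for v in versions[: last_final + 1] if is_dev(v)]
    # Everything after the latest final release is a dev release;
    # keep only the newest `keep` of them.
    pending = versions[last_final + 1 :]
    excess = pending[:-keep] if keep > 0 else pending
    return superseded + excess


history = [
    "0.3.1",
    "0.3.2.dev101", "0.3.2.dev205",
    "0.3.2",
    "0.3.3.dev12", "0.3.3.dev40", "0.3.3.dev77", "0.3.3.dev93",
]
print(dev_releases_to_delete(history))
# ['0.3.2.dev101', '0.3.2.dev205', '0.3.3.dev12']
```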

@adriangb

adriangb commented Aug 29, 2022

I've started using DuckDB and since it's a new project have run into a couple bugs / features that I wanted from trunk that are not in the last release version. Installing a nightly build (without having to fetch, compile, etc.) is really useful.

It sounds like @hannes already automated cleaning out old dev releases. As an external observer, it seems like a nightly (not every commit) upload cadence, cleaning out dev releases at every production release or every X weeks, plus a reasonable quota increase is a decent compromise on both sides.

@hannes
Author

hannes commented Sep 10, 2022

Thanks @adriangb we have indeed automated the deletion. We are not very far off now from running out of space with just the regular releases. So could we please (please) get the quota increased?

di removed the "status: awaiting response" label on Oct 12, 2022
@Mytherin

@di Unfortunately the issue is not yet resolved. While we are currently automatically deleting pre-releases and only keeping 2-3 active at a time, we are getting very close to the 10 GB project size limit even without factoring in the pre-releases. As we are providing wheel builds for the many platforms that PyPI supports, the individual releases are large (around 650 MB per release). At this rate, we will run out of space for our regular releases as well in a few months and will be forced to start deleting actual releases. We would really like to avoid doing that, as we know there are people still relying on older versions of DuckDB.
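The quota arithmetic behind this can be made concrete. The helper below is a hypothetical illustration, not a PyPI tool; it assumes ~650 MiB per release (the "650 MB" figure treated as MiB) and, for the upper bound, an otherwise empty project. Under those assumptions a 10 GiB limit holds at most 15 full releases, and a 30 GiB limit at most 47.

```python
import math

MIB_PER_GIB = 1024


def releases_that_fit(limit_gib: float, used_gib: float, release_mib: float) -> int:
    """Number of additional full releases that fit under a project size limit."""
    headroom_mib = (limit_gib - used_gib) * MIB_PER_GIB
    if headroom_mib <= 0:
        return 0
    # Only whole releases count: a partially uploaded release is a failed upload.
    return math.floor(headroom_mib / release_mib)


# Upper bound: an otherwise empty project, ~650 MiB per release (assumed).
print(releases_that_fit(10, 0, 650))  # 15
print(releases_that_fit(30, 0, 650))  # 47
```

Any space already occupied by older releases or retained dev releases comes straight out of that headroom, so the real number of remaining releases is lower.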

Could the maintainers please have another look at this and reconsider raising our limit?

@di
Member

di commented Dec 21, 2022

I've set the total project size limit to 30GB for now. If the project approaches this new limit, we can revisit. Please continue removing dev releases as much as possible. Thanks!

di closed this as completed on Dec 21, 2022
@Mytherin

That's great news, thank you very much!
