RFP for successor to data_files-based extension discovery? #351

Closed
bollwyvl opened this issue Nov 21, 2020 · 11 comments

Comments

@bollwyvl
Contributor

@blink1073 I'm saying support both. I personally would be happy to never use data_files again unless their level of support changes. - #224 (comment)

The threshold of "change" has been met, but not in the way suggested, and things are not looking better for 2021, with changes coming to pip, setuptools, etc.

Maybe it's time to open something akin to a request for proposals for ways forward?

The viability of data_files as a way for Python projects to ship extension assets (e.g. js/css, kernelspecs) and configuration (e.g. jupyter_config) appears to be falling. I think we need to find ways to satisfy:

  • ease of distribution of core jupyter packages' assets
  • ease of installation of the "official" jupyter packages (which I guess is a python sdist/whl)
  • ease of re-distribution via "unixy" package managers (e.g. conda, brew, apt), as that may preserve some of the current end user experience

I've some ideas, but would love to hear out some more thoughts! Oh, and if this belongs somewhere else, please let me know... I'm sure at some point this will end up having to have a JEP-level clarification, but...

@SylvainCorlay
Contributor

I think that data files is still the way to go, as it provides a very clear API to manage content under PREFIX/share and PREFIX/etc.
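For readers less familiar with the layout being discussed, here is a minimal sketch of the two per-environment directories in question. The function name `prefix_data_dirs` is made up for illustration, and this is not how jupyter_core itself computes its search paths (it also considers user and system locations); it only shows where data_files entries under `share/jupyter` and `etc/jupyter` land relative to the environment prefix.

```python
import sys
from pathlib import Path


def prefix_data_dirs(prefix=sys.prefix):
    """Return the two prefix-relative directories that data_files typically targets.

    Hypothetical helper for illustration only.
    """
    root = Path(prefix)
    return {
        "data": root / "share" / "jupyter",   # extension assets, kernelspecs
        "config": root / "etc" / "jupyter",   # jupyter_config snippets
    }


if __name__ == "__main__":
    for kind, path in prefix_data_dirs().items():
        print(f"{kind}: {path}")
```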

@maartenbreddels
Contributor

maartenbreddels commented Nov 21, 2020 via email

@bollwyvl
Contributor Author

bollwyvl commented Nov 21, 2020 via email

@bollwyvl
Contributor Author

This draft PR shows one way forward which would be compatible with just about everything, require very few downstream changes, and only use PEP 621-compliant metadata.

@maartenbreddels
Contributor

I like it, Nick. Indeed, I agree with many of the points you bring up, and it's making me a bit sad (I see data_files as super duper fundamental), but your solution makes me a bit happier. I'll comment more in your PR.

@bollwyvl
Contributor Author

Still feeling the pain on this.

Turns out having files both in-tree and in data_files leads to them being shipped twice. For "a little python" or whatever, this is no big deal. For shipping ipydrawio, however, which is a full data-driven design tool, the whl is currently sitting just under 70 MB, and expands to ~200 MB (lots of uncompressed XML, twice). The sdist is 30 MB, because tar.gz is apparently smarter than whl, but still unpacks to the same size.
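One way to check a wheel for this kind of duplication is to group its members by content hash. The sketch below is only an illustration: `find_duplicate_payloads`, `build_demo_wheel`, and the `demo_pkg` names are hypothetical, and the demo archive merely mimics the wheel convention of placing data_files under `{name}-{version}.data/data/`.

```python
import hashlib
import io
import zipfile
from collections import defaultdict


def find_duplicate_payloads(wheel_bytes):
    """Group archive members by content hash; return groups with more than one member."""
    by_digest = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as whl:
        for info in whl.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(whl.read(info)).hexdigest()
            by_digest[digest].append(info.filename)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


def build_demo_wheel():
    """Build a tiny in-memory zip laid out like a wheel (hypothetical package)."""
    asset = b"<mxfile>" + b"x" * 1000 + b"</mxfile>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as whl:
        whl.writestr("demo_pkg/__init__.py", "")
        # The same asset, once in-tree and once as a data_files entry.
        whl.writestr("demo_pkg/labextension/shape.xml", asset)
        whl.writestr(
            "demo_pkg-0.1.0.data/data/share/jupyter/labextensions/demo/shape.xml",
            asset,
        )
    return buf.getvalue()


if __name__ == "__main__":
    for group in find_duplicate_payloads(build_demo_wheel()):
        print("shipped more than once:", group)
```

To inspect a real wheel, pass `pathlib.Path("some.whl").read_bytes()` instead of the demo archive.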

@layne-sadler

layne-sadler commented Mar 10, 2021

Does appdirs apply here? Looks like Sublime is using it for packages.

image

@bollwyvl
Contributor Author

🎉 flit might soon get support for data_files: pypa/flit#510

The approach looks like a single data root, so a nominal jupyter-extending package might be like:

data/
  share/
    jupyter/
  etc/
    jupyter/
src/
  kitchen_sink/
    __init__.py
pyproject.toml

...and a single line in pyproject.toml would ensure all those files get deployed correctly. Big win.
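To make the "single data root" idea concrete, here is a rough sketch of the mapping it implies: everything under `data/` lands at the same relative path under the install prefix. This is not flit's implementation; `plan_install` and the example file names are hypothetical.

```python
from pathlib import PurePosixPath


def plan_install(data_root, files, prefix):
    """Map files under a single data root to destinations under an install prefix.

    Hypothetical helper; a real build backend does this while building the wheel.
    """
    root = PurePosixPath(data_root)
    target = PurePosixPath(prefix)
    return {str(f): str(target / PurePosixPath(f).relative_to(root)) for f in files}


if __name__ == "__main__":
    # Illustrative file names, not taken from any real package.
    files = [
        "data/share/jupyter/labextensions/kitchen_sink/package.json",
        "data/etc/jupyter/jupyter_server_config.d/kitchen_sink.json",
    ]
    for src, dest in plan_install("data", files, "/opt/env").items():
        print(f"{src} -> {dest}")
```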

@blink1073
Contributor

Hmm, it seems like at that point we'd be better off wrapping flit to add a build step in jupyter-packaging. And server extensions with no build step could just use flit directly.

Zsailer added a commit to Zsailer/jupyter_server that referenced this issue Nov 18, 2022
@blink1073
Contributor

Closing this, since we've settled on using shared_data from hatch.

@bollwyvl
Contributor Author

bollwyvl commented Jan 8, 2023

Yes, we needn't change anything on this repo (or jupyter_core), as kernel (see below), extension, and other tool authors today have the option of declaring this in pyproject.toml for any number of PEP 517 build backends:

A cursory check reveals poetry and maturin still lack this feature... the former bothers me not one bit, but the latter could eventually become a concern.

Perhaps we can dream of a future where PEP XXX: Prefix Data (as 621 has disowned this problem) clarifies this so it can move into a single pyproject.toml#project field (e.g. project.prefix-data) with defined --editable behavior, instead of 10 different things with different data models. 😴 ☁️

Aside: about kernelspecs

On a partial tangent, regarding kernelspecs: jupyter_* (specifically client, perhaps) could improve the situation for reproducible, minimal distributions. Specifically, selecting data formats/syntaxes that are more cross-platform, and therefore more tolerant of string replacement, would help. The worst case is paths in JSON kernelspec files, especially on Windows, which have been a long-standing source of problems.

In light of the above:

  • use more normalized URIs to avoid Windows paths, e.g. file:///c:/prefix-placeholder
  • TOML might also be a reasonable format, as it supports python-style triple (single) quotes, e.g. '''
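
A small sketch of why the URI form is friendlier to prefix replacement than raw Windows paths in JSON. The prefixes below are hypothetical, and this assumes a consumer that accepts file URIs in `argv`, which is the suggestion above rather than current kernelspec behavior.

```python
import json
from pathlib import PureWindowsPath

PLACEHOLDER = r"C:\envs\placeholder"   # hypothetical build-time prefix
REAL_PREFIX = r"D:\conda\envs\real"    # hypothetical install-time prefix

argv0 = PLACEHOLDER + r"\python.exe"

# JSON escapes every backslash, so the raw prefix never appears verbatim
# in the serialized text and a plain replace() silently does nothing.
json_text = json.dumps({"argv": [argv0]})
naive = json_text.replace(PLACEHOLDER, REAL_PREFIX)

# A file URI only contains forward slashes, which JSON leaves untouched,
# so the same plain string replacement works.
uri_text = json.dumps({"argv": [PureWindowsPath(argv0).as_uri()]})
patched = uri_text.replace(
    PureWindowsPath(PLACEHOLDER).as_uri(),
    PureWindowsPath(REAL_PREFIX).as_uri(),
)

if __name__ == "__main__":
    print("naive replace changed anything:", naive != json_text)
    print("patched:", patched)
```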
