-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Monorepo support using groups and path dependencies #6850
Comments
cc @finswimmer because you asked for an issue on Discord 😃 |
@adriangb Do you think your solution is somehow similar to this? https://github.com/dermidgen/python-monorepo |
They are similar but not equivalent because:
|
Copying comments from #6845 (comment) and #6845 (comment) [...] I don't think path dependencies should be special at all; encouraging users to think of them as 'special' takes us down a path we have strictly wanted to avoid (e.g. leaking metadata beyond the Core Metadata spec between layers of the dependency tree because Poetry is used all the way down), and reduces the interoperability of Poetry with the rest of the ecosystem. I think [...] a workspace-like approach like suggested in #2270 is what we should explore, instead of trying to tear down the Core Metadata/PEP 517 wall between Poetry projects. [...] I think that makes my objection firmer -- special support for using path dependencies in this way (with a superproject that delegates to subprojects) is not desirable because it is ultimately incomplete unless we tear down the metadata barrier, and all the designs along that line have been rejected to date. It's possible we decide that the subproject approach is not tenable and go that way after all, but I think it would have to be comprehensive and not piecemeal. It would probably be better if it was a new dependency type if that were the case, instead of overloading This is effectively in reply to
This special handling (as proposed in this issue and as implied by the related PRs) has been explicitly rejected as undesirable up to this point, and while a seemingly obvious direction for extension, directly restricts and contradicts efforts to keep Poetry as interoperable as possible. When introducing net-new functionality into the ecosystem, we've generally tried to be very methodical as we will have to maintain it for a long time (even if the design is later shown to be flawed), and it will create confusion for our users (e.g. see Poetry's advanced source handling, which is far beyond that of any other tool, and frequently a source of confusion). As I said in my comments above, I don't think it's impossible we go that direction, but I do think we're going to draw the line at implementing anything intended to encourage that pattern until a comprehensive design is proposed and accepted. |
First an update on the situation in the OP. poetry lock --no-update doesn't workSeveral PRs have since addressed this: #6843, #6844 and python-poetry/poetry-core#520. This is now completely solved and a non-issue. No caching in Docker builds because source is required to install 3rd party dependenciesThis still needs something like #6845 if we want to get good docker caching. BoilerplateI think this can easily be solved either with a plugin that reduces boilerplate by internally expanding each entry in some No support for dependency groups in path dependenciesI don't think this is that big of a deal. I've had good luck using the tried and true pattern of making a I do agree that if Poetry wanted to support this it should probably not bastardize the |
The proposal in #6845 does not make any assumptions about the subprojects / path dependencies and they don't have to be Poetry projects for it to work. So I don't think there's any leaking of metadata because the subprojects are Poetry projects. All this proposal does is add a an argument to the
I specifically said "let's table this for now" because I agree with you: this is more complicated and requires some design work. This issue and the associated PRs (of which only #6845 is left really) don't proposing doing anything like this.
As a maintainer I understand that every feature carries a burden of future maintenance. I specifically designed the implementation in #6845 to be as simple as possible: it is only a couple lines of relatively simple code (boilerplate for the CLI argument and a single logical statement). Most of the PR is documentation and tests. I don't think maintenance burden would be too bad even if Poetry decided to pursue some other implementation of "subprojects" down the road.
Part of the reason I'm pursuing this pattern is that it can be implemented mostly by fixing some existing bugs/weird behavior (most of which we have done) and implement some minor non-intrusive features (like the CLI argument in #6845). There have been several "comprehensive designs" proposed and they've all died at some point (#2270 has been open for nearly 3 years with no progress being made). I think there's a lot to be said for making some minor changes now, learning from real world usage and then using this feedback to inform any future design decision. As mentioned above #6845 is the main thing missing for this approach and if that needs to be maintained for a couple years and then deprecated once a new comprehensive design is proposed and accepted then it's a price worth paying to keep moving forward. That said, just for clarify, what does it mean for a comprehensive deign to be proposed and approved? I've mostly seen PRs or issues being opened. Is there some sort of formal PEP-like process? Who oversees it or makes the decisions? |
@radoering since you've worked on some of the related issues your input here would be very much appreciated |
I've been working around this by using
I've been using this plugin to build wheels for projects with path dependencies: https://github.com/artisanofcode/poetry-stickywheel-plugin/ It's been working great so far, but as the name suggests, it doesn't work at all for sdists: artisanofcode/poetry-stickywheel-plugin#10 For my current use case that's fine, wheels are good enough. |
That does sound ugly 🙃. Aside from #6845 the other workaround I’ve seen/used is to use the
Thank you for linking that project, I’ll take a look! I suspect that wheels are more than good enough for 98% of use cases. |
I've also had to employ this trick for a specific use case :( |
I will try to describe my understanding. Please correct me if some of my assumptions are wrong. If I understand correctly, the only use case to skip all path dependencies during installation is docker (or similar) caching. This is especially useful in a monorepo use case but is not restricted to monorepos. Every time you run Dependency groups can't be used for this use case, because there might be transitive path dependencies and non-path sub-dependencies of path dependencies that shouldn't be ignored in order to cache as much as possible. (Maybe, we should write that somewhere in the docs in case someone is wondering.) Actually, this is not about path dependencies, i.e. file dependencies and directory dependencies, but only about directory dependencies, isn't it? If that's the case we should change the wording. (I know that both are sometimes lumped together but there are already differences between them, because directory dependencies are more volatile than file dependencies, e.g. directory dependencies are always resolved when running |
Yep that’s right. I agree that Docker + monorepos is going to be the main use case but there could be others, and #6845 is not monorepo or Docker specific in any way.
I wouldn’t call the environment “broken”, maybe “incomplete” but that depends on what your goal is. There’s already a precedent for this with
Yup exactly. I tried to touch upon this in the docs included with #6845 but I’m happy to add more detail where needed or unclear.
Yes this is only about path dependencies. I thought the name of the new option ( Regarding file dependencies, the reasons I excluded them from this work is:
|
My point is that a directory dependency is a path dependency and a file dependency is a path dependency, too.
|
Good point. Any ideas for the right term then? |
|
Let’s discuss more in #6845 since this is quite specific to that |
@adriangb Yes, I believe that PR does solve my problems :) Thank you! |
I was experimenting with this arrangement and there seems to be a potential problem with cli use of poetry. Using https://github.com/adriangb/python-monorepo, if you add pytest as a dev dependency in poetry/pyproject.toml you can run the following commands:
The failure will happen whether or not you have a poetry.toml with In a repo wil only 1 pyproject.toml file you can run commands in the virtual env from anywhere inside that repo, but with this arrangement you can only do so from the very top directory. People can probably get used to this, but it does seem like this would be a major source of complaints. How hard would it be to add some mechanism to flag that a pyproject.toml file doesn’t have an associated virtual env and poetry needs to search higher up the directory tree? |
pytest is particularly difficult to make work with monorepos, especially with a I’m not sure what error you get, is it because poetry creates a new virtual env in We can make a plug-in that lets you do There is also something to be said for each workspace having all of the dependencies it needs, including pytest, so that you can always work on that single workspace as if it were it’s own non-workspace package. |
The specific error is
This does look to be because poetry creates a virtual env in
For myself, we already use taskipy which lets you define tasks in your pyproject toml. So you can run
which would provide isolation to ensure the listed dependencies for each workspace is correct. The concern I have is more about developers working in one workspace wanting to run say an individual test and tripping over the fact you need to add extra options depending on where you are.
Yes. If you have a plugin that would create a virtual environment for each workspace and keep them up to date when changing the lockfile I think that would solve this problem. It would need that plugin though, otherwise you'd have to update each workspace manually. |
I pushed up a Makefile that is similar to what I used to manage a ~12 workspace project. You can choose to work from the root repo using the Locally we'd usually just have a single venv with all of the project installed. So you weren't really testing each one in isolation but you also didn't have to deal with multiple envs or syncing deps. I was also never able to get VSCode's pytest integration to work from the root repo so if I had to run tests in a debugger I'd have to cd into the project folder and manually select the top level virtual env. In CI we ran each project in it's own job so we could test them in isolation. We had a pre-commit hook to template the list of workspaces into a YAML file which we then fed into GitLab CI to run a parameterized job who's It all worked pretty well, but I’m not sure this is the ideal or only workflow. Experimenting in a plug-in is the right next step. |
Hi @adriangb First of all thanks for providing setup example and several PRs to make it work seamlessly! I have a question (which seems to be not addressed in none of the issues where this conf is discussed): is there a good way to make
Could you pls comment on that matter? |
I think it'd be doable by making a poetry plugin. But then you'd have to do |
@adriangb is there any update here on how to build wheels for subprojects that contain path dependencies? A use case we have is to be able to publish a subproject that depends on another subproject (via a path dependency) maintained in the monorepo. |
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Following up on https://discord.com/channels/487711540787675139/1032534656777715732/1032549486230253580
Summary: I think we can make some relatively simple and universally (as in not only beneficial to monorepos) changes to the existing
path
dependencies that will make them much easier to use for monorepos. There are also some more complex feature requests to handle but I think those can be tabled for future feature work.I had to migrate a Python monorepo at work from an ad-hoc structure to something with a more explicit separation of applications and components. I explored several options including monorepo build tools (Bazel, Pants, etc.) and Python package managers (Hatch, PDM and Poetry). I won't get into the details but the TLDR is that the build tools are too complex for a simple Python-only monorepo and the other package managers have missing features. There are several ways to structure monorepos with Poetry, but after surveying a lot of blog posts, issues and plugins I came across this one: https://github.com/martinxsliu/poetry_workspace_plugin (@martinxsliu if you're out there thank you and feel free to chime in). While I don't plan on using that Plugin necessarily (I don't need most of the functionality it provides) it had some interesting ideas. The main thing I changed was using Poetry's existing
group
feature to organize the path dependencies to projects so that even without a plugin you can get basic functionality.The good
So the final pattern I landed on is this: https://github.com/adriangb/python-monorepo/tree/main/poetry
TLDR is:
So you can then do:
Which will install
app
,lib
and any 3rd party dependencies!The bad
poetry lock --no-update
doesn't workThis is alluded to in the plugin linked to above: https://github.com/martinxsliu/poetry_workspace_plugin/blob/master/poetry_workspace/plugin.py#L83-L107
Even worse: if you delete any path dependencies (delete the folder, delete it from
pyproject.toml
)poetry lock --no-update
straight up crashes becauseDirectoryDependency
is used to load the lock file (which still has a reference to the deleted dependency) and unlike VCS and other dependencies it validates that the path exists in__init__
. This means your options are to (a) runpoetry lock
and update all 3rd party dependencies (b) delete it from pyproject.toml first, runpoetry --lock-update
, delete the folder and runpoetry lock --no-update
again) or (c) editpoetry.lock
before runningpoetry lock --no-update
. Especially with--no-update
becoming the default at some point, this is problematic.I've submitted a couple of PRs to fix this:
poetry lock --no-update
after deleting a path dependencyNo caching in Docker builds because source is required to install 3rd party dependencies
When you're building a Dockerfile that uses Poetry it's best to use
--no-root
to install 3rd party dependencies first (which only requires apyproject.toml
and apoetry.lock
lockfile) so that the layer which installs 3rd party dependencies gets cached across builds (it's often both the slowest and least frequently changed step). This breaks down if you have any path dependencies because you'll need to copy them over in order to install 3rd party dependencies (even using groups there's no way to skip the path dependencies but install its transitive 3rd party dependencies).I opened #6845 as a way to address this. I'm not sure if
--no-path
is the best way; I'm open to other options, but the basic idea is there.No way to build sdists/wheels for projects with path dependencies
In my example repo you can't run
poetry build
oncli
because it depends onlib
. One might want to do this if e.g. you have a backend and a client you are publishing on PyPi that share a common component/library. There are several open issues around this, which I won't get into detail on. My thoughts is that it would be nice to have a feature likenamespace-lib = { path = "../lib", wheel = { version = "~={}.{}.{}"} }
where the{}
inversion
means "insert the current version" ornamespace-lib = { path = "../lib", wheel = { target = "path/inside/wheel" } }
. I think this quite complex of a topic that needs more thought. Even Cargo hasn't fully figured this out since they don't support inheriting the version from workspace dependencies (theiroverrides
feature may be of interest but is really beyond the scope of this issue). I think this can be solved but I would table it for now since it needs more thought and will likely be more invasive to the codebase.#1168 also is related to this particular use case and might be an alternative path forward.
No support for dependency groups in path dependencies
It would be interesting to have some special handling for Poetry path dependencies so that we can "propagate" groups from path dependencies to workspace/root projects (for example, choosing to install the
test
group in a path dependency). I think this can be done but I don't think it should be a blocker for other stuff, also going to table this for now.The ugly
Boilerplate
There's a lot boilerplate involved in the whole
groups
and path deps thing, especially if you get into listing extras. I think this is where a plugin might shine: it could completely replace these sections with a[tool.workspaces]
section or just serve to keep them in sync with the filesystem (to avoid manually writing the boilerplate).The text was updated successfully, but these errors were encountered: