How to handle several repositories with same content? #3355

Closed
gsemet opened this issue Nov 12, 2020 · 25 comments
Labels
status/duplicate Duplicate issues

Comments

@gsemet

gsemet commented Nov 12, 2020

Hello,

I have a use case that I do not think Poetry supports. Tell me if I am wrong:

  • I have an Artifactory server A that mirrors pypi.org (it acts as a cache, which is good)
  • This Artifactory server also stores private packages
  • A is on a private network, on a dedicated bare-metal server running on-premises
  • A is only accessible to some users (private network)

so far so good.

I also have another Artifactory server B, which needs a different authentication scheme and behaves like this:

  • packages on A are duplicated on B
  • using B costs a bit more (it is on a cloud server) and requires stronger authentication
  • so when possible, A is to be used preferably
  • if A is not available, then B can be used.

If I do not use the lock file, it can work. I would be happy if I could generate 2 lock files (with the same version of each package, of course) and users could choose whether to fetch from A or from B. Any idea how to do this properly?
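As a rough illustration of the "prefer A, fall back to B" rule, here is a minimal Python sketch that is not part of Poetry. The helper names `index_is_reachable` and `pick_index` are hypothetical, and the probe is an unauthenticated GET, so a server that demands credentials (like B) would be reported as unreachable unless the probe is extended with authentication.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def index_is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an unauthenticated GET on `url` gets a non-error answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, HTTP 4xx/5xx, malformed URL...
        return False


def pick_index(
    urls: Iterable[str],
    probe: Callable[[str], bool] = index_is_reachable,
) -> Optional[str]:
    """Return the first URL (in order of preference) that the probe accepts."""
    for url in urls:
        if probe(url):
            return url
    return None
```

For example, `pick_index([url_A, url_B])` returns A's URL when A answers and B's URL otherwise; the result could then decide which source or lock file to use.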

@gsemet gsemet added kind/feature Feature requests/implementations status/triage This issue needs to be triaged labels Nov 12, 2020
@sinoroc

sinoroc commented Nov 12, 2020

Have you already configured custom repositories? What do your current pyproject.toml and/or poetry.toml files look like?

@gsemet
Author

gsemet commented Nov 12, 2020

My idea (maybe wrong, tell me !) is that I would like to have something like:

  • pyproject.toml:
[[tool.poetry.source]]
name = "artifactory-A"
url = "http://artifactory.myserverA/artifactory/api/pypi/pypi/simple"
default = true

[[tool.poetry.source]]
name = "artifactory-B"
url = "http://artifactory.myserverB/artifactory/api/pypi/pypi/simple"
default = false

It would generate 2 lock files: poetry-A.lock and poetry-B.lock. If A is available, poetry install would use poetry-A.lock; if not, it would use poetry-B.lock.

The contents of both files would need to be synchronised (the same packages can be found on both servers). I did that by:

  • rm poetry.lock
  • change the default key in tool.poetry.source
  • regenerate the lock file, then rename poetry.lock to the desired lock filename
  • now by renaming the lock file, I can install from the server I want.

I know that having 2 lock files looks like overkill, but it is the only way to reuse the existing behavior without reworking everything. From what I see in !447, it looks like making the poetry.lock filename configurable in both poetry lock --no-update and poetry install would do the job.
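The manual steps above (flip the default source, regenerate, rename) could be scripted. Below is a minimal Python sketch of the fiddly part, flipping `default` between the [[tool.poetry.source]] entries. It assumes each entry declares explicit `name` and `default` keys on their own lines; `set_default_source` is a hypothetical helper, and a real tool would rather use a TOML library that preserves formatting.

```python
import re

# Matches `name = "..."` and `default = true|false` lines inside a source entry.
_NAME_RE = re.compile(r'^\s*name\s*=\s*"([^"]*)"')
_DEFAULT_RE = re.compile(r"^(\s*default\s*=\s*)(?:true|false)\s*$")


def set_default_source(pyproject_text: str, source_name: str) -> str:
    """Return pyproject text where only `source_name` has `default = true`.

    Line-based sketch: assumes every [[tool.poetry.source]] entry spells out
    `name` and `default` on separate lines.
    """
    out: list[str] = []
    block: list[str] = []
    in_source = False

    def flush() -> None:
        # Rewrite the buffered source entry, then move it to the output.
        names = [m.group(1) for m in map(_NAME_RE.match, block) if m]
        is_default = bool(names) and names[0] == source_name
        for line in block:
            match = _DEFAULT_RE.match(line)
            if match:
                line = match.group(1) + ("true" if is_default else "false")
            out.append(line)
        block.clear()

    for line in pyproject_text.split("\n"):
        if line.lstrip().startswith("["):
            # A new table header closes the previous source entry, if any.
            if in_source:
                flush()
            in_source = line.strip() == "[[tool.poetry.source]]"
        (block if in_source else out).append(line)
    if in_source:
        flush()
    return "\n".join(out)
```

After writing the result back to pyproject.toml, one would run poetry lock --no-update and rename poetry.lock to, say, poetry-B.lock.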

@sinoroc

sinoroc commented Nov 12, 2020

I see... Are lock files critical in your use case?

I had in my mind that you wanted to be able to override the main source repository on a case-by-case basis (per user maybe). For example, say you have the following in your pyproject.toml:

[[tool.poetry.source]]
name = "myindex"
url = "https://a.dev/simple/"
default = true

then, for cases where A is not available (or the other way around), you might want to be able to do one of the following 3 things:

  1. poetry config --local source.myindex.url 'https://b.dev/simple/' && poetry install
  2. poetry --config='source.myindex.url=https://b.dev/simple/' install
  3. POETRY_SOURCE_MYINDEX_URL=https://b.dev/simple/ poetry install
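As far as I know, none of these three options existed in Poetry at the time; they are proposals. A minimal Python sketch of the precedence they imply, assuming an environment variable beats a local poetry.toml override, which in turn beats the URL in pyproject.toml. The helper names and the variable naming scheme (upper-casing, with `-` mapped to `_`) are assumptions for illustration.

```python
import os
from typing import Mapping, Optional


def source_env_var(source_name: str) -> str:
    """Env var name for suggestion 3, e.g. myindex -> POETRY_SOURCE_MYINDEX_URL."""
    return f"POETRY_SOURCE_{source_name.upper().replace('-', '_')}_URL"


def effective_source_url(
    source_name: str,
    pyproject_url: str,
    local_config: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Assumed precedence: environment > local config (poetry.toml) > pyproject.toml."""
    env = os.environ if env is None else env
    local_config = local_config or {}
    override = env.get(source_env_var(source_name))
    if override:
        return override
    # Suggestion 1 would store the override under a key like source.myindex.url.
    return local_config.get(f"source.{source_name}.url", pyproject_url)
```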

@gsemet
Author

gsemet commented Nov 12, 2020

Lock files ensure that the environment is fully reproducible for both users. But why not have several lock files?

# user within the network:
$ POETRY_LOCK_FILENAME=poetry-serverA.lock poetry install
# user outside of the network:
$ POETRY_LOCK_FILENAME=poetry-serverB.lock poetry install

Generation would be:

$ POETRY_LOCK_FILENAME=poetry-serverA.lock poetry lock --no-update
$ POETRY_LOCK_FILENAME=poetry-serverB.lock poetry lock --no-update

Would you accept a Pull Request?

@sinoroc

sinoroc commented Nov 12, 2020

@gsemet I am not a maintainer. But yes, as far as I can tell pull requests are always welcome. Bonus points if they contain good documentation, tests, etc.
Now, would such a feature pull request be merged? I do not know. Probably some more brainstorming would be needed...

Can it not be solved differently? For example you could decide to not distribute any poetry.lock file at all (or one that has a defect on purpose), in order to force the user to symlink one of poetry-serverA.lock or poetry-serverB.lock beforehand. This way no need for any change to poetry's code base.
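The symlink workaround needs no change to Poetry. A minimal Python sketch, assuming lock files named poetry-<variant>.lock next to pyproject.toml; `select_lock_file` is a hypothetical helper, and on Windows creating symlinks may require extra privileges (copying the file would be the fallback).

```python
from pathlib import Path


def select_lock_file(project_dir: Path, variant: str) -> Path:
    """Make poetry.lock a symlink to poetry-<variant>.lock and return the link."""
    target = project_dir / f"poetry-{variant}.lock"
    if not target.is_file():
        raise FileNotFoundError(f"missing lock file: {target}")
    link = project_dir / "poetry.lock"
    if link.is_symlink() or link.exists():
        link.unlink()  # drop the previous link (or a regular poetry.lock)
    link.symlink_to(target.name)  # relative target keeps the link valid if the dir moves
    return link
```

For example, `select_lock_file(Path("."), "serverB")` before poetry install makes the install use the server B lock file.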

@gsemet
Author

gsemet commented Nov 12, 2020

Indeed, I can work on a tool that does that without touching poetry.

@gsemet
Author

gsemet commented Nov 12, 2020

How can I easily change the "default" source from the poetry config command line?

@sinoroc

sinoroc commented Nov 12, 2020

How can I easily change the "default" source from the poetry config command line?

@gsemet I do not think it is possible.

@gsemet
Author

gsemet commented Nov 12, 2020

ok thanks

@kleschenko

@gsemet you can also export your lock file to requirements.txt and use it to install the dependencies with pip, using --extra-index-url:

poetry export --output requirements.txt
pip install --index-url http://artifactory.myserverA/artifactory/api/pypi/pypi/simple --extra-index-url http://artifactory.myserverB/artifactory/api/pypi/pypi/simple -r requirements.txt

@sinoroc

sinoroc commented Nov 13, 2020

Somewhat related #1632
I know it is not the same as @gsemet's use case (in their case it is mostly about the lock file), but I am linking it in case someone lands here while actually looking for the other use case.

@gsemet
Author

gsemet commented Nov 13, 2020

Hi. First, thanks for taking the time to understand my problem and help me deal with it. I really appreciate it.

So, regarding your solution @kleschenko: it looks fine, but the external user (the one who needs to use server B) will have to set up their virtualenv completely manually.

Regarding #1632, yes, it is similar: server A (and B) is a cache of PyPI plus internal packages. For the moment we hardcode the server A URL in pyproject.toml, because as soon as we use one private package, we cannot use pypi.org directly. But it would be helpful to make this transparent.

I prefer to work with lock files, because they ensure that the two users (the one seeing server A and the one using server B) both end up with exactly the same virtualenv.

I also work with Conan, and its approach is to generate as many lock files as there are environments. I think it is the hard way of dealing with lock files, but also the safest: some hashes and URLs may change between two lock files generated from the same environment (using the method I described earlier), even with the same package versions. For example, the content-hash is different:

[metadata]
lock-version = "1.1"
python-versions = "^3.7"
content-hash = "4ac40bc0f66fc47df1d894c21e43611ba1f1dbb7c0311c6bcaacb5a5b79d537e"

I do not think patching lock files will work.
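One plausible explanation for the differing content-hash: Poetry hashes part of the [tool.poetry] table, and in Poetry 1.1 that part includes the `source` list, so flipping `default` changes the hash even if every package version is identical. A simplified Python approximation follows; the exact set of hashed keys is an assumption and differs between Poetry versions, and `content_hash` and `with_default` are illustrative names.

```python
import hashlib
import json
from typing import Any, Dict, Mapping

# Assumption: the [tool.poetry] keys hashed into content-hash (roughly Poetry 1.1;
# the exact list differs between Poetry versions).
RELEVANT_KEYS = ("dependencies", "dev-dependencies", "source", "extras")


def content_hash(tool_poetry: Mapping[str, Any]) -> str:
    """sha256 of the relevant [tool.poetry] content, serialised with sorted keys."""
    relevant = {key: tool_poetry.get(key) for key in RELEVANT_KEYS}
    return hashlib.sha256(json.dumps(relevant, sort_keys=True).encode()).hexdigest()


def with_default(tool_poetry: Mapping[str, Any], name: str) -> Dict[str, Any]:
    """Copy of the config in which only the source called `name` is the default."""
    sources = [dict(s, default=(s["name"] == name)) for s in tool_poetry["source"]]
    return {**tool_poetry, "source": sources}


config_a = {
    "dependencies": {"python": "^3.7"},
    "source": [
        {"name": "serverA", "url": "http://artifactory.serverA/artifactory/api/pypi/pypi/simple", "default": True},
        {"name": "serverB", "url": "https://artifactory.serverB/artifactory/api/pypi/pypi/simple", "default": False},
    ],
}
config_b = with_default(config_a, "serverB")  # same dependencies, other default source
```

Under this model, content_hash(config_a) and content_hash(config_b) differ, which is consistent with the observation that patching only URLs in a lock file is not enough.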

I think (that's my opinion, tell me yours) the best option would be:

  • pyproject.toml has all sources:
[[tool.poetry.source]]
name = "serverA"
url = "http://artifactory.serverA/artifactory/api/pypi/pypi/simple"
default = true
[[tool.poetry.source]]
name = "serverB"
url = "https://artifactory.serverB/artifactory/api/pypi/pypi/simple"
default = false
  • by default, serverA is used.
  • poetry update generates the lock file from serverA
  • the lock mechanism changes: poetry lock --file poetry-B.lock --default-source serverB
    • it generates poetry-B.lock
    • it replaces the default source with serverB
  • the install mechanism accepts an additional parameter: poetry install --lock poetry-B.lock

I will implement this behavior with a script, or even a tool around poetry, for the moment. So far it works.

But it would help a lot if this were directly supported by poetry. It could even cover #1632.

Another option would be to completely hide "serverB" from pyproject.toml and provide the URL and login on the command line (or via an environment variable), but in the end the lock file should contain the URL of serverB and the hashes from serverB.

@gsemet
Author

gsemet commented Nov 13, 2020

So, I completely restarted my approach, mainly because having 2 sources in pyproject.toml may have side effects: for example, if one of the caches fetches a more recent version (or for some other reason), poetry-A.lock can end up referencing some packages on server B, and vice versa.

In the end, I wonder whether 2 lock files are really necessary. Once we have the version of the package we need plus its hash, it should not matter where it comes from.

So I did a big "sed" over pyproject.toml and poetry.lock to replace the URL, and it works fine. Users who want to use server B can replace the URL and the source name, and that works. I simply have a Makefile target to switch:

# Note: sed -i '' is BSD/macOS syntax; with GNU sed, use plain sed -i.
switch-A:
	sed -i '' 's|http://artifactory.serverA/|https://artifactory.serverB/|g' poetry.lock pyproject.toml
	sed -i '' 's/serverA/serverB/g' poetry.lock pyproject.toml
	@echo "do not forget to set up your credentials with:"
	@echo "    poetry config http-basic.serverB username ApiKeyAPiKEy"

The only drawback is that we must take care not to commit this change.
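Because the sed -i '' form is BSD/macOS syntax and behaves differently with GNU sed, the same switch can be done portably in Python. A minimal sketch that applies the two substitutions from the Makefile target in the same order; the function names are hypothetical and the URLs are the placeholders used above.

```python
from pathlib import Path
from typing import Iterable

# The two substitutions from the Makefile target, in the same order:
# first the URL prefix (also switching http -> https), then every remaining "serverA".
REPLACEMENTS = (
    ("http://artifactory.serverA/", "https://artifactory.serverB/"),
    ("serverA", "serverB"),
)


def switch_text(text: str) -> str:
    """Apply the replacements in order, like the two sed commands."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text


def switch_files(paths: Iterable[Path]) -> None:
    """Rewrite each file in place (what sed -i does), identically on every OS."""
    for path in paths:
        path.write_text(switch_text(path.read_text()))
```

Usage would be `switch_files([Path("poetry.lock"), Path("pyproject.toml")])`; as with sed, the rewritten files must not be committed.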

@sinoroc

sinoroc commented Nov 13, 2020

@gsemet I think it was a design mistake for poetry to put the url of [[tool.poetry.source]] in the project configuration (pyproject.toml), since each user might need to access the index differently (via a different URL). See #2940 for another example of how this is counter-productive.

Again it might seem not related to your actual issue, but I believe it is. It is all due to a common misconception people (me included) have/had about what the concept behind the index is and that then spread to many areas of Python package distribution. There is some good insight about this in this discussion.

In short there should always be just 1 and only 1 index. There might be different URLs to access it, but it should still be the same content. Distributions should be the same, hashes should be the same, and so on. For a project to declare that its dependency A should be fetched from index A and dependency B from index B, is an anti-pattern. So from there it does not make sense to declare source repository URLs in the project configuration.

It should be the user's decision to pick one server URL or another. Not all developers/users of a project want to use the same URL to access the index (depending on their location, network restrictions, etc.). Not all environments (production, staging, dev) have access to the same servers. Seen from this angle, it becomes obvious that lock files should not contain server URLs either: hashes should be enough to ensure that the end result conforms to expectations.

Then there is the case where a project mixes public and private dependencies, but that is another story...
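To make the "hashes are enough" point concrete: Poetry 1.1 lock files record file hashes as strings like sha256:<hex>, and checking a downloaded file against such an entry does not involve the URL at all. A minimal Python sketch; `matches_locked_hash` is a hypothetical helper, not a Poetry API.

```python
import hashlib

# Algorithms accepted in "<algorithm>:<hex digest>" entries (a conservative subset).
SUPPORTED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def matches_locked_hash(data: bytes, locked_entry: str) -> bool:
    """True if `data` hashes to the digest recorded in an entry like 'sha256:<hex>'."""
    algorithm, sep, expected = locked_entry.partition(":")
    if not sep or algorithm not in SUPPORTED:
        raise ValueError(f"unsupported hash entry: {locked_entry!r}")
    return hashlib.new(algorithm, data).hexdigest() == expected.lower()
```

The same bytes pass the check whether they came from server A, server B or pypi.org.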

@sinoroc

sinoroc commented Nov 13, 2020

In the end, I wonder whether 2 lock files are really necessary. Once we have the version of the package we need plus its hash, it should not matter where it comes from.

@gsemet Exactly! Right on point.

@gsemet
Author

gsemet commented Nov 13, 2020

I actually like Conan's approach. The repositories are not configured in the conanfile; they live in a separate configuration. You can have as many "remotes" as you want, but I still think a "default" one would be helpful for beginners.

When looking for a package A (with a given sha1), it queries each remote sequentially until it finds the right file.
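The sequential lookup described above can be sketched in Python. This is an illustration under stated assumptions, not Conan's or Poetry's actual code: each remote is modelled as a hypothetical (name, fetch) pair whose fetch returns the file bytes or None, and "the right file" means the first copy whose sha256 matches the expected digest.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

# A remote is modelled as (name, fetch); fetch returns the file bytes, or None if absent.
Remote = Tuple[str, Callable[[str], Optional[bytes]]]


def fetch_from_remotes(
    remotes: Sequence[Remote], filename: str, sha256_hex: str
) -> Tuple[str, bytes]:
    """Ask each remote in order; return (remote name, bytes) of the first matching copy."""
    for name, fetch in remotes:
        try:
            data = fetch(filename)
        except OSError:
            continue  # remote unreachable: try the next one
        if data is not None and hashlib.sha256(data).hexdigest() == sha256_hex:
            return name, data
    raise LookupError(f"{filename} with sha256 {sha256_hex} not found on any remote")
```

With remotes [A, B], an unreachable A simply falls through to B, and a copy with the wrong hash is skipped rather than installed.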

@sinoroc

sinoroc commented Nov 13, 2020

The repositories are not configured in the conanfile; they live in a separate configuration.

For poetry, I believe the source repository URLs should be set via poetry config --local (i.e. in the local poetry.toml file), not only for uploading as it is now but also for fetching.

a "default" one would be helpful

Should be PyPI. Or a default one could be set globally (for the user) via poetry config (i.e. in ~/.config/pypoetry/config.toml).

I reckon that setting a source repository URL for the project (i.e. in pyproject.toml) could be helpful in some cases, but it cannot be overridden (or am I missing something?), so it is not good UX.

When looking for a package A (with a given sha1), it queries each remote sequentially until it finds the right file.

Yes, as far as I know this is already what poetry (and pip) do, which is good, assuming you can easily set the index URLs (and that is not easy).

@gsemet
Author

gsemet commented Nov 13, 2020

This does not seem to work. If I remove the sources from pyproject.toml and only add them to poetry.toml, like this:

[repositories]
[repositories.serverA]
url = "http://artifactory.serverA/artifactory/api/pypi/pypi/simple"

[repositories.serverB]
url = "https://artifactory.serverB/artifactory/api/pypi/pypi/simple"

(or even with only one of them), poetry install does not use it and therefore fails on the first private package.

@sinoroc

sinoroc commented Nov 13, 2020

@gsemet No, it does not work. I did not mean to create confusion.

I was just mentioning that, in my opinion, it would be a better solution to use the repositories from poetry.toml (i.e. poetry config) as sources as well, and not just as destinations (i.e. for fetching, and not just for uploading). But that is not the case: right now those repositories are only used for uploading.

@gsemet
Author

gsemet commented Nov 13, 2020

ah, yes, I totally agree !

So for the moment I apply this switch to the TOML files with sed. It works, but it is not elegant.

@gsemet
Author

gsemet commented Nov 18, 2020

May I know if this feature would be accepted in Poetry if I work on it ?

My idea:

  • as you said, use poetry config (either local or global) to define the remote
  • if pyproject.toml does not reference any remote, use the one defined in the local/global config (if any) during install
  • the lock file should not reference the repository used unless it is forced in pyproject.toml

It basically replaces the behavior "if not defined in pyproject.toml, use pypi.org" with "if not defined in pyproject.toml, use the local/global config; if not defined in the config, use pypi.org", in both poetry install and poetry lock.
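The proposed lookup order fits in a few lines of Python. This is hypothetical logic, not existing Poetry behavior: `resolve_sources` returns the index URLs to try plus a flag saying whether the lock file should record them (only when pyproject.toml forces the source), and it assumes repositories from poetry config arrive as simple name-to-URL mappings.

```python
from typing import List, Mapping, Sequence, Tuple

PYPI_SIMPLE = "https://pypi.org/simple/"


def resolve_sources(
    pyproject_sources: Sequence[Mapping[str, str]],
    local_repositories: Mapping[str, str],
    global_repositories: Mapping[str, str],
) -> Tuple[List[str], bool]:
    """Return (index URLs to try in order, whether the lock file should record them)."""
    if pyproject_sources:
        # Sources forced by the project win and are pinned in the lock file.
        return [s["url"] for s in pyproject_sources], True
    for repositories in (local_repositories, global_repositories):
        if repositories:
            # e.g. {"serverA": "http://..."} from poetry.toml or the user config
            return list(repositories.values()), False
    return [PYPI_SIMPLE], False
```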

@sinoroc

sinoroc commented Nov 18, 2020

@gsemet Makes sense to me. But I am not a maintainer. I heard from one of them (@abn) that a rewrite of the configuration subsystem is on their minds. So maybe it could happen during or right after that rewrite.

In any case, a good PR (with docs, tests, etc.) is always welcome. But you could also start with a minimal draft PR to get the discussion started; this also opens the possibility of using your own fork (not ideal, obviously, but it can be really helpful instead of waiting for a merge into the main project).

@sinoroc

sinoroc commented May 1, 2021

@gsemet There is a PR that could solve this issue. Would you be able to test it, give feedback? #3624

@neersighted
Member

Closing as a specialized variant of #1632

@neersighted neersighted closed this as not planned Won't fix, can't repro, duplicate, stale Oct 4, 2022
@neersighted neersighted added status/duplicate Duplicate issues and removed kind/feature Feature requests/implementations status/triage This issue needs to be triaged labels Oct 4, 2022

github-actions bot commented Mar 1, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024