
NotGitRepository error when installing multiple packages from one git repository #6958

Closed
4 tasks done
gnuletik opened this issue Nov 3, 2022 · 15 comments · Fixed by #9658
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@gnuletik

gnuletik commented Nov 3, 2022

  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.

Issue

It seems that a race condition occurs when installing two packages:

  • from the same git repository
  • with a different subdirectory
  • on a non-default git branch

Repro:

cd /tmp
git clone https://github.com/gnuletik/poetry-lib-monorepo-issue
cd poetry-lib-monorepo-issue
poetry install

It fails with

Package operations: 2 installs, 0 updates, 0 removals

  • Installing package1 (0.1.0 c6f487b): Failed

  NotGitRepository

  No git repository was found at /private/tmp/test-poetry/.venv/src/poetry-multipackages-example

  at /opt/homebrew/Cellar/poetry/1.2.2/libexec/lib/python3.10/site-packages/dulwich/repo.py:1090 in __init__
      1086│             elif (os.path.isdir(os.path.join(root, OBJECTDIR))
      1087│                     and os.path.isdir(os.path.join(root, REFSDIR))):
      1088│                 bare = True
      1089│             else:
    → 1090│                 raise NotGitRepository(
      1091│                     "No git repository was found at %(path)s" % dict(path=root)
      1092│                 )
      1093│
      1094│         self.bare = bare

The following error occurred when trying to handle this error:

NB: output of poetry install -vvv can be found here: https://gist.github.com/gnuletik/ddcb05ff3467f022f9d3540f379763df

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.
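The path in the traceback (`.venv/src/poetry-multipackages-example`) appears to be named after the repository, not after either package. If the checkout directory depends only on the repository, two parallel workers would clone into the same place, and one could open the directory before the other has finished creating `.git`. The sketch below only illustrates that collision under this assumption; `clone_target` and the URL are hypothetical, not Poetry's code.

```python
from pathlib import PurePosixPath


def clone_target(venv: str, url: str) -> PurePosixPath:
    """Hypothetical: derive the checkout directory from the repository name only."""
    # Last URL segment, without a trailing ".git".
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return PurePosixPath(venv) / "src" / name


# Two dependencies from the same (hypothetical) repository, different subdirectories.
repo = "https://example.com/org/poetry-multipackages-example.git"
deps = {"package1": (repo, "package1"), "package2": (repo, "package2")}

# The subdirectory plays no part in the target path, so both packages collide.
targets = {pkg: clone_target(".venv", url) for pkg, (url, _subdir) in deps.items()}
print(targets["package1"])                         # .venv/src/poetry-multipackages-example
print(targets["package1"] == targets["package2"])  # True
```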

@gnuletik gnuletik added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Nov 3, 2022
@gnuletik gnuletik changed the title NotGitRepository error when installing multiple packages from one git repository on a non-default branch NotGitRepository error when installing multiple packages from one git repository Nov 4, 2022
@24rr

24rr commented Dec 2, 2022

Based on the error message you provided, it looks like the package you are trying to install requires a git repository, but the installation process is unable to find one at the specified location: /private/tmp/test-poetry/.venv/src/poetry-multipackages-example.

To fix this error, you will need to first determine the root cause of the problem. This may involve examining the package's code, as well as the installation process, to identify any issues. It may also be helpful to consult the documentation for the package, or seek help from the package's maintainers or the community.

Once you have determined the cause of the error, you can then take the appropriate steps to fix it. This may involve modifying the package's code, changing the way it is installed, or taking some other action.

@neersighted
Member

@pneb In this case, the fault lies with Poetry; the diagnosis in the original issue appears correct to me. Related: #7113.

@danieldanciu

We are also seeing this issue with a Docker build that depends on multiple packages from the same git repository.

I suspect this will come up more often as more and more people adopt the monorepo strategy, which is now quite well supported by Poetry.

None of the workarounds presented here worked for us; we had to manually serialize the installation of the packages to avoid the race condition.
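The comment doesn't say how the installs were serialized. One hypothetical way, assuming you control the install loop yourself (for example in a custom build script), is a lock per repository URL: packages from the same repository run one at a time, while packages from different repositories still run in parallel. `lock_for`, `install`, and the URLs are illustrative names, not Poetry's mechanism; the `sleep` stands in for cloning and building.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_registry_lock = threading.Lock()
_repo_locks = {}  # repository URL -> threading.Lock


def lock_for(url):
    # Create each per-repository lock exactly once, even under concurrency.
    with _registry_lock:
        if url not in _repo_locks:
            _repo_locks[url] = threading.Lock()
        return _repo_locks[url]


# Track how many workers are inside the "clone" section for each repository.
active, peak = {}, {}
_stats_lock = threading.Lock()


def install(package, url):
    with lock_for(url):  # only one install per repository at a time
        with _stats_lock:
            active[url] = active.get(url, 0) + 1
            peak[url] = max(peak.get(url, 0), active[url])
        time.sleep(0.01)  # stand-in for cloning and building the package
        with _stats_lock:
            active[url] -= 1
    return package


deps = [
    ("package1", "https://example.com/org/monorepo.git"),
    ("package2", "https://example.com/org/monorepo.git"),
    ("other", "https://example.com/org/other.git"),
]
with ThreadPoolExecutor(max_workers=3) as pool:
    done = list(pool.map(lambda d: install(*d), deps))

print(peak)  # each repository peaks at 1 concurrent install
```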

@gnuletik
Author

gnuletik commented Jun 8, 2023

@danieldanciu, can you describe the following?

we had to manually serialize the installation of the packages to avoid the race condition

Did you run a pip install (in your venv) before running poetry install?

@pdarulewski

pdarulewski commented Jun 12, 2023

Are there any workarounds for this? I have multiple misc modules in a utilities repo and I'd really like to use a few of them in other projects.
The issue is pretty annoying because it's hard to pinpoint the exact problem, especially when the installation seems to work locally but then fails randomly in CI or in a Docker container, and works again after a retry.
I have the same issue for Poetry 1.3.2, 1.4.2, and 1.5.1.

@gnuletik
Author

@pdarulewski I think the root issue is in the way Poetry clones multiple dependencies in parallel.

The fix could be something that disables parallel installs for dependencies that come from the same repository. Poetry reads the parallel setting here:

if parallel is None:
    parallel = config.get("installer.parallel", True)
if parallel:
    self._max_workers = self._get_max_workers(
        desired_max_workers=config.get("installer.max-workers")
    )

You could try disabling the parallel installer entirely with:

poetry config installer.parallel false

as stated here #7949 (comment)
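The snippet shows that `installer.parallel` only decides whether the worker count comes from `_get_max_workers`. The sketch below models that decision, assuming that a disabled parallel installer means a single worker and that the default limit is the documented "number of cores + 4" for `installer.max-workers`. `resolve_max_workers` is a hypothetical helper, not Poetry's implementation.

```python
import os
from typing import Optional


def resolve_max_workers(parallel: bool, desired_max_workers: Optional[int] = None) -> int:
    """Hypothetical sketch: how many install operations may run at once."""
    if not parallel:
        # Assumption: with the parallel installer disabled, one operation at a time.
        return 1
    # Assumed default, following the documented "number of cores + 4".
    default = (os.cpu_count() or 1) + 4
    if desired_max_workers is None:
        return default
    # A configured value can lower the limit but not raise it above the default.
    return min(default, desired_max_workers)


print(resolve_max_workers(False))    # 1
print(resolve_max_workers(True, 2))  # 2
```

Setting `installer.parallel` to `false` therefore serializes every operation, not just the ones that share a repository, which is why it avoids the race at the cost of a much slower install.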

@pdarulewski

@gnuletik Yes, I think so too. I guess I've also had other errors related to the monorepo's .git directory inside the project's virtualenv directory.
Setting installer.parallel to false seems to work, although, as expected, installation is much slower. It's fine for now; thanks for the hint.

@Oblynx

Oblynx commented Sep 25, 2023

This would be a great fix! We also use monorepos to manage private Python packages and run into this issue. Turning parallelism off can increase build time tenfold for a large project.

@ogreyesp

@gnuletik Setting installer.parallel to false didn't work in my case.

@JonathanRayner

JonathanRayner commented Jul 16, 2024

Please note that subsequent calls may succeed but a fresh install (after a poetry env remove --all) always fails.

Does anyone have ideas on how to reproduce this more consistently? I can sometimes reproduce it locally, but not always, which makes fixing it a pain. @gnuletik I was able to reproduce it a few times with your repos, but not every time (even after deleting the environment).

Edit: I seem to be able to reproduce it more consistently by running poetry install with this repo: https://github.com/JonathanRayner/some_other_repo
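Since a fresh install (after `poetry env remove --all`) is what triggers the failure, a small harness that repeats that cycle and counts failures may help measure how reliably a given repo reproduces the bug. This is a hypothetical helper script, not part of Poetry; it assumes `poetry` is on `PATH` and that it runs from the project directory. The runner is injectable so the counting logic can be checked without Poetry.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    # Run a command and return its exit code.
    return subprocess.run(list(cmd)).returncode


def count_failures(attempts: int, runner: Runner = run) -> int:
    """Repeat a fresh install `attempts` times and count how often it fails."""
    failures = 0
    for _ in range(attempts):
        # Start from a fresh environment; the exit code is ignored on purpose,
        # since there may be no environment to remove on the first attempt.
        runner(["poetry", "env", "remove", "--all"])
        if runner(["poetry", "install"]) != 0:
            failures += 1
    return failures
```

For example, `print(f"{count_failures(20)}/20 fresh installs failed")` gives a rough failure rate to compare between repos or Poetry versions.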

@JonathanRayner

I see a few possible ways forward, but can I ask: what is the expected behavior?

Suppose the following monorepo structure:

monorepo/pkg_1/pyproject.toml
monorepo/pkg_2/pyproject.toml

and another repo that wants to use pkg_1 and pkg_2 as git dependencies:

some_repo/pyproject.toml

which is

[tool.poetry]
name = "some_repo"
version = "0.1.0"
description = ""
authors = ["my name <[email protected]>"]

[tool.poetry.dependencies]
python = "^3.10 <3.13"

pkg_1 = {git = "[email protected]:MyOrg/monorepo.git", subdirectory = "pkg_1"}
pkg_2 = {git = "[email protected]:MyOrg/monorepo.git", subdirectory = "pkg_2"}

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

When the user installs some_repo, there are a few possibilities for what should happen:

  1. The repo monorepo is cloned once and reused to install pkg_1 and pkg_2. This is advantageous for large repos. We would need to either throw an error if pkg_1 and pkg_2 point to different branches/revs, or fall back to two separate clones in that case.
  2. The repo monorepo is cloned twice, completely independently for pkg_1 and pkg_2.
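Option 1 with the fallback (rather than an error) could be sketched as a cache keyed by repository and revision: packages on the same revision share one clone, while a different revision gets its own directory. `CheckoutCache`, the directory naming, and the URL are hypothetical, and the sketch only computes paths; it is not thread-safe, so parallel callers would still need a lock around `checkout`.

```python
from pathlib import PurePosixPath


class CheckoutCache:
    """Hypothetical sketch of option 1: one clone per (repository, revision)."""

    def __init__(self, root: str) -> None:
        self.root = PurePosixPath(root)
        self._checkouts: dict[tuple[str, str], PurePosixPath] = {}
        self.clones = 0  # how many real clones would be performed

    def checkout(self, url: str, rev: str) -> PurePosixPath:
        key = (url, rev)
        if key not in self._checkouts:
            name = url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
            # Include the revision in the directory name so revisions never collide.
            safe_rev = rev.replace("/", "-")
            self._checkouts[key] = self.root / f"{name}-{safe_rev}"
            self.clones += 1
        return self._checkouts[key]

    def package_dir(self, url: str, rev: str, subdirectory: str) -> PurePosixPath:
        return self.checkout(url, rev) / subdirectory


cache = CheckoutCache(".venv/src")
url = "https://example.com/org/monorepo.git"
a = cache.package_dir(url, "main", "pkg_1")
b = cache.package_dir(url, "main", "pkg_2")    # reuses the "main" clone
c = cache.package_dir(url, "v2.0.0", "pkg_2")  # different rev, separate clone
print(cache.clones)  # 2
```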

@Jozefiel

Option 1 with an error being thrown would probably be a breaking change for us. We use a monorepo to store our microservices' APIs. Then, in other projects, we combine package releases (tags) based on the deployment. If an error were thrown, the monorepo approach would no longer be suitable.

@JonathanRayner

Fair! It sounds like a separate clone per parallel install is a sensible default, then? I.e., each package is completely separate. Perhaps people with very large monorepos use other tooling to reduce the redundancy of multiple clones anyway?

@Jozefiel

Maybe git worktree can solve both problems?
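A git worktree approach might look like this: clone the repository once (sharing one object database), then add one detached worktree per distinct revision, so packages on the same revision share a checkout and different tags get their own. The function below only builds the `git` command lines and does not run them; `worktree_commands` and the paths are hypothetical, and it assumes each revision is resolvable in the shared clone (e.g. a tag or commit SHA). The commands would still need to run in order, clone first.

```python
from pathlib import PurePosixPath


def worktree_commands(url: str, shared: str, revs_by_package: dict[str, str]) -> list[list[str]]:
    """Hypothetical: one shared clone plus one detached worktree per distinct revision."""
    commands = [["git", "clone", "--no-checkout", url, shared]]
    root = PurePosixPath(shared).parent
    seen = set()
    for rev in revs_by_package.values():
        if rev in seen:
            continue  # packages on the same revision share one worktree
        seen.add(rev)
        path = str(root / f"worktree-{rev.replace('/', '-')}")
        # --detach avoids "branch already checked out" errors across worktrees.
        commands.append(["git", "-C", shared, "worktree", "add", "--detach", path, rev])
    return commands


cmds = worktree_commands(
    "https://example.com/org/monorepo.git",
    "/tmp/deps/monorepo",
    {"pkg_1": "v1.2.0", "pkg_2": "v1.2.0", "pkg_3": "v2.0.0"},
)
for cmd in cmds:
    print(" ".join(cmd))
```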

gustavgransbo added a commit to gustavgransbo/poetry that referenced this issue Aug 30, 2024
Multiple installs from the same git repository cause a race condition
when the repository is cloned.
This commit only allows one operation per repository to be executed by
the parallel workers.
Any extra operation will be performed serially.
Since the git repository is cached after the first operation, this is
blazingly fast.

Resolves: python-poetry#6958
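The commit message describes the approach: only the first operation for each git repository goes to the parallel workers, and the rest run serially afterwards, when the clone is already cached. A sketch of that partitioning, assuming the serial batch starts only after the parallel batch has finished; `Operation` and `split_operations` are hypothetical names, not the code from the actual fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    package: str
    repo_url: Optional[str] = None  # None for non-git dependencies


def split_operations(ops: list[Operation]) -> tuple[list[Operation], list[Operation]]:
    """Hypothetical: first op per git repo runs in parallel, the rest serially."""
    parallel, serial = [], []
    seen_repos = set()
    for op in ops:
        if op.repo_url is not None and op.repo_url in seen_repos:
            serial.append(op)  # another worker is already cloning this repo
        else:
            if op.repo_url is not None:
                seen_repos.add(op.repo_url)
            parallel.append(op)
    return parallel, serial


MONO = "https://example.com/org/monorepo.git"
ops = [
    Operation("package1", MONO),
    Operation("requests"),
    Operation("package2", MONO),
    Operation("package3", MONO),
]
parallel, serial = split_operations(ops)
print([op.package for op in parallel])  # ['package1', 'requests']
print([op.package for op in serial])    # ['package2', 'package3']
```

Unlike disabling `installer.parallel` globally, this keeps unrelated dependencies parallel and serializes only the operations that would share a clone.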
radoering pushed a commit to gustavgransbo/poetry that referenced this issue Sep 27, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 29, 2024
9 participants