Skip to content

Fix and Improve Flakeref for Gitlab repos.#9163

Open
Silver-Golden wants to merge 2 commits intoNixOS:masterfrom
Silver-Golden:9161
Open

Fix and Improve Flakeref for Gitlab repos.#9163
Silver-Golden wants to merge 2 commits intoNixOS:masterfrom
Silver-Golden:9161

Conversation

@Silver-Golden
Copy link
Member

@Silver-Golden Silver-Golden commented Oct 15, 2023

Motivation

Fix the broken input url's and improve the flake input url and attribute set for Gitlab

Context

The original goal of #8773 (and then #9160) was to remove the need for using %2F everytime a / was encountered.
This was limited to just the attribute set because it was using the Github format to parse the url.
Without anything else happening I would have kept with that particular pull request.

#6614 was the nudge that changed things.
Its purpose was to change all teh percent encoded characters ( %2F) into the actual character they represented.
However this broke Gitlab URL's (#9161) due to the fact that Github has owner/repo/branch but Gitlab can have Group/SubGroup1/SubGroup2/Repo

The move to create GitlabLikeInputScheme and GithubLikeInputScheme was inspired by @roberth 's comment:

Another thing that needs to happen is that the Gitlab code should stop inheriting any code that makes Github assumptions.
Most of the logic in GitArchiveInputScheme should be moved down into a new GitHubLikeArchiveInputScheme.

The change made was for both to inherit from GitArchiveInputScheme to avoid duplication.

My implmention does cause a breaking change in that anyone using the last part of the url for the ref/rev will need to change their input to use ?ref=/?rev=.
Group%2FSubGroup/Repo/Rev -> Group/SubGroup/Repo?rev=Rev
Group%2FSubGroup/Repo/Ref -> Group/SubGroup/Repo?ref=Ref
This is because it is virtually impossible to determine if the last segment is repo/ref/rev without calling the Gitlab API.
Due to the current state of all gitlab URL's (with subgroups) being broken I believe that this is a worthwhile change.
Especially since users are already used to using ?host= if they use a selfhosted gitlab instance, getting them to use ?ref=/?rev= and being explicit shouldn't be too difficult.
Some discussion can be found here: #8773 (comment)

Tests could be a good thing to add, however I don't have the skills myself for that (still new to C++).
This patch supersedes #8773.
This patch should close out #9161 on completion.

Priorities

Add 👍 to pull requests you find important.

@github-actions github-actions bot added the fetching Networking with the outside (non-Nix) world, input locking label Oct 15, 2023
@fricklerhandwerk
Copy link
Contributor

Triaged in Nix team meeting:

  • assinged @roberth - it won't be a swift review, but we try our best

@Silver-Golden
Copy link
Member Author

Silver-Golden commented Oct 27, 2023

No worries about the swiftness, its a big enough change but it should help solve a host of issues
(Also hi roberth again :D )

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-10-27-nix-team-meeting-minutes-98/34695/1

@Silver-Golden Silver-Golden force-pushed the 9161 branch 2 times, most recently from f9f491d to 42f93c0 Compare November 20, 2023 01:23
@Silver-Golden
Copy link
Member Author

Silver-Golden commented Nov 20, 2023

@roberth Rebased it with the recent changes to Github.cc so no more merge conflicts.
Tested locally and works as expected.

Again, I am not all that great with c++ but hopefully its grand.

@Silver-Golden
Copy link
Member Author

Rebased yet again.
It seems it is currently an unfortunate time for me in that this file has had a lot of updates (after being stable for so long)

@fricklerhandwerk
Copy link
Contributor

Needs user-facing docs and release notes.

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that this will put many flake authors in an impossible dilemma between supporting newer or older nix.

This is probably only the top of the iceberg because gitlab is often used as a private instance. sourcegraph query

Perhaps we could accommodate these users by adding a retry logic to the actual fetcher, to reinterpret its arguments, to recover the original intent?

Also concerning is that we don't have a VM test for gitlab fetching. Or any test at all really.
We could borrow the setup from the gitlab test in NixOS.

Comment on lines +423 to +424
auto owner = std::regex_replace(getStrAttr(input.attrs, "owner"), std::regex("/"), "%2F");
auto repo = std::regex_replace(getStrAttr(input.attrs, "repo"), std::regex("/"), "%2F");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it would be better to use a general percent encoding function here and in similar calls below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in create a new function to replace it the %2F?
Or would ye know of an already existing one I can use?

// FIXME: get username somewhere
Input::fromURL(fmt("git+https://%s/%s/%s.git",
host, getStrAttr(input.attrs, "owner"), getStrAttr(input.attrs, "repo")))
Input::fromURL(fmt("git+https://%s/%s/%s.git", host, owner, repo))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(non-blocking) This shouldn't have to go through fromURL, but this could be a separate issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is how it is already

Input::fromURL(fmt("git+https://%s/%s/%s.git",

I did not know enough to make changes to it, only extracting out the owner/repo.
What should it be doing instead?

@fricklerhandwerk
Copy link
Contributor

I'm concerned that this will put many flake authors in an impossible dilemma between supporting newer or older nix.

Why is that?

@Silver-Golden
Copy link
Member Author

I'm concerned that this will put many flake authors in an impossible dilemma between supporting newer or older nix.

Why is that?

Kinda is that way already.
Prior to #6614
skynet_discord_bot.url = "gitlab:compsoc1%2Fskynet/presentations?host=gitlab.skynet.ie"; worked
(I wanted to be able to use skynet_discord_bot.url = "gitlab:compsoc1/skynet/presentations?host=gitlab.skynet.ie"; )
Now it dosent, so if ye want to use the best feature of Gitlab (folders) and have a compact input you have to use an older version of nix.
Then ye have to spend a good chunk of time figuring out wtf went wrong only to realize that a combination of assumptions and inflexibility lead to this.

@Silver-Golden
Copy link
Member Author

The sourcegraph is interesting (when ye expand the search), most are for the mailserver (which we also use) and only two are using ?rev, none are using %2F.
Its clear that teh cat is long out of the bag (Last chance it could have been done in 5633c09) .

Semi random thought, kinda a hack.
The core part of the logic is mostly fine (still needs L423 type of thing anyway to handle the slashes)
The part thats not quite right is the inputFromURL.
Would it be possible/worthwhile to have a new scheme (such as "gl") which extends GitLabInputScheme and properly seperates/overwrites out the inputFromURL so:
gl:full/path/to/the/repo?rev=hash just works?

Ye would end up with two schemes for gitlab, one with github style, and one gitlab style but it wouldnt break any existing flake.

(Hopefully this isnt a fever-dream idea since its something like 3am for me)

@chasecaleb
Copy link

This issue has been causing chaos for me at work, where we use GitLab. Is there anything that I can do to help move this PR along? What else needs to be done?

@Silver-Golden
Copy link
Member Author

At this point only @roberth and @edolstra can really do anything to get this to progress.
I know it was stated:

it won't be a swift review, but we try our best

But it is stretching on for longer than I would like.

@tennox
Copy link
Contributor

tennox commented Jan 20, 2024

shy bump 😅

Or: is there a workaround for flake inputs from repos in subgroups that I can use in the meantime? 🤔

@Silver-Golden
Copy link
Member Author

shy bump 😅

Or: is there a workaround for flake inputs from repos in subgroups that I can use in the meantime? 🤔

Workaround is: this

    name_of_item = {
      type = "gitlab";
      host = "gitlab.example.com";
      owner = path%2Fto";
      repo = "repo";
    };

This would be gitlab.example.com/path/to/repo, notice teh %2F?

@chasecaleb
Copy link

@roberth @edolstra (or anyone else), can resolving this please be prioritized? I'm willing to help if there's anything I can do.

Devs at my work use NixOS with a shared module that's in a GitLab subgroup, so we have a ton of Nix flake repos that are broken by this bug. We've been pinning nix to 2.17 to avoid this bug, but now that's problematic due to CVE-2024-27297. Now we're stuck between a rock and a hard place of either being insecure or needing to change the input style on ~100 repos.

Gitlab expects the repository path to be percent-encoded in the API
Contrary got GitHub or other forges, GitLab allows nested organisation
(`gitlab.com/foo/bar/baz`). This means that we must treat GitLab urls
differently as we can't assume that `gitlab.com/foo/bar/baz` means “the
`baz` revision of the `foo/bar` repository”.

Instead make it mean “the `baz` repository in the `foo/bar`
organisation” and require passing the ref or rev via a query string.
@thufschmitt
Copy link
Member

Just gave this a look. I had to tweak it a bit because it wasn't working any more on master. Also deduped things a bit to make the diff smaller (I hope).

The current iteration seems correct (@roberth I'll let you have a second pass on this to confirm).

I'd also like to emphasise @roberth 's command above:

Also concerning is that we don't have a VM test for gitlab fetching. Or any test at all really.
We could borrow the setup from the gitlab test in NixOS.

Anyone caring about proper GitLab URL support: please consider contributing that. I don't think anyone from the maintenance team will have the bandwidth for it in a foreseeable future, and that's absolutely critical to keep it properly supported.

@thufschmitt
Copy link
Member

(Urgh apologies @Silver-Golden , I naively force-pushed after rebasing, forgetting that this wasn't my branch. Let me know if you want me to revert the changes and merge them more nicely with your existing work)

@Silver-Golden
Copy link
Member Author

@thufschmitt as long as it works it will be grand.

On a related note I have been musing more on #9163 (comment) (gl:full/path/to/the/repo?rev=hash) and suspect that it might be the best way forward, elsewise folks (like ye) may have to update a large chunks of their flakes.
To keep the gitlab: as legacy and switch over to recommending gl: for future.

@roberth
Copy link
Member

roberth commented Mar 22, 2024

IIUC it doesn't have to be gl: indefinitely if we take good advantage of the lock file. nix build etc would find the right legacy info there, but nix flake lock would on old flakes would need a little intervention. That seems ok, because if you're re-locking a flake, it's not uncommon to have to edit the expressions anyway. Editing the flake.nix inputs would "just be one of those updates".
We definitely do need gl: for now though, assuming we keep supporting string flakerefs, and I think we do.

GitLab allows to nest organisations (`gitlab.com/org/suborg/subsuborg/repo`).
The flakeref URL for GitLab repositories now supports that.

This means that it isn't possible any more to use the `/<ref>` (or `/<rev>`) syntax for GitLab repositories (as that would be ambiguous), and users need to pass an explicit query parameter (`?ref=<ref>` or `?rev=<rev>`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's precedent in other tools for //<ref>.
If we don't implement that, I think we should explicitly deny it, to avoid confusion and perhaps allow it later.
Relevant notes: #8773 (comment)

Less critically, we may also consider supporting /<rev>, as full revs are easily distinguished by their length and hex characters.

@folliehiyuki
Copy link

folliehiyuki commented Sep 10, 2024

As a normal Nix user that stores all of his flakes on GitLab, I'd love to have gitlab: ref switch the functionality gradually, instead of introducing a new gl: ref. Updating every inputs to gl: and then back to gitlab: some time in the future would be a pain. I'd rather do it once when the functionality lands. Maintaining both gl: and gitlab: ref types forever also seems like a terrible idea to me.

Having gitlab: supports both the encode character %2F and slash / at the same time for the time being would be nice, with a warning printing to terminal output if the legacy input is used. Then, in the next version (or 2 version bumps later) the legacy support can be removed.

@xanderio
Copy link

While the workaround of manually encoding the "subgroup" slash as %2F works for nix, this causes other tools to fail that read the lock file, as git itself doesn't handle URL encoding. In my case this cases renovate to fail, after renovatebot/renovate#31921 was merged. Is there anything that I could in order to get this fix merged?

CC: @SuperSandro2000

xanderio added a commit to xanderio/renovate that referenced this pull request Mar 17, 2025
Do to a bug in nix one needs to uri encode the owner of a gitlab repository
that resides in a subgroups, in this case the `/` as `%20F`. This
however breaks the `git ls-remotes` which renovate runs to discover if
the input can be updated.

This commit passes the owner attribute through `decodeURIComponent` to ensure
that `git` is unware of this workaround on nixs side.

Nix PR: NixOS/nix#9163
@xokdvium
Copy link
Contributor

xokdvium commented Oct 5, 2025

I think this has stalled because of concerns of breaking changes. I think to unblock this we can proceed to:

  1. Add more comprehensive tests for the current behavior. The infrastructure for tests is set up. Contributing tests would be very appreciated.

InputFromURLTestCase{
// FIXME: Subgroups in gitlab URLs are busted. This double-encoding
// behavior exists since 2.18. See issue #9161 and PR #8845.
.url = "gitlab:/owner%252Fsubgroup/////repo%41///branch%43////",
.attrs =
{
{"type", Attr("gitlab")},
{"owner", Attr("owner%2Fsubgroup")},
{"repo", Attr("repoA")},
{"ref", Attr("branchC")},
},
.description = "gitlab_ref_slashes_in_path_everywhere_with_pct_encoding",
.expectedUrl = "gitlab:owner%252Fsubgroup/repoA/branchC",
}),

  1. Implement Add a warning infrastructure #10281 (comment). I've assigned this to myself.
  2. Split out GitlabLikeInputScheme.
  3. Make the canonical representation of owner and repo in-memory be percent decoded as is the case for all other flake inputs. This is technically breaking change, though I doubt the current status-quo is productive. At the API download URL construction time do proper URL encoding so that / -> %2F when calling out to Gitlab API. This would allow inputs specified via attrsets to have group/subgroup specified without any hacks.
  4. Add back-compatibility shims to reintepret %252F as well as %2F as separators. This will ensure that nix stays backwards compatible for the interim. (5) would also change the canonical representation of newly created lockfiles, but old lockfiles would not be modified because of the canonicalization check that is done before overwriting stale lock files.
  5. Discourage specifying ref/rev in the path via the warnings mechanism (via (2)) specifically for GitlabLikeInputScheme.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation fetching Networking with the outside (non-Nix) world, input locking new-cli Relating to the "nix" command

Projects

Status: 🏁 Review

Development

Successfully merging this pull request may close these issues.

10 participants