
Improve Flakeref for Gitlab repos.#8773

Closed
Silver-Golden wants to merge 1 commit into NixOS:master from Silver-Golden:master

@Silver-Golden
Member

Motivation

GitLab lets you nest repos inside folders (groups and subgroups), for example: https://gitlab.com/gitlab-org/build/omnibus-mirror/alertmanager

Under the current system (docs) we would have to use

example = {
  type = "gitlab";
  owner = "gitlab-org";
  repo = "build%2Fomnibus-mirror%2Falertmanager";
};

or

example.url = "gitlab:gitlab-org/build%2Fomnibus-mirror%2Falertmanager";

Context

I first encountered the problem myself in #6435, where I learnt about the workaround, and that got me going at the time.
My university's computer society is now using Gitlab and NixOS for our servers and I can foresee that using %2F will cause some issues for folks being trained/taught NixOS.
Also I think it looks ugly AF (that alone should be enough reason to fix it).

My ideal solution would have been to allow slashes in the example.url above, however due to #4061 it could cause breaking changes for any flake using the current implementation.

So as a result I am targeting the attribute directly.

example = {
  type = "gitlab";
  owner = "gitlab-org";
  repo = "build/omnibus-mirror/alertmanager";
};

In github.cc, every / in the repo field gets replaced with %2F. Maybe not the best, but it is functional.
If there is a better way of doing it, please let me know.
I added to the docs as well. Not 100% sure it's the best placement, but I figured it was best to keep it close to where the original mentioned the workaround.

I built nix and tested the changes locally (in wsl) and it works for my example above.

Checklist for maintainers

Maintainers: tick if completed or explain if not relevant

  • agreed on idea
  • agreed on implementation strategy
  • tests, as appropriate
    • functional tests - tests/**.sh
    • unit tests - src/*/tests
    • integration tests - tests/nixos/*
  • documentation in the manual
  • documentation in the internal API docs
  • code and comments are self-explanatory
  • commit message explains why the change was made
  • new feature or incompatible change: updated release notes

Priorities

Add 👍 to pull requests you find important.

@github-actions bot added the documentation, new-cli, and fetching labels on Aug 2, 2023
Member

@roberth left a comment

The change in behavior is good. Arguably the repo attribute should be in terms of the gitlab "domain", which means it's a path with /s.

I'm not so sure about the overall implementation, but that's a pre-existing tech debt that we don't have to resolve right away - ie use a URL library or such. Afaic, we can merge this after resolving the std vs boost question.

auto host = maybeGetStrAttr(input.attrs, "host").value_or("gitlab.com");
auto repo = boost::replace_all_copy(getStrAttr(input.attrs, "repo"), "/", "%2F");
// See rate limiting note below
auto url = fmt("https://%s/api/v4/projects/%s%%2F%s/repository/commits?ref_name=%s",
Member

Why are we constructing urls with string concatenation 🤦

Member Author

The value of repo has to go into the url encoded, one way or another.
Currently people are manually encoding it for example: build%2Fomnibus-mirror%2Falertmanager.

The goal of this change is to allow both build%2Fomnibus-mirror%2Falertmanager and build/omnibus-mirror/alertmanager

#include <optional>
#include <nlohmann/json.hpp>
#include <fstream>
#include <boost/algorithm/string/replace.hpp>
Member

Why not just std::string's replace?

Member Author

I did try using that initially.
The main problem I hit there was that it can only replace a char with another char.
It does not work when you try to replace a char with a string.
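
For reference, a minimal standard-library-only sketch of the same substitution, built on `std::string::find` and `std::string::replace`. The helper names `replaceAll` and `encodeSlashes` are hypothetical and are not taken from the patch; the patch itself uses `boost::replace_all_copy`.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `from` in `input` with `to`, standard library only.
// `from` must not be empty, otherwise the loop below would never advance.
std::string replaceAll(std::string input, const std::string & from, const std::string & to)
{
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = input.find(from, pos)) != std::string::npos) {
        input.replace(pos, from.size(), to);
        pos += to.size(); // continue searching after the inserted text
    }
    return input;
}

// Hypothetical helper: encode the slashes of a GitLab `repo` value.
std::string encodeSlashes(const std::string & repo)
{
    return replaceAll(repo, "/", "%2F");
}

// Usage: both spellings of the repo end up as the same encoded value.
void demoEncodeSlashes()
{
    assert(encodeSlashes("build/omnibus-mirror/alertmanager")
        == "build%2Fomnibus-mirror%2Falertmanager");
    assert(encodeSlashes("build%2Fomnibus-mirror%2Falertmanager")
        == "build%2Fomnibus-mirror%2Falertmanager");
}
```

An already-encoded value contains no `/`, so it passes through unchanged, which matches the goal of accepting both forms.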

@Silver-Golden
Member Author

Arguably the repo attribute should be in terms of the gitlab "domain", which means it's a path with /s.

Should I adjust it so that both the owner and repo can take slashes?

So that would give users the flexibility of all these:

example1 = {
  type = "gitlab";
  owner = "gitlab-org";
  repo = "build/omnibus-mirror/alertmanager";
};

example2 = {
  type = "gitlab";
  owner = "gitlab-org/build";
  repo = "omnibus-mirror/alertmanager";
};

example3 = {
  type = "gitlab";
  owner = "gitlab-org/build/omnibus-mirror";
  repo = "alertmanager";
};

@Silver-Golden
Member Author

Just viewed yer profile and realised I should have mentioned ye @roberth

@roberth
Member

roberth commented Aug 11, 2023

Should I adjust it so that both the owner and repo can take slashes?

Right, I think framing it as "repo has a path" was a bit wrong.

To be correct about it, in the GitLab data model we have:
A repo is 1:1 with a project.
A project has a name, which is the final part of its path.
A project has a namespace, which is a user or group.
Groups can be nested.

In Nix we have:
A namespace is called an owner, for consistency with GitHub.

So a slash in a repo is not actually possible; any such prefix to the repo should be specified in the owner instead.
Allowing a slash in the repo would therefore make the values in owner unreliable, which seems kinda bad.

I guess for gitlab it would have made more sense to have a single attribute for the whole path to the repo including the "owner" - whatever that means in a gitlab context. We shouldn't be making up such terms.
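
To make the data model above concrete, here is a small sketch that splits a full project path into its namespace and project name at the last slash. The `ProjectPath` struct and `splitProjectPath` function are hypothetical illustrations, not part of libfetchers.

```cpp
#include <cassert>
#include <string>

// A GitLab project path: the namespace is a user or a (possibly nested) group,
// and the name is the final component of the path.
struct ProjectPath {
    std::string nameSpace;
    std::string name;
};

// Split at the last '/'. A path without any slash has an empty namespace.
ProjectPath splitProjectPath(const std::string & path)
{
    auto slash = path.rfind('/');
    if (slash == std::string::npos)
        return {"", path};
    return {path.substr(0, slash), path.substr(slash + 1)};
}

// Usage with the project from the PR description.
void demoSplitProjectPath()
{
    auto p = splitProjectPath("gitlab-org/build/omnibus-mirror/alertmanager");
    assert(p.nameSpace == "gitlab-org/build/omnibus-mirror");
    assert(p.name == "alertmanager");
}
```

Under this reading, everything before the last slash would go into `owner`, and `repo` never contains a slash.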

@Silver-Golden
Member Author

I see what ye mean, you would be thinking more of example3 above right?

Where things get tricky is ownership in Gitlab is a lot more nuanced than the flat structure of Github.
For example:

We have https://gitlab.skynet.ie/compsoc1 which has two groups, the computer society and skynet (social vs technical).
There is some overlap in membership between the two subgroups.

For https://gitlab.skynet.ie/compsoc1/skynet/nixos I would consider this the attribute

nixos = {
  type  = "gitlab";
  owner = "compsoc1/skynet";
  repo  = "nixos";
};

Which is in line with what ye are thinking of, however I would consider https://gitlab.skynet.ie/compsoc1/skynet/website/2023 to be like so:

website = {
  type  = "gitlab";
  owner = "compsoc1/skynet";
  repo  = "website/2023";
};

In both cases I consider compsoc1/skynet to be the owner because it's the same group of people managing both.
The closest ye would get on Github is something like this https://github.com/Byron/gitoxide/tree/main/cargo-smart-release where there are multiple related projects in a single repo.

(Semi serious, pockets folders is the quiet game changing feature of Gitlab)

@roberth
Member

roberth commented Aug 11, 2023

pockets

As in something useful that we tend to take for granted? Interesting.

ownership in Gitlab is a lot more nuanced

Right! And this is not really represented in their data model. That means we won't agree on what the split should be and we should simplify the interface so that we don't burden users with coming up with their own ownership ideas where it doesn't really apply unambiguously.

<Insert seizing means of production communism joke here.>

But seriously, we should probably get rid of owner. Maybe path could be a replacement for both owner and repo?

@edolstra, would it be ok to have an attribute called path in the fetchTree arguments? There's some potential for confusion with the subflake dir parameter I guess. wdyt?

@edolstra
Member

Yes, path is a bit ambiguous. Maybe there is a better name?

@K900
Contributor

K900 commented Aug 11, 2023

Maybe we should just keep repo, and let owner be unset?

@Silver-Golden
Member Author

Silver-Golden commented Aug 11, 2023

It wasn't my intention earlier, but I am more in favor of allowing slashes in both owner and repo, like in #8773 (comment), since it gives flexibility.
Also allowing owner to be unset would be grand as well.

@Silver-Golden
Member Author

Ye, I would be quite happy with this

# top level group is "owner"
nixos = {
  type  = "gitlab";
  owner = "compsoc1";
  repo  = "skynet/nixos";
};
# subgroup is "owner"
nixos = {
  type  = "gitlab";
  owner = "compsoc1/skynet";
  repo  = "nixos";
};
# no owner
nixos = {
  type  = "gitlab";
  repo  = "compsoc1/skynet/nixos";
};

@roberth
Member

roberth commented Aug 11, 2023

# no owner

That'd be my preferred syntax, and if we do keep the owner attr around as "user input" syntax, I think we should canonicalize to it before returning the "sourceInfo" to the user.

I don't think libfetchers is set up to normalize the attributes and remove owner from the returned attrset, is it?

@Silver-Golden
Member Author

Why lock it down to a specific one though?
Especially since all 3 can co-exist without any issue.

If it's about making it easier for the user, then any of those will be better than what we have now.
It took me about an hour last year to figure out why "gitlab:c2842/misc/classbot"; wasn't getting accepted.

Especially easier to fit in than a new attribute.

@roberth
Member

roberth commented Aug 11, 2023

I'm mostly ok with accepting multiple input syntaxes, but not happy about propagating that entropy into the rest of a Nix expression.
For example if in my flake I override the foo input of your flake, and your flake has a conditional based on foo.owner, I may unknowingly cause a problem by overriding foo (follows) despite specifying the exact same repository, just because we disagree about how long the owner prefix is.

So what I'm suggesting is for fetchTree to canonicalize the input attributes to prevent such a situation from arising. If inputs.foo.owner doesn't exist, we can't accidentally disagree about its value.
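
A rough sketch of what such canonicalization could look like: whatever split the user chose, the attributes collapse to one full project path, so two flakes naming the same repository cannot disagree. The function name `canonicalProjectPath` is hypothetical, and this is not how libfetchers currently behaves.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Collapse an optional `owner` and a `repo` into one canonical project path.
// A missing or empty owner means `repo` already holds the full path.
std::string canonicalProjectPath(const std::optional<std::string> & owner, const std::string & repo)
{
    if (owner && !owner->empty())
        return *owner + "/" + repo;
    return repo;
}

// Usage: the three input syntaxes discussed above all canonicalize to the same path.
void demoCanonicalProjectPath()
{
    const std::string expected = "compsoc1/skynet/nixos";
    assert(canonicalProjectPath(std::string("compsoc1"), "skynet/nixos") == expected);
    assert(canonicalProjectPath(std::string("compsoc1/skynet"), "nixos") == expected);
    assert(canonicalProjectPath(std::nullopt, "compsoc1/skynet/nixos") == expected);
}
```

If only the canonical path is returned to the user, there is no `owner` value left for a conditional to depend on.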

@Silver-Golden
Member Author

Silver-Golden commented Aug 12, 2023

@roberth I can see where ye are coming from, however that is an already existing problem.

Currently I could have it like so:

nixos = {
  type  = "gitlab";
  owner = "compsoc1";
  repo  = "skynet%2Fnixos";
};
nixos = {
  type  = "gitlab";
  owner = "compsoc1%2Fskynet";
  repo  = "nixos";
};

So if you try to override the repo without also overriding the owner ye may run into issues.
I repeat, that is current behaviour, not introduced in this patch

And as far as I can tell there has been no issue raised in the three years since the lines were last touched.


Pushed up a commit to also allow slashes in owner.


@edolstra have you any thoughts on the convo above?

@Ericson2314
Member

Triaged in the Nix team meeting

  • @edolstra: flakeref is tricky because extra slash means branch name

    • roberth: could use double forward slash for branch name. Some other tool(s) do this (Haskell??)
  • @roberth: flakeref and structured data representation are distinct syntaxes and can be considered separately

  • @roberth: it's still experimental (but painful to change nonetheless)

  • Tricky problem: Gitlab doesn't have an owner/repo distinction, instead repos can be arbitrary paths. But switching to arbitrary paths would make refs/revs ambiguous.

Scratch space:

  • double slash: gitlab://group/subgroup/repo//branch
  • currently: gitlab://group%2Fsubgroup/repo/branch
  • gitlab://org/section/repo\sha
  • gitlab://org/section/repo|sha
  • gitlab://org/section/repo?ref=sha
  • gitlab://org/section/repo__________SOMETHING________sha

Decision: Will discuss further
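
To illustrate the double-slash option from the scratch space, here is a sketch that separates an arbitrary-depth project path from an optional ref at the first `//`. This syntax is only a proposal in this thread; the `PathAndRef` struct and `splitAtDoubleSlash` function are hypothetical, and the input is assumed to be the part of the flakeref after the scheme.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Result of splitting "group/subgroup/repo//branch".
struct PathAndRef {
    std::string projectPath;
    std::optional<std::string> ref;
};

// Everything before the first "//" is the project path; everything after it is the ref.
// Without a "//", the whole input is the project path and there is no ref.
PathAndRef splitAtDoubleSlash(const std::string & s)
{
    auto sep = s.find("//");
    if (sep == std::string::npos)
        return {s, std::nullopt};
    return {s.substr(0, sep), s.substr(sep + 2)};
}

// Usage: the path depth no longer matters, and refs may themselves contain slashes.
void demoSplitAtDoubleSlash()
{
    auto a = splitAtDoubleSlash("group/subgroup/repo//branch");
    assert(a.projectPath == "group/subgroup/repo");
    assert(a.ref.has_value() && *a.ref == "branch");

    auto b = splitAtDoubleSlash("group/subgroup/repo");
    assert(b.projectPath == "group/subgroup/repo");
    assert(!b.ref.has_value());
}
```

The point of the separator is that the parser no longer has to guess whether the last path component is a subgroup, a repo, or a branch.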

@Silver-Golden
Member Author

@Ericson2314 thanks for discussing it in the meeting.
Are ye planning to expand the scope (in this MR/issue) from the attribute to the gitlab: url as well?
Or will it be separate?

@roberth
Member

roberth commented Aug 19, 2023

Are ye planning to expand the scope [...] to gitlab: url as well?

It's related but can be considered separately. The team hasn't made a decision about the URL yet, so I'd focus on just the structured data syntax.

Possible migration path for URLs

We could just break it, or do a migration. For a migration plan we only need to consider the user facing interface, but that's not to say that the implementation will be easy. libfetchers' architecture doesn't help with any of this, I believe.

  1. current state
  2. make the structured data work without owner (I don't think it already does)
  3. add //ref support to all VCS fetchers
  4. deprecate: warn when owner is set
  5. deprecate: warn when the gitlab: URI contains a path. Tell users to pass ?repo=group/subgroup/repo
  6. make the breaking change to the gitlab: URI syntax: third path component is not a ref anymore. Refs must now be ?ref= or //ref

1-3 are a good goal for 23.11
4-5 can be done for 24.05
the migration could be completed in 24.11.

So if we want to do a user friendly migration for the flakeref URIs it will take a while. Flakeref URIs are just a nice to have for the flakes feature, so we could stabilize flakes before stabilizing the URIs.

@roberth self-assigned this on Aug 21, 2023
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-08-18-nix-team-meeting-minutes-80/32081/1

@fricklerhandwerk
Contributor

Reviewed in Nix team meeting 2023-08-21:

  • rehashed the state of discussion
  • discussed flake URIs more generally
  • considered making a breaking change for gitlab flakerefs
  • decision:
    • structured representation: owner becomes optional, but is prepended to the repo if specified, and a warning is printed
    • flake URI: options but no decision yet
      • add a special syntax such as //ref
        • even more special syntax
      • require escaping if there is more than one / in the path
        • not ergonomic
      • do more involved parsing, e.g. count path components
        • breaking change, but possible
      • resolve details with the service API
        • too slow
      • remove the special syntax and require people to use query parameters or the structured representation
        • cheap

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-08-21-nix-team-meeting-minutes-81/32082/1

@Silver-Golden
Member Author

@roberth @edolstra

Not sure if it's intended, but it seems the way the URL is handled has changed.
Best that I can tell, something is replacing %2F with / in the path attribute before it gets to the fetcher, because it is splitting off the part after the last / as if it's the branch/ref, even if it is inputted in the form of %2F.

If ye want I could split this off into a new ticket.

replacing old 'nix-2.13.2'
installing 'nix-2.18.1'

This worked fine on 2.13.2 at least, not sure what the last version it worked on was.

compsoc_public.url = "gitlab:compsoc1%2Fcompsoc/presentations?host=gitlab.skynet.ie";
silver@DESKTOP-7RB7S9E:/mnt/c/College/Compsoc/nixos$ nix develop
error:
       … while updating the lock file of flake 'git+file:///mnt/c/College/Compsoc/nixos?ref=refs/heads/main&rev=8ea737d57b224cdc8e5009ff47cb1e013ca71ac0'

       … while updating the flake input 'compsoc_public'

       … while fetching the input 'gitlab:compsoc1/compsoc/presentations'

       error: unable to download 'https://gitlab.skynet.ie/api/v4/projects/compsoc1%2Fcompsoc/repository/commits?ref_name=presentations': HTTP error 404

       response body:

       {"message":"404 Project Not Found"}

If I change the last slash in it to %2F, the result is the same

compsoc_public.url = "gitlab:compsoc1%2Fcompsoc%2Fpresentations?host=gitlab.skynet.ie";
silver@DESKTOP-7RB7S9E:/mnt/c/College/Compsoc/nixos$ nix develop
warning: Git tree '/mnt/c/College/Compsoc/nixos' is dirty
error:
       … while updating the lock file of flake 'git+file:///mnt/c/College/Compsoc/nixos'

       … while updating the flake input 'compsoc_public'

       … while fetching the input 'gitlab:compsoc1/compsoc/presentations'

       error: unable to download 'https://gitlab.skynet.ie/api/v4/projects/compsoc1%2Fcompsoc/repository/commits?ref_name=presentations': HTTP error 404

       response body:

       {"message":"404 Project Not Found"}

and for funsies use all /, still same error

compsoc_public.url = "gitlab:compsoc1/compsoc/presentations?host=gitlab.skynet.ie";
silver@DESKTOP-7RB7S9E:/mnt/c/College/Compsoc/nixos$ nix develop
warning: Git tree '/mnt/c/College/Compsoc/nixos' is dirty
error:
       … while updating the lock file of flake 'git+file:///mnt/c/College/Compsoc/nixos'

       … while updating the flake input 'compsoc_public'

       … while fetching the input 'gitlab:compsoc1/compsoc/presentations'

       error: unable to download 'https://gitlab.skynet.ie/api/v4/projects/compsoc1%2Fcompsoc/repository/commits?ref_name=presentations': HTTP error 404

       response body:

       {"message":"404 Project Not Found"}

@fricklerhandwerk
Contributor

fricklerhandwerk commented Oct 9, 2023

@Silver-Golden yes, that's been an intentional change here #6614

@Silver-Golden
Member Author

Welp, once again it feels that gitlab is not even a second class citizen :(

This most likely means that it will require a rewrite of the src/libfetchers/github.cc logic to get it working, possibly #9031 in the long term. As it currently stands, though, I would see the current behaviour as a bug.
I'll do up an issue tomorrow and take a look at ways to solve it.

@roberth
Member

roberth commented Oct 10, 2023

Best that I can tell, something is replacing %2F with / in the path attribute before it gets to the fetcher, because it is splitting off the part after the last / as if it's the branch/ref, even if it is inputted in the form of %2F.

an intentional change here #6614

Still a bug though.

#6614 should be re-done by factoring out the parser, putting the freshly parsed strings into a struct and adding a bunch of unit tests.
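
A hedged sketch of the kind of factored-out, testable parsing step meant here: split the still-encoded path on `/` first, then percent-decode each component. The names `percentDecode` and `splitThenDecode` are hypothetical, and the "decode first" failure shown in the usage is only a guess at the mechanism that is consistent with the error output reported above.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Decode %XX escapes; anything that is not a well-formed escape is copied verbatim.
std::string percentDecode(const std::string & in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool isEscape = in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (isEscape) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2; // skip the two hex digits
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split on literal '/' while the input is still encoded, then decode each part,
// so an encoded "%2F" never acts as a separator.
std::vector<std::string> splitThenDecode(const std::string & path)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        auto slash = path.find('/', start);
        auto len = slash == std::string::npos ? std::string::npos : slash - start;
        parts.push_back(percentDecode(path.substr(start, len)));
        if (slash == std::string::npos) break;
        start = slash + 1;
    }
    return parts;
}

// Usage: the encoded slash survives as part of the first component.
void demoSplitThenDecode()
{
    auto parts = splitThenDecode("compsoc1%2Fcompsoc/presentations");
    assert(parts.size() == 2);
    assert(parts[0] == "compsoc1/compsoc");
    assert(parts[1] == "presentations");

    // Decoding before splitting yields three components instead, and the last
    // one ("presentations") would then be taken for a branch/ref.
    assert(splitThenDecode(percentDecode("compsoc1%2Fcompsoc/presentations")).size() == 3);
}
```

Small pure functions like these are straightforward to cover with unit tests, which is the main argument for factoring the parser out.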

I don't know how people can accept custom parsing code without a ton of tests. This is insane.

Another thing that needs to happen is that the gitlab code should stop inheriting any code that makes github assumptions.
Most of the logic in GitArchiveInputScheme should be moved down into a new GitHubLikeArchiveInputScheme.

@Silver-Golden
Member Author

@roberth I am planning to open a new issue ticket and merge request to fix the new bug introduced above, anything in particular from this one do ye want me to add?
(carved out some free time today and tomorrow)

@roberth
Member

roberth commented Oct 15, 2023

@Silver-Golden I think that's a problem in the parsing/decoding logic, which wasn't really touched by this PR.

@Silver-Golden
Member Author

Switching over to #9160 for this PR.


Labels

documentation, fetching, new-cli

7 participants