
sourcehut: update all component; lots of fixes #245394

Merged
tomberek merged 29 commits into NixOS:master from christoph-heiss:pkgs/sourcehut
Nov 11, 2023

@christoph-heiss
Contributor

Description of changes

This updates all sourcehut components to their latest release; as of 23-07-2023.

Additionally, some things changed and had to be adapted. This also includes some bug fixes (notably, it should fix #199778, #201424 and #201425, IIRC).

I'm currently only running the metasrht and gitsrht services, thus the others are basically untested (apart from running nixosTests.sourcehut). I will test them once I have the spare time, but I thought I would open a (draft) PR anyway, so that other people can chime in and maybe even help with testing. It's a rather big change that definitely needs more than one pair of eyes.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

github-actions bot added the labels "6.topic: nixos" (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS) and "8.has: module (update)" (This PR changes an existing module in `nixos/`) on Jul 25, 2023
christoph-heiss requested a review from tomberek on July 25, 2023 14:31
ofborg bot requested a review from eadwu on July 25, 2023 15:30
ofborg bot added the labels "10.rebuild-darwin: 11-100" (This PR causes between 11 and 100 packages to rebuild on Darwin) and "10.rebuild-linux: 11-100" (This PR causes between 11 and 100 packages to rebuild on Linux) on Jul 25, 2023
@nessdoor
Contributor

nessdoor commented Aug 7, 2023

Thank you for taking the trouble of going through this much needed update.

I'm in the process of setting up my first Sourcehut forge, therefore I'm sorry if I won't be able to provide much knowledgeable feedback, but I'll try my best.

It seems that your PR suffers from the same problem as upstream where the dispatch.sr.ht service has been deprecated, but its presence in the services list triggers a deprecation assertion that is fatal to the build:
https://github.com/NixOS/nixpkgs/blob/818cfdd3c325e9fc725c6ce4dd1046978bbf8ebb/nixos/modules/services/misc/sourcehut/service.nix#L258-L259
Explicitly setting the services array is a viable workaround, but it is completely undocumented.

On this note, what about updating the manual with a working example as well, since the one that is written there is no longer valid? Maybe include something like the one in #133984. But maybe documentation can be updated as part of a separate PR, depending on how much work this one requires.

@christoph-heiss
Contributor Author

> It seems that your PR suffers from the same problem as upstream where the dispatch.sr.ht service has been deprecated, but its presence in the services list triggers a deprecation assertion that is fatal to the build:
> [..]
> Explicitly setting the services array is a viable workaround, but it is completely undocumented.

Thanks for trying it out and catching that! In my config, I have set it explicitly, since I don't use/need all services. I'll fix it and will set up a test server with all services enabled for the future.

> On this note, what about updating the manual with a working example as well, since the one that is written there is no longer valid?

Good point, I will do that too. Should not be too much of a hassle, I think.

@nessdoor
Contributor

nessdoor commented Aug 7, 2023

> I have set it explicitly, since I don't use/need all services.

Apparently, setting it explicitly was the old way of doing it, now replaced by the <service>.enable flag. I guess people never noticed the problem because everyone has legacy configs.
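For readers following along, the two styles discussed here might look roughly like the fragment below. This is only a minimal sketch: the option names `services.sourcehut.services`, `services.sourcehut.meta.enable` and `services.sourcehut.git.enable` are assumed from how they are referred to in this thread, so check the module source for the exact names and defaults.

```nix
{
  services.sourcehut = {
    enable = true;

    # Old style (assumed option name): list the enabled services explicitly.
    # This is the workaround mentioned above for the dispatch.sr.ht assertion.
    # services = [ "meta" "git" ];

    # New style (assumed option names): one enable flag per service.
    meta.enable = true;
    git.enable = true;
  };
}
```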

Another pain point is the need for all those OAuth IDs and secrets to be set manually, but I guess it would be non-trivial to come up with a safe way of generating them. Maybe this will have to be addressed in a future PR, but at least the documentation should be explicit about that.

@christoph-heiss
Contributor Author

christoph-heiss commented Aug 7, 2023

> Apparently, setting it explicitly was the old way of doing it, now replaced by the <service>.enable flag. I guess people never noticed the problem because everyone has legacy configs.

Well, then I'd suggest explicitly deprecating the services array, so that people migrate. I will take a look at it; it is probably for the best in the long run anyway.

> Another pain point is the need for all those OAuth IDs and secrets to be set manually, but I guess it would be non-trivial to come up with a safe way of generating them. Maybe this will have to be addressed in a future PR, but at least the documentation should be explicit about that.

Yeah, secret management would definitely push this PR too far. That should be brainstormed separately; I'd rather have this PR go somewhere sooner than later.

In any case, big thanks for helping out with this!

ofborg bot added the label "11.by: package-maintainer" (This PR was created by a maintainer of all the package it changes) on Aug 7, 2023
@nessdoor
Contributor

nessdoor commented Aug 9, 2023

I can confirm that, at this moment, your PR seems to be working. I managed to bring up both meta and git, and they are reachable from the Internet.

I noticed you have chosen to deprecate services through an assertion that makes the build fail. Do you think it is better this way or do you feel like we could just be emitting a warning, so that people with old configs are not confronted with a build failure?
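To make the two alternatives concrete, here is a rough sketch of how a NixOS module can either fail evaluation or merely warn. It assumes a legacy `services.sourcehut.services` list option that defaults to `[ ]`; that default, the `legacyUsed` helper and the message text are illustrative and not taken from the PR.

```nix
{ config, lib, ... }:

let
  cfg = config.services.sourcehut;
  # Hypothetical check: true when the legacy list option was set by the user.
  legacyUsed = cfg.services != [ ];
  msg = "services.sourcehut.services is deprecated; use services.sourcehut.<service>.enable instead.";
in
{
  # Variant A: abort evaluation, so the system build fails.
  # assertions = [ { assertion = !legacyUsed; message = msg; } ];

  # Variant B: only print a warning during evaluation; old configs keep building.
  warnings = lib.optional legacyUsed msg;
}
```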

Apart from that, during the setup process I found two sources of frustration that, maybe, we could address in this PR.

The first one is that secret management is terribly inconsistent between secrets that are to be included in the INI files and those that, instead, are accessed by path. Those that are to be written to the INI files are the easiest to manage, as the activation script decrypts them as root and copies their contents verbatim to the INIs, meaning that the secret management utility that one uses (in my case, agenix) doesn't have to treat them in any special way. The major issue with secrets that are accessed by path, though, is that nowhere is it written that Sourcehut services run in heavily-restricted chroots, and therefore don't have access to the filesystem. If it weren't for your NixOS configs I found on your repo (thanks for that), I would have never guessed that I had to set the serviceConfig.BindReadOnlyPaths just to make the secrets accessible. And this step is mandatory, as for some reason one cannot opt out of the mailing service and needs to provide a GPG key pair (maybe another thing that can be improved?).

The other issue is that different services run under different users/groups, meaning that one has to either replicate the secrets and assign each one to the user of the service, or create an ad-hoc group with the various services' users, and assign it the ownership of the shared secrets.

Is there any way in which we could handle secrets better? Personally, I went for an additional srhtsec group that owns the shared secrets, added gitsrht and metasrht to it, then created a /run/sourcehut/secrets directory and instructed agenix to copy the GPG key there after decryption.
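Put together, the workaround described above could be sketched as follows. The group name and directory come from this comment; the secret name, the `.age` file path and the systemd unit names `metasrht` and `gitsrht` are placeholders or assumptions, and the `age.secrets` attributes are used as I understand agenix to provide them.

```nix
{
  # Extra group owning the shared secrets.
  users.groups.srhtsec.members = [ "gitsrht" "metasrht" ];

  # agenix: decrypt the key to a path under the shared directory.
  # Secret name and source file are placeholders.
  age.secrets.srht-pgp-privkey = {
    file = ./secrets/srht-pgp-privkey.age;
    path = "/run/sourcehut/secrets/pgp-privkey";
    group = "srhtsec";
    mode = "0440";
  };

  # Make the directory visible inside each service's restricted root
  # (unit names assumed).
  systemd.services.metasrht.serviceConfig.BindReadOnlyPaths = [ "/run/sourcehut/secrets" ];
  systemd.services.gitsrht.serviceConfig.BindReadOnlyPaths = [ "/run/sourcehut/secrets" ];
}
```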

Lastly, something still related to the GPG key but that can be solved through documentation: Sourcehut expects keys in armored format, not binary, but this is not told anywhere.

In the coming days, if I have time, I will try to spin up a build instance. I'll let you know how that goes. I haven't actually created any users, yet, so there may be hidden problems in there as well, but I don't expect any.

@christoph-heiss
Contributor Author

> I noticed you have chosen to deprecate services through an assertion that makes the build fail. Do you think it is better this way or do you feel like we could just be emitting a warning, so that people with old configs are not confronted with a build failure?

Yeah, that was my first thought too, but there isn't something like mkDeprecatedOptionModule, so I decided to simply drop it. But since one already needs to set enable = true per individual service, users can simply drop that line from their config without having to change anything else. So just breaking builds isn't that bad either.
I'll try to simply deprecate it instead of removing it, even if IMHO that does not have a lot of value.

> The major issue with secrets that are accessed by path, though, is that nowhere is it written that Sourcehut services run in heavily-restricted chroots, and therefore don't have access to the filesystem. If it weren't for your NixOS configs I found on your repo (thanks for that), I would have never guessed that I had to set the serviceConfig.BindReadOnlyPaths just to make the secrets accessible.

Thanks for pointing that out, I totally forgot about that issue due to having it "solved" in my config. I'll fix that 👍

> Is there any way in which we could handle secrets better? Personally, I went for an additional srhtsec group that owns the shared secrets, added gitsrht and metasrht to it, then created a /run/sourcehut/secrets directory and instructed agenix to copy the GPG key there after decryption.

Hm .. having a separate group for all (shared) secrets does sound like a good idea, I think. Maybe even something like srhtservice (or _srhtservice?), where all separate users like gitsrht etc. are members?

> Lastly, something still related to the GPG key but that can be solved through documentation: Sourcehut expects keys in armored format, not binary, but this is not told anywhere.

I will update the documentation w.r.t. that when I get around to that issue. The doc is definitely lacking lots of things.

> In the coming days, if I have time, I will try to spin up a build instance. I'll let you know how that goes. I haven't actually created any users, yet, so there may be hidden problems in there as well, but I don't expect any.

Thanks! If something non-obvious comes up, I can happily help you out. Also, it would be good to take notes of such things, so that I can improve them or add them to the documentation, as appropriate.

@nessdoor
Contributor

> I'll try to simply deprecate it instead of removing it, even if IMHO that does not have a lot of value.

I totally see your point, both solutions are acceptable, I think. I was just wondering which way would be better, but it's totally up to you.

> Hm .. having a separate group for all (shared) secrets does sound like a good idea, I think. Maybe even something like srhtservice (or _srhtservice?), where all separate users like gitsrht etc. are members?

The group hack was the least convoluted solution to the problem without having to modify the module, but maybe it's not the cleanest as an upstream technique. I would not assume that all services share the same subset of secrets, therefore we would violate the least-privilege/need-to-know principle by doing that. Since we're building chroot jails for all services, we might as well copy and chown the necessary secrets inside each jail upon service activation, letting the derivation of each Sourcehut service choose which secrets it wants. I think this can be delegated to some logic near the generators for their Systemd services and INI configurations. Since the (encrypted) secrets are kept under the Nix store and tied to the system derivation, any change of secrets would entail a change of inputs, triggering a rebuild of the jails and reload of the services, so we shouldn't have any problems with stale secrets, either.

Well then, I'm off to set up this build thing. User management, repo creation and push/pull all work as expected. I am not going to test the public registration and mailing subsystem anytime soon, though.

@nessdoor
Contributor

nessdoor commented Aug 15, 2023

Alright, here I am with a status update.

First, let's begin with the easy stuff.

The issue with the shared secrets had already been found in #177423, and #177423 (comment) suggests addressing it by relying on the LoadCredential= Systemd directive, which I think is a great idea.
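As a rough illustration of that idea (not what the module currently does), a unit could load a secret through systemd's credential mechanism like this. The unit name `metasrht`, the credential ID `pgp-privkey` and the source path are assumptions made for the sketch.

```nix
{
  systemd.services.metasrht.serviceConfig = {
    # systemd copies the file into a per-service credentials directory
    # before the unit starts; no shared group or bind mount is needed.
    LoadCredential = [ "pgp-privkey:/run/agenix/srht-pgp-privkey" ];
  };

  # The service then finds the key at $CREDENTIALS_DIRECTORY/pgp-privkey,
  # so the generated INI would have to point at that location.
}
```

Each service would list only the credentials it needs, which also fits the least-privilege concern raised earlier in the thread.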

Moving on.

Setting up build is a two-level configuration process, where we must both configure the service and define the VM images running the builds. The image configuration looks like the only part that is fundamentally different from other Sourcehut services.
The current module accepts an attribute set of derivations producing VM disk images under the services.sourcehut.build.images attribute. Configuring a machine image from scratch is tiresome, as can be seen from #133984 (comment), so it is nice that upstream already ships a NixOS image definition that can be evaluated at build time. It is very easy to use, as it is sufficient to evaluate:

(import "${pkgs.sourcehut.buildsrht}/lib/images/nixos/image.nix" {
  pkgs = <optional-pkgs-override>;
})

in order to produce the necessary derivation.
Unfortunately, this evaluation fails under my Flake configuration, but not for the reason stated in #126111 (comment). Apparently, it is now possible to evaluate store paths in a pure environment, but the problem is that there is no way to provide a system input attribute to nixos/lib/eval-config.nix, making it default to builtins.currentSystem, which is unavailable in pure evaluation mode. This issue ought to be resolved upstream, and I'll try to contact the NixOS image maintainer (@fgaz) to see if we can have an optional system attribute passed through the image.nix function.
For now, I am building a NixOS image with the following derivation, sourced directly from the upstream source:

          let
            baseSystemConfiguration = { pkgs, ... }:
              {
                # passwordless ssh server
                services.openssh = {
                  enable = true;
                  permitRootLogin = "yes";
                  extraConfig = "PermitEmptyPasswords yes";
                };
                
                users = {
                  mutableUsers = false;
                  # build user
                  extraUsers."build" = {
                    isNormalUser = true;
                    uid = 1000;
                    extraGroups = [ "wheel" ];
                    password = "";
                  };
                  users.root.password = "";
                };
                security.sudo.wheelNeedsPassword = false;
                nix.trustedUsers = [ "root" "build" ];
                
                # builds.sr.ht-image-specific network settings
                networking = {
                  hostName = "build";
                  dhcpcd.enable = false;
                  defaultGateway.address = "10.0.2.2";
                  usePredictableInterfaceNames = false; # so that we just get eth0 and not some weird id
                  interfaces."eth0".ipv4.addresses = [{
                    address = "10.0.2.15";
                    prefixLength = 25;
                  }];
                  enableIPv6 = false;
                  nameservers = [
                    # OpenNIC anycast
                    "134.195.4.2" # https://servers.opennicproject.org/edit.php?srv=ns4.any.dns.opennic.glue
                    # Google as a fallback :(
                    "8.8.8.8"
                  ];
                  firewall.allowedTCPPorts = [ 22 ]; # allow ssh
                };
                
                environment.systemPackages = with pkgs; [
                  gitMinimal
                  mercurial
                  curl
                  gnupg
                ];
              };
            qemuSystemConfiguration = { pkgs, ... }:
              {
                imports = [ baseSystemConfiguration ];
                fileSystems."/".device = "/dev/disk/by-label/nixos";
                boot.initrd.availableKernelModules = [
                  "xhci_pci"
                  "ehci_pci"
                  "ahci"
                  "usbhid"
                  "usb_storage"
                  "sd_mod"
                  "virtio_balloon"
                  "virtio_blk"
                  "virtio_pci"
                  "virtio_ring"
                ];
                boot.loader = {
                  grub = {
                    version = 2;
                    device = "/dev/vda";
                  };
                  timeout = 0;
                };
              };
            makeDiskImage = import "${pkgs.path}/nixos/lib/make-disk-image.nix";
            evalConfig = import "${pkgs.path}/nixos/lib/eval-config.nix";
            config = (evalConfig {
              modules = [ qemuSystemConfiguration ];
              system = "x86_64-linux";
            }).config;
          in
            makeDiskImage {
              inherit pkgs config;
              lib = pkgs.lib;
              diskSize = 16000;
              format = "qcow2-compressed";
              contents = [{
                source = pkgs.writeText "gitconfig" ''
                [user]
                name = builds.sr.ht
                email = builds@sr.ht
                '';
                target = "/home/build/.gitconfig";
                user = "build";
                group = "users";
                mode = "644";
              }];
            };
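For comparison, the optional system attribute proposed above for image.nix might take a shape like the one below. This is a hypothetical sketch, not the upstream file: the argument defaults and the empty module list are placeholders chosen only to show where `system` would be threaded through.

```nix
# Hypothetical shape of an image.nix that accepts an explicit system.
{ pkgs ? import <nixpkgs> { }
, system ? pkgs.stdenv.hostPlatform.system
}:

let
  evalConfig = import "${pkgs.path}/nixos/lib/eval-config.nix";
  config = (evalConfig {
    # Passing system explicitly avoids the builtins.currentSystem default.
    inherit system;
    modules = [ /* image configuration modules would go here */ ];
  }).config;
in
# The real file would hand this to make-disk-image.nix, as in the derivation above.
config
```

A Flake-based caller would pass both `pkgs` and `system`, so neither default is evaluated in pure mode.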

Finally, for the difficult part, I tried to run the service.

The web UI works correctly, and I submitted the following manifest to test everything out:

image: nixos/unstable
tasks:
  - say-hello: |
      echo hello
  - say-world: |
      echo world

This run produced an error 500 response, the job is stuck in the pending state, logs cannot be retrieved from the web UI (Error fetching logs for task "None"), and the following is spat out on the system log:

buildsrht-api: 2023/08/14 21:50:39 invalid redis URL scheme: redis+socket
buildsrht-api: 2023/08/14 21:50:39 invalid redis URL scheme: redis+socket
buildsrht-api: 2023/08/14 21:50:39 goroutine 38 [running]:
buildsrht-api: git.sr.ht/~sircmpwn/core-go/server.EmailRecover({0xef2a50, 0xc0002a5d40}, {0xccaa40, 0xc0001fd750})
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/server/email.go:48 +0x1a7
buildsrht-api: github.com/99designs/gqlgen/graphql.(*OperationContext).Recover(0xc0002e04c8?, {0xef2a50, 0xc0002a5d40}, {0xccaa40?, 0xc0001fd750?})
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/context_operation.go:124 +0x3e
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation_submit.func1()
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:4385 +0x6a
buildsrht-api: panic({0xccaa40, 0xc0001fd750})
buildsrht-api:         runtime/panic.go:884 +0x213
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph.(*mutationResolver).Submit(0xa?, {0xef2a50, 0xc0002a5d40}, {0xc0000422a0, 0x65}, {0x155d3b8, 0x0, 0x0}, 0xc0001fd040, 0x0, ...)
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/schema.resolvers.go:346 +0x3b3
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation_submit.func2.1({0xef2a50?, 0xc0002a5d40})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:4392 +0x2a4
buildsrht-api: git.sr.ht/~sircmpwn/core-go/server.Access({0xef2a50, 0xc0002a5d40}, {0xc09ab3?, 0xef2a50?}, 0xc000081b20, {0xdce1c2, 0x4}, {0xdcd9f4, 0x2})
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/server/directives.go:55 +0x21d
buildsrht-api: main.main.func1({0xef2a50?, 0xc0002a5d40?}, {0x0?, 0x0?}, 0xee8080?, {0xdce1c2?, 0xc000081b20?}, {0xdcd9f4?, 0xc0000e2860?})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/server.go:34 +0x3d
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation_submit.func2.2({0xef2a50, 0xc0002a5d40})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:4406 +0xd7
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation_submit.func2({0xef2a50, 0xc0002a5d40})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:4409 +0xe4
buildsrht-api: github.com/99designs/gqlgen/graphql/executor.processExtensions.func4({0xef2a50?, 0xc0002a5d40?}, 0xc0001ede30?)
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/executor/extensions.go:72 +0x26
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation_submit(0xc0001fcf60, {0xef2a50, 0xc0002a5bc0}, {0xc0000d9d80?, {0xc0001fcfc0?, 0x0?, 0xc0000e2a18?}})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:4389 +0x219
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation.func1({0xef2a50?, 0xc0002a5bc0?})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:12064 +0x45
buildsrht-api: github.com/99designs/gqlgen/graphql/executor.processExtensions.func3({0xef2a50?, 0xc0002a5bc0?}, 0xc9f940?)
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/executor/extensions.go:69 +0x26
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executionContext)._Mutation(0xc0001fcf60, {0xef2a50, 0xc0002a5b30}, {0xc0001fccc0, 0x1, 0x1})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:12063 +0x724
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api.(*executableSchema).Exec.func2({0xef2a50?, 0xc0002a5b00?})
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/graph/api/generated.go:1363 +0x93
buildsrht-api: github.com/99designs/gqlgen/graphql/executor.(*Executor).DispatchOperation.func1.1.1({0xef2a50, 0xc0002a5b00})
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/executor/executor.go:119 +0x2b
buildsrht-api: github.com/99designs/gqlgen/graphql/executor.processExtensions.func2({0xef2a50?, 0xc0002a5b00?}, 0xc9f940?)
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/executor/extensions.go:66 +0x26
buildsrht-api: github.com/99designs/gqlgen/graphql/executor.(*Executor).DispatchOperation.func1.1({0xef2a50, 0xc0002a5a40})
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/executor/executor.go:118 +0x113
buildsrht-api: github.com/99designs/gqlgen/graphql/handler/transport.POST.Do({0xef2a50?}, {0x7fc7a917f380?, 0xc000083a40}, 0xc0000f9e00, {0xef1a70, 0xc0001e54a0})
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/handler/transport/http_post.go:91 +0x504
buildsrht-api: github.com/99designs/gqlgen/graphql/handler.(*Server).ServeHTTP(0xc000080fc0, {0x7fc7a917f380, 0xc000083a40}, 0xc0000f9e00)
buildsrht-api:         github.com/99designs/gqlgen@v0.17.29/graphql/handler/server.go:121 +0x273
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xcdc900?, {0x7fc7a917f380?, 0xc000083a40?}, 0xc00003e845?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: github.com/go-chi/chi.(*Mux).routeHTTP(0xc0000845a0, {0x7fc7a917f380, 0xc000083a40}, 0xc0000f9d00)
buildsrht-api:         github.com/go-chi/chi@v4.1.2+incompatible/mux.go:431 +0x1f9
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0x7fc7a917f380?, 0xc000083a40?}, 0x14fa8e0?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/webhooks.Middleware.func1.1({0x7fc7a917f380, 0xc000083a40}, 0xc0000f9c00)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/webhooks/middleware.go:15 +0x144
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0x7fc7a917f380?, 0xc000083a40?}, 0x14fa6e0?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/account.Middleware.func1.1({0x7fc7a917f380, 0xc000083a40}, 0xc0000f9b00)
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/account/middleware.go:24 +0x144
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0x7fc7a917f380?, 0xc000083a40?}, 0x14fa840?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/builds.sr.ht/api/loaders.Middleware.func1({0x7fc7a917f380, 0xc000083a40}, 0xc0000f9a00)
buildsrht-api:         git.sr.ht/~sircmpwn/builds.sr.ht/api/loaders/middleware.go:243 +0x34d
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0x7fc7a917f380?, 0xc000083a40?}, 0x14fa8c0?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/server.(*Server).WithDefaultMiddleware.func2.1({0x7fc7a917f380, 0xc000083a40}, 0xc0000f9900)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/server/server.go:211 +0x21a
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0x7fc7a917f380?, 0xc000083a40?}, 0xd05a00?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: github.com/go-chi/chi/middleware.Timeout.func1.1({0x7fc7a917f380, 0xc000083a40}, 0xc0000f9800)
buildsrht-api:         github.com/go-chi/chi@v4.1.2+incompatible/middleware/timeout.go:45 +0x1cd
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xc0000f9700?, {0x7fc7a917f380?, 0xc000083a40?}, 0xc000039960?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: github.com/go-chi/chi/middleware.RequestLogger.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9700)
buildsrht-api:         github.com/go-chi/chi@v4.1.2+incompatible/middleware/logger.go:46 +0x17d
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xc0000f9700?, {0xef2280?, 0xc0000fa1c0?}, 0xc000100000?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: github.com/go-chi/chi/middleware.RealIP.func1({0xef2280, 0xc0000fa1c0}, 0xc0000f9700)
buildsrht-api:         github.com/go-chi/chi@v4.1.2+incompatible/middleware/realip.go:34 +0x9e
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0xef2280?, 0xc0000fa1c0?}, 0x14fa850?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/auth.internalAuth({0xc0001fc170, 0x2, 0x0?}, {0xc0002b00f0, 0xe4, 0xf0}, {0xef2280, 0xc0000fa1c0}, 0xc0000f9400, {0xeee280, ...})
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/auth/middleware.go:286 +0x573
buildsrht-api: git.sr.ht/~sircmpwn/core-go/auth.Middleware.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9400)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/auth/middleware.go:717 +0x4f3
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0xef2280?, 0xc0000fa1c0?}, 0x14fa8a0?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/redis.Middleware.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9300)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/redis/middleware.go:21 +0x145
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0xef2280?, 0xc0000fa1c0?}, 0x14fa880?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/database.Middleware.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9200)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/database/middleware.go:21 +0x145
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0xef2280?, 0xc0000fa1c0?}, 0x14fa890?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/email.Middleware.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9100)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/email/worker.go:240 +0x145
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef2a50?, {0xef2280?, 0xc0000fa1c0?}, 0xdd8d19?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/config.Middleware.func1.1({0xef2280, 0xc0000fa1c0}, 0xc0000f9000)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/config/middleware.go:23 +0x144
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0x40f8aa?, {0xef2280?, 0xc0000fa1c0?}, 0xf8?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: git.sr.ht/~sircmpwn/core-go/server.(*Server).WithDefaultMiddleware.func1.1({0xef2280, 0xc0000fa1c0}, 0x14faa01?)
buildsrht-api:         git.sr.ht/~sircmpwn/core-go@v0.0.0-20230411141100-89b1b48997a8/server/server.go:183 +0x6c
buildsrht-api: net/http.HandlerFunc.ServeHTTP(0xef29a8?, {0xef2280?, 0xc0000fa1c0?}, 0x14faa10?)
buildsrht-api:         net/http/server.go:2122 +0x2f
buildsrht-api: github.com/go-chi/chi.(*Mux).ServeHTTP(0xc0000845a0, {0xef2280, 0xc0000fa1c0}, 0xc0000f8d00)
buildsrht-api:         github.com/go-chi/chi@v4.1.2+incompatible/mux.go:86 +0x296
buildsrht-api: net/http.serverHandler.ServeHTTP({0xef0060?}, {0xef2280, 0xc0000fa1c0}, 0xc0000f8d00)
buildsrht-api:         net/http/server.go:2936 +0x316
buildsrht-api: net/http.(*conn).serve(0xc0001f0480, {0xef2a50, 0xc0002a4330})
buildsrht-api:         net/http/server.go:1995 +0x612
buildsrht-api: created by net/http.(*Server).Serve
buildsrht-api:         net/http/server.go:3089 +0x5ed
buildsrht-api: 2023/08/14 21:50:39 "POST http://localhost:5102/query HTTP/1.1" from xx.xx.xx.xxx - 200 78B in 11.880368ms
gunicorn: [2023-08-14 21:50:39,045] ERROR in app: Exception on /submit [POST]
gunicorn: Traceback (most recent call last):
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 2529, in wsgi_app
gunicorn:     response = self.full_dispatch_request()
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
gunicorn:     rv = self.handle_user_exception(e)
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
gunicorn:     rv = self.dispatch_request()
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
gunicorn:     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/srht/oauth/__init__.py", line 43, in wrapper
gunicorn:     return f(*args, **kwargs)
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/buildsrht/blueprints/jobs.py", line 271, in submit_POST
gunicorn:     job_id = submit_build(current_user, _manifest, note=note,
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/buildsrht/runner.py", line 26, in submit_build
gunicorn:     resp = exec_gql("builds.sr.ht", """
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/srht/graphql/client.py", line 20, in exec_gql
gunicorn:     return op.execute(site, user=user, client_id=client_id, valid=valid)
gunicorn:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/srht/graphql/client.py", line 100, in execute
gunicorn:     raise GraphQLError(resp)
gunicorn: srht.graphql.client.GraphQLError: {'errors': [{'message': 'internal system error', 'path': ['submit']}], 'data': None}

This error is utterly incomprehensible to me, so I hope it is something that can be easily solved on the GraphQL side, but I have no idea what GraphQL is apart from the obvious meaning of its name. Maybe it is just a misconfiguration on my part.

Anyway, this concludes the status report. I can see that there are some small issues that can be tackled straightforwardly, so maybe it is wiser to chip at Sourcehut little by little, opening some spun-off PRs and then rebasing this larger update onto them? But I don't know how the rest of the updates come into play in all of this. In any case, I am more than willing to help by spinning off a services deprecation PR, a credentials PR and, if necessary, a patch for the image.nix expression, in case the maintainer is unresponsive. Just give me the word and I'll be on it.

@fgaz
Member

fgaz commented Aug 15, 2023

This issue ought to be resolved upstream, and I'll try to contact the NixOS image maintainer (@fgaz) to see if we can have an optional system attribute passed through the image.nix function.

Sure, feel free to send a patch to sr.ht-dev and cc me :)

@nessdoor
Contributor

I was about to send you a private email, good timing!

Just let me learn how to send patches via email and I'll post it to the list.

@nessdoor
Contributor

nessdoor commented Aug 20, 2023

Time for another status report.

The modification to image.nix has been proposed upstream in patch 43741. If it passes, we won't need to define the NixOS image ourselves.

As for the long stack trace generated while trying to submit build manifests, I might have found a related patch in patches/redis-socket/core/0001-Fix-Unix-socket-support-in-RedisQueueCollector.patch that is worth a look, I think. I will try to do that either tomorrow or soon after.

Then, I decided to try out paste, which should be a rather simple service to set up and run. Turns out it doesn't work out of the box. The web UI is OK, but as soon as you try to create a paste, the following appears in the log:

gunicorn[24498]: [2023-08-19 17:27:36,708] ERROR in app: Exception on /new-paste [POST]
gunicorn[24498]: Traceback (most recent call last):
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
gunicorn[24498]:     conn = connection.create_connection(
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
gunicorn[24498]:     raise err
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
gunicorn[24498]:     sock.connect(sa)
gunicorn[24498]: ConnectionRefusedError: [Errno 111] Connection refused
gunicorn[24498]: During handling of the above exception, another exception occurred:
gunicorn[24498]: Traceback (most recent call last):
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
gunicorn[24498]:     httplib_response = self._make_request(
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
gunicorn[24498]:     conn.request(method, url, **httplib_request_kw)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connection.py", line 239, in request
gunicorn[24498]:     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
gunicorn[24498]:   File "/nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/lib/python3.10/http/client.py", line 1283, in request
gunicorn[24498]:     self._send_request(method, url, body, headers, encode_chunked)
gunicorn[24498]:   File "/nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/lib/python3.10/http/client.py", line 1329, in _send_request
gunicorn[24498]:     self.endheaders(body, encode_chunked=encode_chunked)
gunicorn[24498]:   File "/nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/lib/python3.10/http/client.py", line 1278, in endheaders
gunicorn[24498]:     self._send_output(message_body, encode_chunked=encode_chunked)
gunicorn[24498]:   File "/nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/lib/python3.10/http/client.py", line 1038, in _send_output
gunicorn[24498]:     self.send(msg)
gunicorn[24498]:   File "/nix/store/9c03r86hcdn43dm3hsgjirifvyzfkhwh-python3-3.10.12/lib/python3.10/http/client.py", line 976, in send
gunicorn[24498]:     self.connect()
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
gunicorn[24498]:     conn = self._new_conn()
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
gunicorn[24498]:     raise NewConnectionError(
gunicorn[24498]: urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fa75f184430>: Failed to establish a new connection: [Errno 111] Connection refused
gunicorn[24498]: During handling of the above exception, another exception occurred:
gunicorn[24498]: Traceback (most recent call last):
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
gunicorn[24498]:     resp = conn.urlopen(
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
gunicorn[24498]:     retries = retries.increment(
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
gunicorn[24498]:     raise MaxRetryError(_pool, url, error or ResponseError(cause))
gunicorn[24498]: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=5111): Max retries exceeded with url: /query (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa75f184430>: Failed to establish a new connection: [Errno 111] Connection refused'))
gunicorn[24498]: During handling of the above exception, another exception occurred:
gunicorn[24498]: Traceback (most recent call last):
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 2529, in wsgi_app
gunicorn[24498]:     response = self.full_dispatch_request()
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1825, in full_dispatch_request
gunicorn[24498]:     rv = self.handle_user_exception(e)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
gunicorn[24498]:     rv = self.dispatch_request()
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
gunicorn[24498]:     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/srht/oauth/__init__.py", line 43, in wrapper
gunicorn[24498]:     return f(*args, **kwargs)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/pastesrht/blueprints/public.py", line 96, in new_paste_POST
gunicorn[24498]:     paste_id = create_paste(valid, files, visibility)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/pastesrht/blueprints/public.py", line 45, in create_paste
gunicorn[24498]:     resp = op.execute("paste.sr.ht", valid=valid)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/srht/graphql/client.py", line 79, in execute
gunicorn[24498]:     r = requests.post(f"{origin}/query",
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/api.py", line 115, in post
gunicorn[24498]:     return request("post", url, data=data, json=json, **kwargs)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/api.py", line 59, in request
gunicorn[24498]:     return session.request(method=method, url=url, **kwargs)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
gunicorn[24498]:     resp = self.send(prep, **send_kwargs)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
gunicorn[24498]:     r = adapter.send(request, **kwargs)
gunicorn[24498]:   File "/nix/store/gsx7dnmci82zr5m2ymh8z3mhzm64q5z4-python3-3.10.12-env/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
gunicorn[24498]:     raise ConnectionError(e, request=request)
gunicorn[24498]: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5111): Max retries exceeded with url: /query (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fa75f184430>: Failed to establish a new connection: [Errno 111] Connection refused'))

The cause behind this failure is that paste tries to contact its API backend, which, for some reason, doesn't exist. Apparently, pastesrht has been packaged incorrectly: the buildGoModule for its API component is absent, resulting in a half-baked package. The fix is trivial, as the other services give plenty of examples of how to compile these hybrid Python/Go packages correctly.

This defect could have been caught by a simple test waiting for a socket to appear on the API port, so I suggest we add something along those lines to verify that these backend components are actually up and running. I guess very few people self-host their pastebins on NixOS machines...
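A minimal sketch of such a check, assuming the API backend is expected on localhost:5111 as in the traceback above. The helper name `wait_for_port` and its defaults are hypothetical; in a NixOS VM test, the test driver's own `wait_for_open_port` would presumably serve the same purpose.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds have elapsed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 5111 is taken from the paste traceback; adjust per service.
    ok = wait_for_port("localhost", 5111, timeout=10.0)
    print("API backend reachable" if ok else "API backend not reachable")
```

A check like this only proves that something is listening; it says nothing about whether the backend answers queries correctly.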

@tomberek
Contributor

Getting all the services to work together and configured was a hurdle last time I was looking at this. There were some cases where there was so much service isolation that they couldn't communicate. Are there tests we can add to make this less fragile?

@nessdoor
Contributor

I would advise some narrow-scoped integration tests that carry out some basic inter-component transaction, in addition to simple tests to verify that all components are up.

For example, pushing a new repo to git would test the full integration between git, meta and any intervening databases (you know, that thing that goes under the name of "thread testing"), and that would suffice, I think. After all, upstream already carries out their set of application tests, so we just need to ascertain that our inter-service connections work.

By the way, @tomberek, since you are the original maintainer of this module, do you have a working configuration for a multi-service Sourcehut installation that you could show us, to guide us past some of the basic pitfalls? Many of the hurdles that I am encountering are being solved by reading other people's configs.

@nessdoor
Contributor

nessdoor commented Aug 21, 2023

I don't know if I am stating the obvious, but it seems that the builds failure is due to the fact that a Redis endpoint URL meant for Celery is leaking to the Redis Go client, which should have its own URL, instead.
At the moment, I don't have enough disk space to regenerate a builds installation; I'll try to confirm my suspicions as soon as I repartition my server and can generate a config.ini. default.nix and service.nix are too freaking confusing for me to inspect manually. I already spent some hours on those and didn't spot anything suspicious: too many recursive merges, options defined here and there, attribute overrides all over the place and closures being passed around.

Also, one day we should refactor this module to respect RFC42, because the amount of options is hideous and the repetition is disturbing.

@nessdoor
Contributor

Found it.

At line 22 of api/graph/celery.go, builds pulls the redis config option from the INI file, which is the URI for Celery workers.

When running in the default local configuration, the Redis store is on the same machine as the worker, so the connection is established via a Unix socket, resulting in a URI with a redis+socket:// scheme. The Celery library for Python mandates this scheme (although it is currently undocumented, see celery/celery#6988), but unfortunately that is not the case for the Go library.
More specifically, the deprecated functions NewRedisCeleryBroker and NewRedisCeleryBackend of gocelery ultimately rely on a call to the deprecated NewRedisPool, which in turn tries to open a connection to Redis by passing the connection URI to the DialURL function of redigo.
The invalid redis URL scheme: redis+socket error message is the result of the DialURLContext function of redigo not knowing what to do with that scheme.

The reason why nobody reported this bug before is probably that everyone using builds is either connecting to the local Redis instance via loopback, using a standard redis:// scheme, or running a remote Redis store.

What can we do now? I think that, for the time being, we should either patch the Go source code with something that mangles the scheme into the right form (difficult), or make it so that the local Redis instance, at least for builds, is accessed through a loopback connection, so that both Python Celery and redigo are happy (easy).
We should also think about informing upstream of this issue, since it is something they should either explicitly state as unsupported in their documentation, or manage in their code. Maybe the problem will disappear once they refactor to remove the deprecated functions, as they will have to interface with redigo directly, but, given the fact that this bug has gone unnoticed for so long, I wouldn't count on that.
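To illustrate the mismatch, here is a small sketch that classifies a broker URI by its scheme. The set of schemes accepted by the Go client is an assumption inferred from the error message quoted above, and the helper name and the example socket path are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: the Go client (via redigo's DialURL) only accepts these
# schemes, as suggested by the "invalid redis URL scheme" error.
GO_CLIENT_SCHEMES = {"redis", "rediss"}


def go_client_accepts(url: str) -> bool:
    """Return True if the URI scheme is one the Go client is assumed to handle."""
    return urlsplit(url).scheme in GO_CLIENT_SCHEMES


if __name__ == "__main__":
    candidates = [
        # Hypothetical socket path, in the form Celery expects for Unix sockets.
        "redis+socket:///run/redis/redis.sock",
        # Loopback TCP connection, usable by both Celery and the Go client.
        "redis://127.0.0.1:6379/0",
    ]
    for url in candidates:
        verdict = "accepted" if go_client_accepts(url) else "rejected"
        print(f"{url}: {verdict} by the Go client")
```

Under these assumptions, only the loopback form passes, which is why switching builds to a loopback connection is the easy workaround.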

@nessdoor
Contributor

With a simple modification to my system configuration, I was able to make builds communicate with its Redis instance via loopback, and can confirm that the problem is solved and the web UI behaves correctly, with the build being submitted.

Sadly, it seems that builds suffers from a defect in its derivation. Because line 948 of default.nix was commented out at the end of 2021, the control script in charge of running the images is not copied to the lib/images directory of the service derivation, resulting in a

run_build panic: boot: fork/exec /nix/store/<hash>-buildsrht-worker-images/images/control: no such file or directory

In addition, the control script contains many hard-coded absolute paths, meaning that it hasn't been fixpath'ed nor parameterized, making it broken by default.

Again, this is an easy issue to solve, as the reason for commenting that line is no longer applicable, any future re-occurrence of the problem can be easily side-stepped, and the absolute paths can all be replaced by store paths and build-time parameters. What troubles me the most is that, once again, this bug has been sitting there for more than a year, with 3 intervening NixOS releases shipping broken code and no failure detected by the testing suite.

That said, I begin to wonder if it is meaningful to hold back on the merge of this PR, @christoph-heiss. Your updates are not breaking any of the core services that people seem to actually use (git and meta), and the other services seem to be broken from the start, so a version bump wouldn't hurt them, either. I think that, strategically speaking, it would make sense to have this global minor version bump committed, and then flesh out the rest in subsequent PRs (after all, we are on unstable). Otherwise, I feel like the testing process of this PR would take too long and make it disappear in the PR soup from hell, wasting everybody's time.

Or, maybe we should ask for help on Discourse, and have other Sourcehut administrators provide suggestions.

@fgaz, sorry to trouble you again, but I noticed that you lamented the lack of meaningful Minio integration in #164440. Does this mean you are using any other service besides meta and git that are not Git LFS? Can you provide some insight into any other broken things not caught by mainstream users?

@fgaz
Member

fgaz commented Aug 22, 2023

@fgaz, sorry to trouble you again, but I noticed that you lamented the lack of meaningful Minio integration in #164440. Does this mean you are using any other service besides meta and git that are not Git LFS? Can you provide some insight into any other broken things not caught by mainstream users?

iirc I stumbled upon that issue while attempting to test a pages.sr.ht patch on which I gave up, so I don't have anything else sorry

@nessdoor
Contributor

nessdoor commented Aug 23, 2023

It's fine it's fine, better that way. Thanks!

@nessdoor
Contributor

nessdoor commented Sep 4, 2023

Success reports follow.

First of all, the fix of the paste issue was trivial and, after building the API server and setting its unit file up, everything works fine.

The patch to the image.nix expression for building NixOS builder images has been revised and re-proposed upstream as patch number 44014. Awaiting further feedback or application.

As for builds itself, after having:

  • massaged the /images directory into better shape
  • wrapped the control script
  • exposed /bin to the chrooted worker
  • exposed a Docker socket to such worker
  • patched control to follow symlinks when mounting volumes

I finally succeeded in running the NixOS image. Great success!

A problem still remains for builds, though: for some reason, the API daemon is incapable of retrieving logs. All its requests to /query/log/<build-id>/log are met with a 502 response. I suspect a simple misconfiguration of Nginx, but I am not familiar with web servers, so it may take some time to debug.

All (temporary) fixes that I applied can be found at my Nixpkgs fork.

I have yet to decide which service to test next, but probably todo, as it would be a nice companion for my branch of temporary Sourcehut-on-NixOS patchwork.

I will write a checklist with all we've tested and with all we are yet to test, so that people may come over and help out without going through my walls of text.

@nessdoor
Contributor

nessdoor commented Sep 4, 2023

Oh, and I just noticed #202608. Sad story indeed, I can resonate with much of that frustration.

@apfelkuchen6 I would love to hear your opinion! If you have time and are not nauseated by this module, of course...

@nessdoor
Contributor

nessdoor commented Sep 4, 2023

@tomberek
Contributor

tomberek commented Sep 5, 2023

I know this is a large amount of work. I apologize for not testing and reviewing yet.

christoph-heiss and others added 13 commits November 11, 2023 13:01
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
This breaks the (already fragile) gitsrht-dispatch -> gitsrht-keys
command chain.

Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
…le` flags

Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
An empty log directory, in case it stays unused, does not hurt anyone.

Signed-off-by: Christoph Heiss <christoph@c8h4.io>
These are needed, as the used sourcehut version is not compatible with
the newer major-releases for both packages.

Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
Signed-off-by: Christoph Heiss <christoph@c8h4.io>
@christoph-heiss
Contributor Author

I have rebased this branch on the current master and addressed most of @h7x4's feedback.
A full rebuild and all tests pass cleanly.

I also had to add overrides for two more packages, which got major releases in the meantime. And a fix for PostgreSQL >= 15 was also needed.

This PR has now gone way beyond its scope. I'm not willing to do any more changes, as otherwise this will stay open forever. Either it gets merged as-is or never.

I cannot justify wasting more time & energy, just to constantly keep up with the master branch.

Member

@sarcasticadmin sarcasticadmin left a comment

I just have a couple of small observations around other places that still have set -x

This PR has now gone way beyond its scope. I'm not willing to do any more changes, as otherwise this will stay open forever. Either it gets merged as-is or never.

Just saw this comment. Please ignore, since I can totally understand where you're coming from

source = pkgs.writeShellScript "srht-dispatch" ''
source = pkgs.writeShellScript "srht-dispatch-wrapper" ''
set -e
set -x
Member

did you still mean to set -x here?

];
environment.systemPackages = optional cfg.meta.enable
(pkgs.writeShellScriptBin "metasrht-manageuser" ''
set -eux
Member

another set -x that could potentially be dropped

extraService
])) extraServices)

# Work around 'pq: permission denied for schema public' with postgres v15, until a
Member

thanks for the note here

@tomberek
Contributor

Result of nixpkgs-review pr 245394 run on x86_64-linux

2 packages blacklisted:
  • nixos-install-tools
  • tests.nixos-functions.nixos-test
21 packages built:
  • sourcehut.buildsrht
  • sourcehut.buildsrht.dist
  • sourcehut.coresrht
  • sourcehut.coresrht.dist
  • sourcehut.gitsrht
  • sourcehut.gitsrht.dist
  • sourcehut.hgsrht
  • sourcehut.hgsrht.dist
  • sourcehut.hubsrht
  • sourcehut.hubsrht.dist
  • sourcehut.listssrht
  • sourcehut.listssrht.dist
  • sourcehut.mansrht
  • sourcehut.mansrht.dist
  • sourcehut.metasrht
  • sourcehut.metasrht.dist
  • sourcehut.pagessrht
  • sourcehut.pastesrht
  • sourcehut.pastesrht.dist
  • sourcehut.todosrht
  • sourcehut.todosrht.dist

Contributor

@tomberek tomberek left a comment

Builds and functions as expected. NixOS tests pass.
(also did testing on x86_64-linux)

@tomberek tomberek merged commit 7859adb into NixOS:master Nov 11, 2023
@tomberek
Contributor

@christoph-heiss : thanks so much for the work. The constant update and rebase is not fun. (This is mostly my fault: I've not been using or testing along the way, which made review much slower.)

@christoph-heiss christoph-heiss deleted the pkgs/sourcehut branch November 11, 2023 19:09
@nessdoor
Contributor

I'll try to make it quick, but I can't guarantee that I will be able to land all the fixes before the branch-off. Great that at least we have a refreshed package set and a fixed service module now!

Thank you, see you soon!

@nessdoor
Contributor

Oh right, can someone make sure that #201424 and #201425 are indeed no longer applicable? I can say that #201425 is probably solved, by now, but I have little clue about the other one.

@christoph-heiss
Contributor Author

Oh right, can someone make sure that #201424 and #201425 are indeed no longer applicable? I can say that #201425 is probably solved, by now, but I have little clue about the other one.

Yes, #201424 and #201425 have indeed been fixed by this PR.


Labels

6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes.

Development

Successfully merging this pull request may close these issues.

sourcehut: go binaries (metasrht-api etc.) crash

6 participants