Skip to content

systemd-confinement: Fix with services using restrictive UMasks#147639

Draft
aszlig wants to merge 2 commits intoNixOS:masterfrom
aszlig:systemd-confinement-umask
Draft

systemd-confinement: Fix with services using restrictive UMasks#147639
aszlig wants to merge 2 commits intoNixOS:masterfrom
aszlig:systemd-confinement-umask

Conversation

@aszlig
Copy link
Member

@aszlig aszlig commented Nov 28, 2021

NOTE: This currently is just a failing test in preparation for the fix, do not merge.

This is just in preparation to fixing #147599, because when using confinement with a non-root user and a restrictive UMask, the leading directories for all of the bind mounts we need in order to execute the program in question will get that UMask as well.

In that issue, I also suggested a workaround, which is to set the following option:

{ TemporaryFileSystem = [ "/" "/nix" "/nix/store" ]; }

While this works, @martined has already opened an upstream issue at systemd/systemd#21548 so before we jump to introduce such ugly things as the workaround above, let's first see what upstream has to say.

Nevertheless, this test case is there so that once we have a definitive answer, we already have a regression test.

This is just in preparation to fixing NixOS#147599, because when using
confinement with a non-root user and a restrictive UMask, the leading
directories for all of the bind mounts we need in order to execute the
program in question will get that UMask as well.

In that issue, I also suggested a workaround, which is to set the
following option:

  TemporaryFileSystem = [ "/" "/nix" "/nix/store" ];

While this works, @martined has already opened an upstream issue at
systemd/systemd#21548 so before we jump to
introduce such ugly things as the workaround above, let's first see what
upstream has to say.

Nevertheless, this test case is there so that once we have a definitive
answer, we already have a regression test.

Signed-off-by: aszlig <aszlig@nix.build>
@github-actions github-actions bot added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Nov 28, 2021
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels Nov 28, 2021
This is only a code style change, since with the port to Python in
1320f23, the test became less readable
because the description attribute got inlined in the individual
testScript snippet in a series of "with subtest(...):".

While I do welcome the port to Python, I think this also made the code
harder to follow because every testScript now contains additional
boilerplate.

The reason why the descriptions got inlined was probably because at the
time we were using "Black" (a Python code formatter), which would barf
if a description would exceed a certain amount of characters.

However, since Black is no longer used and we switched to a linter a
long time ago (c362a28), so we can now
re-introduce the description attribute.

Signed-off-by: aszlig <aszlig@nix.build>
@martinetd
Copy link
Member

Thanks for this!

It looks like I had been beaten to it ( systemd/systemd#19899 ) and something got merged just a couple of weeks ago to fix it, for what will be v250.

Unfortunately I don't think that's something they'd backport, so we're still stuck for stable release for a while -- I'm thinking the workaround you suggested while ugly might be something we can do waiting for the proper fix? Not sure honestly...

@aszlig
Copy link
Member Author

aszlig commented Nov 30, 2021

Unfortunately I don't think that's something they'd backport, so we're still stuck for stable release for a while -- I'm thinking the workaround you suggested while ugly might be something we can do waiting for the proper fix? Not sure honestly...

Well, or we backport/apply the upstream fix to our version of systemd, which we then can drop when upgrading to either 249.X or 250. When using a restrictive UMask value, /nix/store is not the only path that could be bind-mounted, so our little workaround would need to include all those leading directories.

@aszlig
Copy link
Member Author

aszlig commented Nov 30, 2021

@martinetd: Just opened a backport pull request in systemd/systemd-stable#148 in the hopes that they accept the backport. If it doesn't work out we can still apply the commits downstream.

Addendum: Since I wasn't sure whether you tested systemd/systemd#21320 specifically or just systemd main, I specifically tested the backport pull request for the test added here and it succeeds.

@martinetd
Copy link
Member

Thanks!
I personally think just a warning and letting the user deal with it would probably be enough (this isn't a regression, so the main takeaway would be avoiding people wasting time figuring out why service doesn't start and let them deal with either workaround (lower umask to xxx6 or tmpfs mounts)), but if upstream would be willing to backport it I agree it's simpler.

I've glanced at the PR and most of it is just adding a new macro and using it, but the actual code change only added a umask()+cleanup call in two places so it's actually much a smaller change than I thought it was. Let's see how it goes.

@martinetd
Copy link
Member

Aaand the backport PR has just been merged...! systemd-stable v249.8 has been released with it.
(Note there also is 250.2 since then, which also has the fix, not sure which version we want? but nixpkgs master is still at 249.7)

@aszlig since you added a test I'll let you bump systemd version?

@aszlig
Copy link
Member Author

aszlig commented Feb 2, 2022

@martinetd: Since there is #156096, I'm not sure whether we should bump to 249.8 or wait for 250. @jonringer: Do you have a rough estimate when you expect this to land in master?

@jonringer
Copy link
Contributor

@martinetd: Since there is #156096, I'm not sure whether we should bump to 249.8 or wait for 250. @jonringer: Do you have a rough estimate when you expect this to land in master?

I was going to include it after the next staging cycle began. So about a month.

@jonringer
Copy link
Contributor

I would also be okay with you adding the commits to the other PR

@martinetd
Copy link
Member

I think it depends a bit on what we want for 21.11 -- for that branch we'd want 249.8 and not 250 if we want the fixes there?
I'm using nixos-unstable myself and worked around the problem for now so am not in a particular hurry (I'm still rebuilding systemd locally for #147354 though... probably would have been worth updating with 249.8 last month but I forgot about it :| closing now), I don't have an opinion so leaving it to you.

@jonringer
Copy link
Contributor

I think it depends a bit on what we want for 21.11 -- for that branch we'd want 249.8 and not 250 if we want the fixes there?
I'm using nixos-unstable myself and worked around the problem for now so am not in a particular hurry (I'm still rebuilding systemd locally for #147354 though... probably would have been worth updating with 249.8 last month but I forgot about it :| closing now), I don't have an opinion so leaving it to you.

21.11 should not be receiving the 250 bump. So 249.8 makes sense for stable.

@martinetd
Copy link
Member

Thanks for confirming! The question was more about do we want it or not, but given the other fixes I'd guess we would.
What would the path be for a minor update? assuming it doesn't need to go through staging, master > release-21.11 branch? or straight to release-21.11 branch? (but then 21.11 would be ahead of master until 250 gets in...)

@jonringer
Copy link
Contributor

Yea, the only thing that complicates this is that systemd250 is close to ready for staging.

cc @vcunat on thoughts.

@martinetd
Copy link
Member

FWIW I've tested locally and am running 249.8 on my machines without problem so far; I've already been recompiling systemd for an older bug might as well get both fixes.

The included patches applied without any change (not sure how to test musl patches though), here's the rebased version with these 2 patches + systemd bump, even if that's the easy part:
https://github.com/martinetd/nixpkgs/tree/systemd-249-8

@vcunat
Copy link
Member

vcunat commented Feb 6, 2022

Are you sure this will be worth trying to fix in 21.11? On a quick glance I don't expect it would be easy to be certain that you don't introduce some minor breakage.

@martinetd
Copy link
Member

I've hijacked this PR but the main reason I'm building my own systemd is that my pid1 sometimes crashes when upgraded on one machine due to #147103, and that seems worth the trouble even if the conditions to hit the crash are probably not that common (for me it happens because of some udev thing wants to umount dvds on upgrade for some reason, and systemd gets lost in the dependency tree). It's double the pain when services like hostapd get stopped before the crash and physical access is awful, I'm not taking the risk that it happens again. But once again I'm on unstable and can keep rebuilding systemd until 250 gets in, so I don't really care.
#147354 (the PR I originally made back in nov to fix that) was waiting for the fix to be backported in systemd-stable forever, and by the time 249.8 got released we had this other PR waiting for it with additional patches for confinement tests so I preferred to continue here instead.

Anyway yes, there's no way to be certain it won't break something else instead. I'd say that's systemd maintainers jobs to not backport risky patches, but I've also had a look at the git log and while I don't see anything suspicious in the 109 commits there it's impossible to say it won't break a thing. (mandatory xkcd here: https://xkcd.com/1172/ )
As I said, I've done the easy part :)

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Aug 12, 2022
@wegank wegank added the 2.status: merge conflict This PR has merge conflicts with the target branch label Mar 20, 2024
@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 20, 2024
@wegank wegank added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2.status: merge conflict This PR has merge conflicts with the target branch 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

systemd confinement does not work well with UMask=0077 + non-root user

5 participants