[email protected] might use the network but does not wait for it #3348

Open
mvollmer opened this issue May 22, 2023 · 3 comments

@mvollmer
Collaborator

For a pool with NBDE, stratis-fstab-setup will run clevis during boot, as it should. For clevis to have a chance to work, the network needs to be up enough for the Tang server to be reachable, but the [email protected] units don't have any dependency on network-online.target or similar.

Putting "_netdev" into the fstab entry doesn't help either: it delays the actual mounting of the filesystem, but it doesn't delay starting of [email protected], which is started as early as allowed by its own dependencies.

I think stratis-fstab-setup (the script) should probably just sit in a loop and retry clevis on every change to the network status, whenever there might be a chance for it to work. (But it's not clear when to give up, hmm.)

Just waiting for network-online.target and then running clevis once might be enough for most cases, but please see https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
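For illustration, the "wait for network-online.target" option could be expressed as a systemd drop-in rather than a change to the script. The sketch below only writes such a drop-in; the unit name `example-setup@.service` and the file name `10-network-online.conf` are placeholders chosen for this example, not names taken from Stratis, and whether this ordering is sufficient for NBDE is exactly the open question raised by the linked page.

```shell
#!/bin/sh
# Sketch: write a drop-in that pulls in network-online.target and orders
# the unit after it. Both arguments are assumptions for illustration.
write_network_dropin() {
    dropin_root=$1   # e.g. /etc/systemd/system
    unit=$2          # hypothetical unit name, e.g. example-setup@.service
    dropin_dir="$dropin_root/$unit.d"
    mkdir -p "$dropin_dir" || return 1
    # Wants= pulls the target in; After= makes the unit wait for it.
    cat > "$dropin_dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

A call such as `write_network_dropin /etc/systemd/system example-setup@.service` followed by `systemctl daemon-reload` would be the usual way to apply it, assuming the placeholder is replaced with the real template unit name.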

@jbaublitz
Member

Just for reference, we use network-online.target for the root fs. CoreOS wanted to write their own handling of Stratis in the root filesystem as a result of this. We could potentially add an exponential backoff retry mechanism instead for both as suggested by CoreOS, but a stopgap that would probably be workable sooner would be to add a requirement for network-online.target. What would be your preference?

@mvollmer
Collaborator Author

> Just for reference, we use network-online.target for the root fs.

I don't immediately see how this is possible... Doesn't network-online.target indirectly depend on -.mount? Or does this all happen inside the initrd?

> What would be your preference?

Something like this in stratis-fstab-setup, maybe:

if $(stratis-min pool is-stopped "$POOL_UUID"); then
	if $(stratis-min pool is-bound "$POOL_UUID"); then
		# Block until systemd reports network-online.target as active.
		while ! systemctl is-active --quiet network-online.target; do
			echo "Waiting for network"
			sleep 1
		done
		if ! stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"; then
			echo "Failed to start pool with UUID $POOL_UUID using Clevis." >&2
			exit 1
		fi
	else ...

(Note that by the time network-online.target has been reached, stratis-min has stopped working...)
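The loop above waits forever if the network never comes up. A minimal sketch of a bounded variant follows; the helper name `wait_until` and the 60-second figure in the usage note are assumptions for illustration, not part of stratis-fstab-setup.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND once per second until it succeeds
# or roughly TIMEOUT_SECONDS of sleeping have elapsed.
# Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the script this might be used as `wait_until 60 systemctl is-active --quiet network-online.target || exit 1`, so that boot fails with a clear status instead of hanging.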

@jbaublitz
Member

jbaublitz commented May 25, 2023

> > Just for reference, we use network-online.target for the root fs.
>
> I don't immediately see how this is possible... Doesn't network-online.target indirectly depend on -.mount? Or does this all happen inside the initrd?

We actually worked with systemd on this. Previously it did not work: NetworkManager was using legacy dracut functionality, so we couldn't wait on network-online.target and still have NetworkManager set up the network. I see no indication anywhere that -.mount is required by network-online.target, and we get no warning messages in the initrd about cyclic dependencies around that. Perhaps the reason we can do this is that it is indeed all in the initrd, but I don't see any indication that it wouldn't work for stratis-fstab-setup too.

> > What would be your preference?
>
> Something like this in stratis-fstab-setup, maybe:
>
> if $(stratis-min pool is-stopped "$POOL_UUID"); then
> 	if $(stratis-min pool is-bound "$POOL_UUID"); then
> 		# Block until systemd reports network-online.target as active.
> 		while ! systemctl is-active --quiet network-online.target; do
> 			echo "Waiting for network"
> 			sleep 1
> 		done
> 		if ! stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"; then
> 			echo "Failed to start pool with UUID $POOL_UUID using Clevis." >&2
> 			exit 1
> 		fi
> 	else ...
>
> (Note that by the time network-online.target has been reached, stratis-min has stopped working...)

Given the warning in the documentation, it may be time to build a more robust out-of-the-box solution with exponential backoff retry logic, but this might be a quicker fix in the meantime if requiring network-online.target doesn't work outside of the initrd.
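As a rough idea of what exponential backoff could look like in the script, here is a minimal sketch. The helper name `retry_with_backoff`, the attempt count, and the initial delay are all assumptions for illustration; nothing here reflects an existing Stratis implementation.

```shell
#!/bin/sh
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the sleep between attempts (INITIAL_DELAY, 2x, 4x, ...).
# Returns 0 on success, 1 once all attempts have failed.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

With the command from the snippet above, a call might read `retry_with_backoff 6 1 stratis pool start --unlock-method=clevis --uuid "$POOL_UUID"`, which would give up after sleeping about 31 seconds in total. Capping the attempts is one possible answer to the "when to give up" question, though the right limit is a policy decision.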
