@cgwalters
Member

In some cases, we want to make a decision in the initramfs
based on whether or not an Ignition config was provided at all.

A good example of this is live ISOs, where we only want
to turn on networking if a config was provided:
https://github.com/coreos/ignition-dracut/issues/94

So the idea is that we'd end up running `ignition fetch --detect-offline-config`
as part of a systemd generator, which could then take further
steps like pulling in `network-online.target` if a config was
provided.

@cgwalters
Member Author

This is a total WIP, not tested at all. Throwing up for some early feedback. The code feels a bit gross too, happy to have other ideas.

@cgwalters force-pushed the resource-requires-network branch from 7f529e4 to 452bf66 on March 24, 2020 20:43
Ignition: types.Ignition{Version: types.MaxVersion.String()},
}

if stageName == "fetch" && e.DetectOfflineConfig != "" {
Member Author

We should fix the Ignition code so that the stages can have their own CLI args.

Contributor

👍 I can't agree more.

flag.BoolVar(&flags.clearCache, "clear-cache", false, "clear any cached config")
flag.StringVar(&flags.configCache, "config-cache", "/run/ignition.json", "where to cache the config")
flag.DurationVar(&flags.fetchTimeout, "fetch-timeout", exec.DefaultFetchTimeout, "initial duration for which to wait for config")
flag.StringVar(&flags.detectOfflineConfig, "detect-config-provided", "", "If a config is provided, create a file at this path")
Member Author

The reason we write a stamp file is that this is the best pattern I know of to signal the equivalent of `Result<bool>` from a subprocess. Keying off distinct exit codes feels clunky since it breaks up the flow of using `set -e` in shell.
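A minimal sketch of the stamp-file pattern described above, not the code in this PR. The names `writeStamp` and `stampExists` and the stamp path are hypothetical. The point is that the process can exit 0 for both "yes" and "no", so a non-zero exit status always means a real error, and the caller tests for the file afterwards.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// writeStamp records "a config was provided" by creating an empty file at
// stampPath. When provided is false it removes any stale stamp instead.
// Hypothetical helper, for illustration only.
func writeStamp(stampPath string, provided bool) error {
	if !provided {
		err := os.Remove(stampPath)
		if err != nil && !errors.Is(err, fs.ErrNotExist) {
			return err
		}
		return nil
	}
	f, err := os.Create(stampPath)
	if err != nil {
		return err
	}
	return f.Close()
}

// stampExists is the check a caller would make once the subprocess has
// exited successfully (in shell this would be a plain file test).
func stampExists(stampPath string) (bool, error) {
	_, err := os.Stat(stampPath)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

func main() {
	// Assumed location; a real generator would more likely use a path under /run.
	stamp := filepath.Join(os.TempDir(), "ignition-config-provided.stamp")
	if err := writeStamp(stamp, true); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	ok, err := stampExists(stamp)
	fmt.Println("config provided:", ok, "error:", err)
	_ = os.Remove(stamp)
}
```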

cgwalters added a commit to cgwalters/ignition-dracut that referenced this pull request Mar 24, 2020
First, copying into `/usr` just feels wrong; ideally even in
the initramfs `/usr` should be read-only.

Second, doing it this way will help with future work for
detecting the cases in which a config is provided; see:
coreos/ignition#948
Contributor

@darkmuggle left a comment

It makes sense to me. I like the direction and look forward to the final result.

Ignition: types.Ignition{Version: types.MaxVersion.String()},
}

if stageName == "fetch" && e.DetectOfflineConfig != "" {
Contributor

WDYT about making this a helper func? `func (e *Engine) HaveConfig() (bool, error)`

@cgwalters
Member Author

xref coreos/ignition-dracut#161

@darkmuggle
Contributor

> This is a total WIP, not tested at all. Throwing up for some early feedback. The code feels a bit gross too, happy to have other ideas.

Yeah, I agree with it being "gross," but short of a refactor... it's fine, IMHO.

@jlebon
Member

jlebon commented Mar 24, 2020

Thanks for starting this. Fixing the "conditional network" issue would be great!

One thing is that while the provider backchannel may be offline, the Ignition config itself might still contain references to payloads that need to be fetched over the network. So this code would need to also check the config for that.

I think trying to make this part of a systemd generator is appealing, though it opens up a race: other things might change the Ignition config before Ignition actually runs in earnest, so that it now does require networking. Also, I think generators block the entire boot, and it might not be reasonable for some offline providers to take their time waiting for the backchannel to be ready (echoes of discussions in #928). Or the system might just not be booted far enough, since generators run in a barebones state.

Random idea: one way we could make this super explicit is to have a compile flag like github.com/coreos/ignition/internal/distro.rdNetworkUnit, which, if set, names the network unit that must be active before any network operation is attempted (e.g. network.target). Clearly more heavyweight though.
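A rough sketch of what such a compile-time knob could look like, assuming a string variable injected with the Go linker's `-X` flag. The variable here lives in `main` purely to keep the example self-contained; the helper names and the choice of `systemctl is-active --quiet` as the check are assumptions, not anything Ignition does today.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// rdNetworkUnit is hypothetical. It could be set at build time with e.g.:
//   go build -ldflags "-X main.rdNetworkUnit=network.target"
// When left empty, no check is performed.
var rdNetworkUnit string

// networkCheckCommand returns the argv that would verify the unit is
// active, or nil when no unit was configured.
func networkCheckCommand(unit string) []string {
	if unit == "" {
		return nil
	}
	return []string{"systemctl", "is-active", "--quiet", unit}
}

// ensureNetwork would be called before the first network operation.
func ensureNetwork() error {
	argv := networkCheckCommand(rdNetworkUnit)
	if argv == nil {
		return nil
	}
	return exec.Command(argv[0], argv[1:]...).Run()
}

func main() {
	if err := ensureNetwork(); err != nil {
		fmt.Fprintln(os.Stderr, "network unit not active:", err)
		os.Exit(1)
	}
	fmt.Println("ok to attempt network operations")
}
```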

@jlebon
Member

jlebon commented Mar 25, 2020

Ouuh another idea along the lines of what you have here: we split the fetch stage into fetch-offline and fetch-online. fetch-offline runs first and if it encounters a network operation, drops a stamp file somewhere in /run and exits. Then fetch-online.service just needs to have ConditionPathExists=/path/to/stamp/file, but crucially also has Requires=network.target.
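The control flow of that split could be sketched roughly as follows. Everything here is hypothetical (`errNeedNet`, `fetchOffline`, `runFetchOffline`, the stamp path, and the crude `data:` prefix test standing in for "this resource is local"); it only illustrates "stop at the first network operation and drop a stamp".

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// errNeedNet is a hypothetical sentinel returned when a resource cannot be
// obtained without networking.
var errNeedNet = errors.New("resource requires networking")

// fetchOffline walks the resources and stops at the first one that is not
// local according to isLocal.
func fetchOffline(resources []string, isLocal func(string) bool) error {
	for _, r := range resources {
		if !isLocal(r) {
			return fmt.Errorf("%q: %w", r, errNeedNet)
		}
		// A real stage would read the local resource here.
	}
	return nil
}

// runFetchOffline drops a stamp file and reports true when networking is
// needed. That is not treated as a failure of the offline stage.
func runFetchOffline(resources []string, isLocal func(string) bool, stampPath string) (bool, error) {
	err := fetchOffline(resources, isLocal)
	if errors.Is(err, errNeedNet) {
		if werr := os.WriteFile(stampPath, nil, 0o644); werr != nil {
			return false, werr
		}
		return true, nil
	}
	return false, err
}

// isDataURL is a deliberately crude stand-in for "can be resolved locally".
func isDataURL(r string) bool { return strings.HasPrefix(r, "data:") }

func main() {
	// Assumed location; the comment above suggests somewhere in /run.
	stamp := filepath.Join(os.TempDir(), "ignition-need-net.stamp")
	needNet, err := runFetchOffline(
		[]string{"data:,hello", "https://example.com/payload"}, isDataURL, stamp)
	fmt.Println("need network:", needNet, "error:", err)
	_ = os.Remove(stamp)
}
```

The online unit would then only need to test for the stamp, which is what the `ConditionPathExists=` suggestion amounts to.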

func (e *Engine) detectOfflineConfig() (bool, error) {
offlineFetchers := []providers.FuncDetectConfig{
cmdline.DetectConfig,
file.DetectConfig,
Contributor

The cmdline provider very much does require networking. The file provider does not, but only runs on the file platform, which is only used for testing.

Member Author

Yeah, this is confusing. The goal here isn't to detect "offline" so much as whether a config was provided at all. We know that "statically" with those two by just looking at the filesystem basically.

For other providers we may need to wait for a driver or even do a network request. Ideally we detect the "no config provided" case with those too. But...that's a higher level thing I think.
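To illustrate the "static" detection described here, a self-contained sketch under stated assumptions: `detectFunc`, `cmdlineDetector`, `fileDetector`, and `haveConfig` are hypothetical names loosely modelled on the `FuncDetectConfig` list in the diff, and the kernel argument key and file path in `main` are example values rather than something taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// detectFunc reports whether a source statically indicates that a config
// was provided, without fetching it. Hypothetical type.
type detectFunc func() (bool, error)

// cmdlineDetector looks for a "key=..." argument on a kernel command line.
func cmdlineDetector(cmdline, key string) detectFunc {
	return func() (bool, error) {
		for _, arg := range strings.Fields(cmdline) {
			if strings.HasPrefix(arg, key+"=") {
				return true, nil
			}
		}
		return false, nil
	}
}

// fileDetector checks whether a config file exists at path.
func fileDetector(path string) detectFunc {
	return func() (bool, error) {
		_, err := os.Stat(path)
		if err == nil {
			return true, nil
		}
		if errors.Is(err, fs.ErrNotExist) {
			return false, nil
		}
		return false, err
	}
}

// haveConfig returns true as soon as any detector finds a config.
func haveConfig(detectors ...detectFunc) (bool, error) {
	for _, d := range detectors {
		ok, err := d()
		if err != nil {
			return false, err
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Example inputs only; a real caller would read /proc/cmdline.
	cmdline := "ro quiet ignition.config.url=https://example.com/config.ign"
	ok, err := haveConfig(
		cmdlineDetector(cmdline, "ignition.config.url"),
		fileDetector("config.ign"),
	)
	fmt.Println("config provided:", ok, "error:", err)
}
```

Note that a positive answer says nothing about whether the config can be fetched offline, which is the distinction raised in the comment above.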

@bgilbert
Contributor

Maybe I'm missing something, but I'm not seeing much relationship between this code and what we actually need to do. AIUI the goal is to determine whether we're okay to run Ignition with the amount of network we have, or whether we need to prompt the user to configure more network. Good cases are ones where we can fully resolve the Ignition config; bad ones are where we can't. So we'd need to:

  • See if we can obtain the initial user config. At least the system, qemu, and zvm providers can do this without network; also any provider that successfully locates a config drive.
  • Recursively resolve replace and merge directives. If there's no network at all, this will likely fail, but if there's some network it might succeed. (For example, if we DHCP'd on one NIC of a dual-NIC setup.)
  • Once we have a complete config, try to fetch or HEAD all referenced files. If we have sufficient network, this will succeed. But perhaps the resource is only available via the second NIC that didn't DHCP, so "some network" may not be enough. This is non-trivial, because we have to handle all remote protocols and we don't control the remote endpoint. Maybe it's a custom webapp that doesn't support HEAD. Maybe it's a TFTP resource.
  • We also have to deal with external TLS CAs, which I think need to be resolved in parallel with replace and merge.

We might decide to punt the "some network but not enough" case, but we do still need to detect a local config that references remote resources.
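As a sketch of that last check only (a local config that references remote resources), assuming a deliberately simplified, hypothetical `config` struct rather than Ignition's real types, and assuming that only `data:` URLs count as local. It does not attempt the recursive resolution of `replace` and `merge` or the reachability probing described in the list above.

```go
package main

import (
	"fmt"
	"net/url"
)

// config is a simplified, hypothetical stand-in for a parsed Ignition
// config: just the places where a source URL can appear.
type config struct {
	Replace string   // source of a replace directive, if any
	Merge   []string // sources of merge directives
	CAs     []string // sources of external TLS CAs
	Files   []string // sources of file contents
}

// sources flattens every URL the config references.
func (c config) sources() []string {
	var out []string
	if c.Replace != "" {
		out = append(out, c.Replace)
	}
	out = append(out, c.Merge...)
	out = append(out, c.CAs...)
	out = append(out, c.Files...)
	return out
}

// remoteSources returns the referenced URLs that are not known to be
// local. Assumption: only the "data" scheme is local.
func remoteSources(c config) ([]string, error) {
	var remote []string
	for _, s := range c.sources() {
		u, err := url.Parse(s)
		if err != nil {
			return nil, fmt.Errorf("parsing source %q: %w", s, err)
		}
		if u.Scheme != "data" {
			remote = append(remote, s)
		}
	}
	return remote, nil
}

func main() {
	c := config{
		Merge: []string{"https://example.com/base.ign"},
		Files: []string{"data:,hello", "tftp://192.0.2.1/payload"},
	}
	remote, err := remoteSources(c)
	fmt.Println("remote sources:", remote, "error:", err)
}
```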

There's also a potential race condition. If we try to fetch a remote resource and fail (perhaps after some retries), maybe it's because we don't have enough network, or maybe we'll have enough network momentarily but DHCP is slow.

@jlebon
Member

jlebon commented Mar 25, 2020

> Maybe I'm missing something, but I'm not seeing much relationship between this code and what we actually need to do.

The bottom line is there's no clean way to fix the conditional networking issue without having Ignition itself understand the distinction between a local and a remote resource. Starting this PR was a good way to get the ball rolling on discussing that. :)

> We might decide to punt the "some network but not enough" case, but we do still need to detect a local config that references remote resources.

Right, I mentioned this as well in #948 (comment). How does the suggestion in #948 (comment) sound to you as a way to deal with this?

@bgilbert bgilbert closed this Mar 25, 2020
@bgilbert bgilbert reopened this Mar 25, 2020
@bgilbert
Contributor

Sorry, wrong button.

@jlebon From a systemd unit perspective, that sounds reasonable.

@cgwalters
Member Author

We could also fix #903 with this.

@cgwalters
Member Author

> One thing is that while the provider backchannel may be offline, the Ignition config itself might still contain references to payloads that need to be fetched over the network. So this code would need to also check the config for that.

OK yes that is the more general problem but again per #948 (comment) this is more detecting presence of any config at all to start.

BTW just dropping this here...I suspect on AWS/GCP/OpenStack that use the link local IP address we could fetch the config by just bringing up "the interface"/"all interfaces" without doing DHCP. In other words, I bet we could support static IP addresses on AWS/GCP/OpenStack too (assuming we invent a mechanism for networking-in-Ignition).

@cgwalters
Member Author

Maybe ignition-probe or something? "offline" was the wrong word to choose here.

@sohankunkerkar
Contributor

> We could also fix #903 with this.

@cgwalters Are you planning to handle non-offline states for the user config also?

@cgwalters
Member Author

> @cgwalters Are you planning to handle non-offline states for the user config also?

On further reflection...maybe 😉 Detecting "no config provided" in AWS/GCP etc. is definitely strongly related to this but I guess not exactly the same thing. So it could make sense to do that as a separate PR.

@jlebon
Member

jlebon commented Mar 26, 2020

> OK yes that is the more general problem but again per #948 (comment) this is more detecting presence of any config at all to start.

OK, so you're trying to solve a more specific problem than what Benjamin and I are talking about I think. You're trying to just get enough information to figure out if e.g. a live ISO had any Ignition config provided so we can definitely skip network setup, right?

Yeah, that can work and it's a shorter path to getting a fully offline live ISO boot, though it'd be a bigger win to solve the more general conditional networking issue. Anyway, I'm thinking #948 (comment) is not trivial, but wouldn't be too hard either, so I can try that out and see how far I get in a timebox.

@cgwalters
Member Author

> Anyway, I'm thinking #948 (comment) is not trivial, but wouldn't be too hard either, so I can try that out and see how far I get in a timebox.

OK sure. In the meantime I actually context switched to see how hard it would be to remove the ip=dhcp from the kargs and run NM via a systemd unit in the initramfs. I'll push up a WIP starting point for that soon.

(Because, ultimately we really require that to have this code do anything useful)

cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Mar 26, 2020
See https://github.com/coreos/ignition-dracut/issues/94
and coreos/ignition#948

Needs pairing with a cosa PR to drop the default `ip=dhcp` kargs.

And yes we really want to upstream this into NM by default or so.
@jlebon
Member

jlebon commented Mar 30, 2020

Alternative approach in #956.

jlebon pushed a commit to cgwalters/fedora-coreos-config that referenced this pull request Mar 30, 2020
See https://github.com/coreos/ignition-dracut/issues/94
and coreos/ignition#948

Needs pairing with a cosa PR to drop the default `ip=dhcp` kargs.

And yes we really want to upstream this into NM by default or so.

Co-Authored-By: Dusty Mabe <dusty@dustymabe.com>
@jlebon
Member

jlebon commented Apr 2, 2020

Thoughts on closing this in favour of #956?

@cgwalters
Member Author

> Thoughts on closing this in favour of #956?

Agreed!

@cgwalters cgwalters closed this Jul 29, 2020