Manual invocation of cloud-init with --file option should not be deprecated #5726

Closed
sshedi opened this issue Sep 23, 2024 · 8 comments
Labels
bug Something isn't working correctly

Comments

@sshedi
Contributor

sshedi commented Sep 23, 2024

Bug report

2024-09-11 07:39:12,452 - lifecycle.py[DEPRECATED]: Unsupported configuration: boot stage called by PID [791] outside of systemd is deprecated in 24.3 and scheduled to be removed in 29.3. Triggering cloud-init boot stages outside of intial system boot is not a fully supported operation which can lead to incomplete or incorrect configuration. As such, cloud-init is deprecating this feature in the future. If you currently use cloud-init in this way, please file an issue describing in detail your use case so that cloud-init can better support your needs: https://github.com/canonical/cloud-init/issues/new

In VMware appliances we use cloud-init --file <yaml> init for doing guest os customisations.

Creating this separate bug after discussing this briefly at #5684 (comment)

We do GOSC (Guest OS Customisation) through open-vm-tools. A .cab file carries the VM customisation settings; we convert them into a YAML file and apply it using cloud-init init --file.

The .cab file contains other files alongside the OS customisation settings. The settings are originally in INI format; we parse them, convert them to YAML, and feed the result to cloud-init.

You can extract the tarball from this link https://packages.vmware.com/photon/photon_sources/1.0/gosc-scripts-1.3.2.tar.gz
and check the RunCloudConfig and GenerateCloudInitConfig functions.
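
For readers without the tarball at hand, the INI-to-YAML flow described above could be sketched roughly as follows. This is a minimal illustration, not code from gosc-scripts: the INI section and key names (NETWORK/HOSTNAME, DATETIME/TIMEZONE) and both function names are hypothetical, and the output is JSON, which is valid YAML, to avoid a PyYAML dependency.

```python
import configparser
import json
from typing import List


def ini_to_cloud_config(ini_text: str) -> str:
    """Convert INI-style customisation settings into cloud-config text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    config = {}
    # Section and key names are illustrative, not the real GOSC schema.
    if parser.has_option("NETWORK", "HOSTNAME"):
        config["hostname"] = parser.get("NETWORK", "HOSTNAME")
    if parser.has_option("DATETIME", "TIMEZONE"):
        config["timezone"] = parser.get("DATETIME", "TIMEZONE")

    # JSON is a subset of YAML, so the result can be used as a YAML file.
    return "#cloud-config\n" + json.dumps(config, indent=2, sort_keys=True) + "\n"


def build_command(yaml_path: str) -> List[str]:
    """Build the manual invocation used in this report (argument list only)."""
    return ["cloud-init", "--file", yaml_path, "init"]
```

The command list would then be handed to something like subprocess.run after writing the generated text to yaml_path.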

Steps to reproduce the problem

Invoke cloud-init with --file and init options with a simple yaml config manually.

Environment details

  • Cloud-init version: 24.3
  • Operating System Distribution: PhotonOS
  • Cloud provider, platform or installer type:

cloud-init logs

NA

@sshedi sshedi added bug Something isn't working correctly new An issue that still needs triage labels Sep 23, 2024
@a-dubs a-dubs removed the new An issue that still needs triage label Sep 23, 2024
@blackboxsw
Collaborator

blackboxsw commented Sep 24, 2024

@sshedi thanks for continued support of cloud-init here and filing this issue to give us a platform to talk about desired/supportable options here in upstream cloud-init separate from the #5684 discussion.

For upstream cloud-init, we are looking to steer downstream images/distros/customization away from relying directly on calling individual boot stages in cloud-init on the command line, as each of those boot stages is mapped only to a subset of cloud-init config modules (as defined in /etc/cloud/cloud.cfg's cloud_init_modules, cloud_config_modules, and cloud_final_modules).

Direct command line calls to these boot stages expose downstream image configuration errors if:

  • any of the boot stages were unintentionally omitted resulting only in partial cloud-config being applied or
  • boot stages are called in the wrong order resulting in config ordering issues in an undesired final applied config
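
To make those two failure modes concrete, here is a small sketch of a check over a list of manual stage invocations. The helper name and the string form of each call are hypothetical; the four stage invocations listed are the ones commonly run by hand, and nothing like this check exists in cloud-init itself as far as this thread indicates.

```python
from typing import List

# The four boot stages, in the order they normally run.
EXPECTED_STAGES = [
    "init --local",
    "init",
    "modules --mode=config",
    "modules --mode=final",
]


def check_stage_calls(calls: List[str]) -> List[str]:
    """Report omitted or out-of-order stages in a sequence of manual calls."""
    problems = []
    for stage in EXPECTED_STAGES:
        if stage not in calls:
            problems.append(f"omitted stage: {stage}")

    # Compare the position of each recognised call against the expected order.
    ranks = [EXPECTED_STAGES.index(c) for c in calls if c in EXPECTED_STAGES]
    if ranks != sorted(ranks):
        problems.append("stages called out of order")
    return problems
```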

Given that upstream cloud-init may change or reduce some of these boot stages to gain performance improvements, the mechanism to invoke individual boot stages may shift, get renamed or get dropped in the future.

To decouple a downstream from any cloud-init boot stage changes, we would like to support a single command line call if there is a need to apply supplemental user-data after initial boot. It could be something like:

  • cloud-init --all-stages --file ud.yaml or
  • cloud-init apply ud.yaml

That command should:

  • ensure all valid cloud-init config modules are run in order defined by /etc/cloud/cloud.cfg
  • only apply the supplemental configuration in ud.yaml and disregard any available datasource-provided user-data, vendor-data, or base configuration in /etc/cloud/cloud.cfg.d/*.cfg
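
As a rough sketch of those two requirements (the proposed command does not exist yet, so everything here is hypothetical, including the function names and the shape of the returned plan): take the module order from the three lists in /etc/cloud/cloud.cfg, but take configuration values only from ud.yaml.

```python
from typing import Any, Dict, List

# The three module lists named in /etc/cloud/cloud.cfg, in stage order.
STAGE_KEYS = ["cloud_init_modules", "cloud_config_modules", "cloud_final_modules"]


def modules_in_order(base_cfg: Dict[str, Any]) -> List[str]:
    """Flatten the per-stage module lists into one ordered list of names."""
    ordered = []
    for key in STAGE_KEYS:
        for entry in base_cfg.get(key, []):
            # An entry is assumed to be a name or a list starting with the name.
            name = entry if isinstance(entry, str) else entry[0]
            ordered.append(name)
    return ordered


def plan_apply(base_cfg: Dict[str, Any], supplemental: Dict[str, Any]) -> Dict[str, Any]:
    """Order comes from the base config; values come only from ud.yaml."""
    return {"modules": modules_in_order(base_cfg), "config": dict(supplemental)}
```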

We'll host a bit of a conversation here in this issue to collaborate on a solution that meets your needs for supplemental configuration changes without the need to reboot a machine. The goal is a simplified UX for applying additional config changes that ensures all config modules are run in the right order for supplemental user-data. By having a single command to call, instead of separate boot stages, cloud-init could ensure complete config is applied as well as preserve some sort of audit trail of these additional configs applied to the system in the event that bug triage is needed on a failed config event.

@sshedi
Contributor Author

sshedi commented Sep 30, 2024

Thanks @blackboxsw for the response.
First and foremost, performance optimisations should not have been introduced in a minor or bug fix release, as this constitutes a major change. This is just my perspective, though the maintainers may have a different rationale that I’m unaware of. In my opinion, this should have been implemented as a build-time flag, allowing users to choose between the legacy approach and the new optimised method. It’s still possible to make this adjustment, giving users the flexibility to select what best suits their needs.

ensure all valid cloud-init config modules are run in order defined by /etc/cloud/cloud.cfg

Sounds good at first but this again makes it a closed eco system. I really like the current way of triggering stages one by one. It is really flexible and simple. I can run a particular stage in cloud init and do something else entirely to bring the system to a certain state and then run the next stage of cloud init if needed.
Is there a problem if we throw an error if a user tries to skip a stage and tries to initiate the next stage? Like, running the config stage without running the init stage should be a hard failure. This way, users will be more aware of the flow and use the tool properly.

only apply the supplement configuration in ud.yaml and disregarding any available datasource provided user-data, vendor-data base or base configuration in /etc/cloud/cloud.cfg.d/*.cfg,

Isn't this a deviation from current implementation? Suppose if someone has a use case of dropping some cfg files under /etc/cloud/cloud.cfg.d/ and trigger init stage of cloud init, it gets enforced.

The focus on performance optimisations and progress is appreciated, but these particular changes seem overly invasive and completely break backward compatibility, with no available workarounds. One reason I favour Ubuntu and Debian products is their commitment to stability, and the developer's careful attention to ensuring backward compatibility.

Once again, thank you so much for considering my request.

@holmanb
Member

holmanb commented Sep 30, 2024

Thanks @blackboxsw for the response. First and foremost, performance optimisations should not have been introduced in a minor or bug fix release, as this constitutes a major change.

Currently cloud-init publishes four releases per year plus bug fixes and security updates. There is currently no concept of a minor release, so this would require a change to the upstream development model.

This is just my perspective, though the maintainers may have a different rationale that I’m unaware of. In my opinion, this should have been implemented as a build-time flag, allowing users to choose between the legacy approach and the new optimised method. It’s still possible to make this adjustment, giving users the flexibility to select what best suits their needs.

Please note that the deprecation of --file happened back in April and was not caused by the single process change, which was merged in August. I would highly recommend frequently testing features that you care about against the upstream development branch. It is much more expensive to rework changes post-release.

Cloud-init hasn't historically provided build-time flags for features, although some are provided for distro-specific behavior in setup.py*. For feature disablement, the project has historically provided patches, and occasionally feature flags to simplify patch maintenance. The single process change followed this process, and we made the exceptional decision to broadcast this change in all cloud-init communication channels so that those that package cloud-init could be made aware and respond.

  • aside: note that these build flags might be broken under the upcoming changes to support PEP517

@holmanb
Member

holmanb commented Sep 30, 2024

Sounds good at first but this again makes it a closed eco system.

I think that this actually makes it a more open ecosystem - it frees the user from unexpected configurations provided by the distro or service provider. This would make cloud-init apply have behavior that is reproducible. The same cannot be said for the --file commands that exist today.

I really like the current way of triggering stages one by one. It is really flexible and simple. I can run a particular stage in cloud init and do something else entirely to bring the system to a certain state and then run the next stage of cloud init if needed.

This would still be possible under a cloud-init apply ud.yaml invocation. Just put your stage-specific cloud-config in stage-specific configuration files.

The "four stages" concept makes significantly less sense when run manually outside of the init system. Let's say that there is a user who is unfamiliar with cloud-init's first boot semantics and just wants to configure their system with cloud-init as their configuration tool. In that case, "just configure my system" requires four commands which are not easily discoverable, are not intuitive, and make little sense outside of historical context. These are not qualities which would make a new user want to choose cloud-init.

If testing cloud-init manually with these commands is your only concern, it would be trivial to invoke the --all-stages flag and manually trigger these stages using the socket protocol which is implemented in the service files when you want each stage to run. Is there some user story that this request is based on?

Is there a problem if we throw an error if a user tries to skip a stage and tries to initiate the next stage? Like, running the config stage without running the init stage should be a hard failure. This way, users will be more aware of the flow and use the tool properly.

If we were to choose that, then we might get some other user who wants to run only the final stage before the config stage, and then we will have broken another use case. Or a user who wants to run only the config stage multiple times in a row without any of the other stages. If we ditch the limitations imposed by stage ordering and just apply the configurations that were passed to cloud-init apply, wouldn't that be even more simple and intuitive and flexible and powerful and less error prone?

@holmanb
Member

holmanb commented Sep 30, 2024

only apply the supplement configuration in ud.yaml and disregarding any available datasource provided user-data, vendor-data base or base configuration in /etc/cloud/cloud.cfg.d/*.cfg,

Isn't this a deviation from current implementation? Suppose if someone has a use case of dropping some cfg files under /etc/cloud/cloud.cfg.d/ and trigger init stage of cloud init, it gets enforced.

Deviation? Yes. We could probably provide a flag which allows users to enable this for a backwards compatible feature (prior art: cloud-init schema --system), but I am not convinced that enabling it would be the best default behavior. A user that reaches for cloud-init for configuring their instance is unlikely to even know that their vendor provided a configuration - so for reproducibility across systems and observance of the principle of least surprise I think that it would be best to avoid vendor-induced side-effects for the default behaviors of a configuration management tool.

these particular changes seem overly invasive and completely break backward compatibility, with no available workarounds

Could you please clarify this statement? What parts of it were overly invasive? What parts break backwards compatibility without workarounds?

A patch was provided for backwards compatibility. There shouldn't be a user-visible change in behavior as a result of the optimization besides the documented systemd service rename[1].

Any downstream that provides their own service files (most stable distros) will have no backwards compatibility issues since the previously-used commands remain intact. Any downstream that does not provide their own service files is expected to read the release notes and breaking changes documentation prior to shipping a release - after which they can choose to receive the change or may choose to apply the patch as a workaround.

[1] I wouldn't expect the rename to affect many services, feel free to prove otherwise.

blackboxsw added a commit to blackboxsw/cloud-init that referenced this issue Oct 2, 2024
Avoid raising deprecations for command line calls to cloud-init boot
stages when the optional --file param is also provided.

When cloud-init's boot stages are invoked with a --file argument
this is typically an attempt to add supplemental configuration to a
system after initial boot has completed.

Although direct calls to cloud-init boot stages could lead to
misconfiguration if boot stages are called out of sequence
or customized /etc/cloud/cloud.cfg altering which config modules are
run during the boot stage, the support of such environments is out of
scope for this feature. It remains a simple tool to apply additional
configuration via user-data onto an existing system after initial boot.

At some point in the future, a new subcommand may be provided by
cloud-init tooling to allow for supplemental configuration of user-data
post-boot, in that event this --file feature will become deprecated in
favor of a cleaner/simpler alternative.

Fixes canonicalGH-5726
@blackboxsw
Collaborator

blackboxsw commented Oct 2, 2024

Thanks @holmanb and @sshedi, this is a good discussion. For the moment, we don't have a good alternative to passing --file to boot stages. In the meantime, we agree to treat this as a bug, and we will backport the fix to Oracular to avoid disrupting existing users of this subcommand. --file will remain allowed for as long as we don't have a better alternative for the use case @sshedi has described. If we do have an alternative in the future, we will provide a migration path: only a new quarterly upstream release will introduce this breaking change, and it will be documented for any stable downstreams which occasionally sync the latest upstream cloud-init releases.

Is there a problem if we throw an error if a user tries to skip a stage and tries to initiate the next stage? Like, running the config stage without running the init stage should be a hard failure. This way, users will be more aware of the flow and use the tool properly.

The problem is that we would be continuing to expose and reinforce "internals" of how cloud-init runs (4 separate ordered boot stages). The commands would then have to enforce boot stage ordering as well as assert that all boot stages get called to ensure full application of any contained user-data. It would increase the investment in a tool and entry point that we have historically discouraged individuals from using because it is prone to errors, omissions or partial configuration.

But we don't have a better solution at the moment (and admittedly it will take a while to get there!), so let's preserve this --file option. In the event that we spec out a new feature in cloud-init to better provide a supplemental apply operation, we will link that spec/enhancement request to this issue and share that feature for input.

In the meantime, we've put up #5770, which should allow direct calls of cloud-init boot stages on the command line, with or without the --file parameter, to avoid disruption from this specific deprecation message.

We still don't want broad general use of the boot stages, as their use implies a lot of knowledge of the individual boot stages. Users need to be aware of whether those boot stages are applicable to their environment and to the installed version of cloud-init, and of whether customization of /etc/cloud/cloud.cfg will lead to incorrect or incomplete configuration based on their decision to "only run some stages if needed".

We do know that newer versions of cloud-init will be moving toward fewer systemd units (and possibly fewer boot stages) to reduce complexity in early boot where possible. The sooner we break the assumption that we can individually trigger some stages of cloud-init's setup, the sooner we can find more convenient utilities and knobs for cloud-init that can provide the actual features needed to apply post-boot configuration to a running system.

blackboxsw added a commit to blackboxsw/cloud-init that referenced this issue Oct 2, 2024
Avoid raising deprecations for command line calls to cloud-init boot.

When cloud-init's boot stages are invoked directly on the command line
this is typically an attempt to add supplemental configuration to a
system after initial boot has completed.

Although direct calls to cloud-init boot stages could lead to
misconfiguration if boot stages are called out of sequence
or customized /etc/cloud/cloud.cfg altering which config modules are
run during the boot stage, the support of such uses or custom
environments is out of scope for this feature. It remains a simple
tool to apply additional configuration via user-data onto an existing
system after initial boot.

At some point in the future, a new subcommand may be provided by
cloud-init tooling to allow for supplemental configuration of user-data
post-boot, in that event this --file feature will become deprecated in
favor of a cleaner/simpler alternative.

Fixes canonicalGH-5726
blackboxsw added a commit to blackboxsw/cloud-init that referenced this issue Oct 2, 2024
Avoid deprecating command line calls to cloud-init boot.

When cloud-init's boot stages are invoked directly on the command line
this is typically an attempt to add supplemental configuration to a
system after initial boot has completed.

Although direct calls to cloud-init boot stages could lead to
misconfiguration if boot stages are called out of sequence
or customized /etc/cloud/cloud.cfg altering which config modules are
run during the boot stage, the support of such uses or custom
environments is out of scope for this feature. It remains a simple
tool to apply additional configuration via user-data onto an existing
system after initial boot.

At some point in the future, a new subcommand may be provided by
cloud-init tooling to allow for supplemental configuration of user-data
post-boot, in that event this --file feature will become deprecated in
favor of a cleaner/simpler alternative.

Fixes canonicalGH-5726
@sshedi
Contributor Author

sshedi commented Oct 4, 2024

All good points, thanks for the explanation. By invasive change, I was referring to the following changes:

  • Renaming cloud-init service to cloud-init-network and changing the backend of how service gets started.

  • With the latest release, we need to have BSD netcat in production servers. AFAIK, netcat is a debug and diagnostics tool; cloud-init is using Unix domain sockets currently but should we be worried about it? A regular user can start a netcat server with a random port number with access to netcat tool. In PhotonOS, BSD netcat package was not added till now (we are a minimal distro and intend to keep it that way), to add BSD netcat, we added three other packages which are required by BSD netcat. This can lead us into further questioning during security audits.

Edit:
Also, if any of these dependent packages of netcat break for some reason, cloud-init will stop working; this kind of tight coupling might cause a deadlock in future. Example: due to an embargo CVE fix or for a zero day vulnerability fix, if we upgrade the version of any of these packages and if the newer version results in some incompatibility, we will end up in a difficult situation. IMO, we should be extra careful while making critical services like cloud-init depend on other components. Feel free to correct me if I'm wrong here.

@holmanb
Member

holmanb commented Oct 4, 2024

@sshedi This PR is closed, but I'll give a response to your latest comments.

With the latest release, we need to have BSD netcat in production servers.
...
In PhotonOS, BSD netcat package was not added till now (we are a minimal distro and intend to keep it that way), to add BSD netcat, we added three other packages which are required by BSD netcat. This can lead us into further questioning during security audits.

BSD netcat is not a hard requirement for this to work, it just happens to be a more performant option that is available most places. Like we mentioned in the announcement and elsewhere, a pure-python replacement is possible if needed. I can share a Python one-liner to replace the netcat requirement if you want to avoid the dependency.
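
A pure-Python stand-in for the netcat call might look something like the sketch below. It is an illustration under assumptions, not the replacement referred to above and not a description of cloud-init's actual protocol: it assumes a Unix datagram socket, a single reply per message, and a b"start" payload, and the function name and both socket paths are placeholders supplied by the caller.

```python
import os
import socket


def notify_over_unix_socket(server_path: str, return_path: str,
                            message: bytes = b"start",
                            timeout: float = 5.0) -> bytes:
    """Send one datagram to server_path and wait for one reply."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        client.settimeout(timeout)
        # Binding gives the server an address to send its reply to.
        client.bind(return_path)
        client.sendto(message, server_path)
        reply, _ = client.recvfrom(1024)
        return reply
    finally:
        client.close()
        # Remove the socket file created by bind().
        if os.path.exists(return_path):
            os.unlink(return_path)
```

Since this uses only the standard library socket module, it would add no dependency beyond the Python interpreter that cloud-init already requires.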

AFAIK, netcat is a debug and diagnostics tool; cloud-init is using Unix domain sockets currently but should we be worried about it?

You should be aware of, and consider the security implications of this change. For example, is /run/cloud/ writable by unprivileged users? If yes, then you already have a security issue to address in PhotonOS.
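
One quick way to answer that question on a given system is to inspect the directory's permission bits. The helper name below is hypothetical, and the check only covers the "other" write bit; it ignores ACLs, group membership, and parent directory permissions.

```python
import os
import stat


def is_world_writable(path: str) -> bool:
    """Return True if users outside the owner and group may write to path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)
```

Running it against the socket directory mentioned above would show whether an unprivileged user could create or replace files there.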

A regular user can start a netcat server with a random port number with access to netcat tool.

The same is true of any programming language that can open a socket (such as Python). Creating a socket is typically not considered a privileged operation. That said, I don't see how user socket creation is relevant?

Also, if any of these dependent packages of netcat break for some reason, cloud-init will stop working; this kind of tight coupling might cause a deadlock in future. Example: due to an embargo CVE fix or for a zero day vulnerability fix, if we upgrade the version of any of these packages and if the newer version results in some incompatibility, we will end up in a difficult situation. IMO, we should be extra careful while making critical services like cloud-init depend on other components. Feel free to correct me if I'm wrong here.

Yes, this argument always applies to dependency additions. We typically avoid new external dependencies, and we carefully consider the risks when adding new ones. If you examine the single process PR regarding the use of netcat, you will see that this path was considered from various perspectives before being accepted as the preferred path forward.
