@jiria jiria commented Feb 6, 2025

This PR aggregates all the work that Mitch and I did to unblock the CI scenario using tardev-snapshotter. It should be merged with care so that it does not break the Kata/CC scenario.

Merge Checklist
Summary
Test Methodology

sprt and others added 30 commits April 17, 2024 20:40
- Add --version flag to the genpolicy tool that prints the current version
- Add version.rs.in template to store the version information
- Update makefile to autogenerate version.rs from version.rs.in
- Add license to Cargo.toml

Signed-off-by: Saul Paredes <[email protected]>
The Linux kernel panics when the init process exits.
The kernel is booted with panic=1, so this leads to a
VM reboot.
When used as a service, the kata-agent service has an ExecStop
option that does a full sync and shuts down the VM.
This patch mimics this behavior when kata-agent is used as
the init process.

Fixes: kata-containers#9429

Signed-off-by: Alexandru Matei <[email protected]>
genpolicy: add support for cc-local-csi
agent: shutdown vm on exit when agent is used as init process
Add cache improvements that were missing in containerd pull

Signed-off-by: Saul Paredes <[email protected]>
genpolicy: add missing cache improvements
This patch adds support for the cc-azurefile-csi driver to genpolicy.

Signed-off-by: Archana Choudhary <[email protected]>
This patch updates policy samples, required after adding support for
cc-azurefile-csi driver in genpolicy.

Signed-off-by: Archana Choudhary <[email protected]>
genpolicy: add support for cc-azurefile-csi driver
This reverts commit 627be9b, which was
insufficient. Waiting for blk devices used just the PCI device/slot
index, but not the PCI segment/domain index.

Signed-off-by: Dan Mihai <[email protected]>
Initialize the CLH Platform a single time.

Signed-off-by: Dan Mihai <[email protected]>
Hotplug block devices on PCI segments >= 1. PCI segment 0 is used for
the network interface, any disks present at Guest boot time, etc.

Just bus 0 of each segment is used, and up to 31 devices can be
hotplugged to each bus.

Signed-off-by: Dan Mihai <[email protected]>
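The layout described in the commit message above (segment 0 reserved, bus 0 only, up to 31 hotplugged devices per bus) can be illustrated with a small sketch. This is not the runtime's actual code: the function name, the `num_segments` parameter, and the strictly sequential fill order are assumptions made for illustration, and the returned position is a 0-based index on the bus rather than a concrete PCI device number.

```python
# Hypothetical sketch of the hotplug layout; names and fill order are assumptions.
FIRST_HOTPLUG_SEGMENT = 1  # segment 0 holds the network interface and boot-time disks
DEVICES_PER_BUS = 31       # hotplug capacity of bus 0 on each segment


def hotplug_location(index: int, num_segments: int) -> tuple[int, int, int]:
    """Map the index-th hotplugged block device to (segment, bus, position).

    `position` is a 0-based index among the hotplug positions of the bus,
    not a PCI device number.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    segment = FIRST_HOTPLUG_SEGMENT + index // DEVICES_PER_BUS
    if segment >= num_segments:
        raise ValueError("no free hotplug position left")
    # Only bus 0 of each segment is used.
    return segment, 0, index % DEVICES_PER_BUS
```

Under these assumptions, a Guest with four PCI segments would have room for 93 hotplugged block devices, and a device with index 41 would land on the second hotplug segment.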
This pod starts successfully when using default AKS-CC settings,
and a permissive policy.

When the Kata debug options are enabled, this pod fails to start while
trying to hotplug image layer index 41. This bug is being investigated.

The genpolicy tool should also try to create a smaller policy for
this pod, because otherwise "kubectl apply" rejects the policy
annotation as being too large.

Signed-off-by: Dan Mihai <[email protected]>
Lock anyhow version to 1.0.58 because:

- Versions 1.0.59 through 1.0.76 have not been tested yet using
  Kata CI. However, those versions pass "make test" for the
  Kata Agent.

- Versions 1.0.77 or newer fail during "make test" - see
  kata-containers#9538.

Signed-off-by: Dan Mihai <[email protected]>
Implement Agent Policy using the regorus crate instead of the OPA
daemon.

The OPA daemon will be removed from the Guest rootfs in a future PR.

Fixes: kata-containers#9388

Signed-off-by: Dan Mihai <[email protected]>
Bump release version to 3.2.0-azl1.genpolicy0

Signed-off-by: Saul Paredes <[email protected]>
Move pod-many-layers.yaml to needs_containerd_pull category

Signed-off-by: Saul Paredes <[email protected]>
runtime: agent: use PCI segments 1+ for blk devices
Since the OPA binary was replaced by the regorus crate, we can finally stop
building and shipping the binary.

Signed-off-by: Fabiano Fidêncio <[email protected]>
The PID needs to be initialized before calling isClhRunning.
waitVMM() uses isClhRunning and is called by launchClh() just
before it returns.

Fixes: kata-containers#9230

Signed-off-by: Alexandru Matei <[email protected]>
isClhRunning uses signal 0 to test whether the process is
still alive. This doesn't work because the process is a
direct child of the shim: once it is dead, the process becomes
a zombie.
Since no one waits for it, the process lingers until
its parent dies and init reaps it. Hence sending signal 0 in
isClhRunning always returns success, whether the process is
dead or not.
This patch calls wait to reap the process. If that succeeds, the
process is our child; if not, we send the signal.

Fixes: kata-containers#9431

Signed-off-by: Alexandru Matei <[email protected]>
Signed-off-by: Saul Paredes <[email protected]>
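The zombie problem described in the commit message above can be reproduced outside the shim. The sketch below is a Python illustration of the idea, not the Go code from the patch: all function names are hypothetical, and it assumes a Linux host where `os.fork` and `os.waitid` are available.

```python
import os


def signal_zero_says_alive(pid: int) -> bool:
    """Probe a process with signal 0, which checks existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists, but we may not signal it
    return True


def is_running(pid: int) -> bool:
    """Try to reap the process first; fall back to signal 0 if it is not our child."""
    try:
        reaped, _status = os.waitpid(pid, os.WNOHANG)
    except ChildProcessError:
        # Not our child (or already reaped), so signal 0 is the only probe left.
        return signal_zero_says_alive(pid)
    # waitpid returns the pid once the child has exited, or 0 while it still runs.
    return reaped != pid


def demo() -> tuple[bool, bool]:
    """Return (naive_result, fixed_result) for a child that has already exited."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # the child exits immediately
    # Block until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    naive = signal_zero_says_alive(pid)  # the zombie still answers signal 0
    fixed = is_running(pid)              # reaps the zombie and reports it as gone
    return naive, fixed
```

On Linux, `demo()` is expected to return `(True, False)`: the naive probe reports the dead child as alive, while the reap-first check does not.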
clh: isClhRunning waits for full timeout when clh exits
rootfs: Stop building and shipping OPA
We've discussed this over and over. Let's try to get to an agreement here.
I will use this issue to remove the mandatory Issue - PR dependency.

Fixes: kata-containers#9500

Signed-off-by: Zvonko Kaiser <[email protected]>
ci: cherry-pick relaxed commit check from upstream
danmihai1 and others added 12 commits January 29, 2025 02:26
Deleted files:
- systemd-sysupdate.service
- systemd-sysupdate.timer
- systemd-sysupdate-reboot.timer
- systemd-sysupdate-reboot.service
- systemd-sysupdate

Guest VMs are not being updated this way.

Signed-off-by: Dan Mihai <[email protected]>
Delete:
- systemd-network-generator.service
- systemd-network-generator

Generating network configuration based on kernel command line arguments
is currently not supported for Guest VMs.

Signed-off-by: Dan Mihai <[email protected]>
Deleted files:
- systemd-tpm2-setup.service
- systemd-tpm2-setup-early.service
- systemd-tpm2-setup

TPM is not used on Guest VMs.

Signed-off-by: Dan Mihai <[email protected]>
Delete *.mount files in addition to other systemd files.

Signed-off-by: Dan Mihai <[email protected]>
There are no userspace filesystems on the Guest VMs, so no need to mount
the FUSE control filesystem.

Note that the FUSE protocol used by virtio-fs uses a server on the
Host, and doesn't depend on the FUSE control filesystem on the Guest.

Without sys-fs-fuse-connections.mount, [email protected] gets
disabled too.

Signed-off-by: Dan Mihai <[email protected]>
This mount was not functional due to missing CONFIG_CONFIGFS_FS in the
Guest kernel.

Deleting sys-kernel-config.mount also disables [email protected].

Signed-off-by: Dan Mihai <[email protected]>
Delete:
- systemd-update-utmp-runlevel.service
- systemd-update-utmp
in addition to systemd-update-utmp.service that was already deleted
before this commit.

systemd-update-utmp-runlevel.service depends on /var/log/wtmp. However,
systemd-tmpfiles-setup.service was already deleted, so /var/log/wtmp
was not present.

Signed-off-by: Dan Mihai <[email protected]>
Deleted files:
- systemd-vconsole-setup.service
- systemd-vconsole-setup

The Guest VM doesn't use virtual consoles.

Signed-off-by: Dan Mihai <[email protected]>
Implement HypervisorLoglevel config option for clh.

Signed-off-by: Cameron Baird <[email protected]>
@jiria jiria requested review from a team as code owners February 6, 2025 00:20
[Unit]
Description=tardev containerd snapshotter daemon
After=network.target
After=local-fs.target

@miz060 - can we pull this change into your change where you re-package tardev-snapshotter in the build scripts?

This one at least looks like it can go in independently of this PR.
