Following the documentation for minikube deployment doesn't work #1941

Closed
ChaosInTheCRD opened this issue Mar 15, 2022 · 23 comments · Fixed by falcosecurity/falco-website#710

@ChaosInTheCRD

ChaosInTheCRD commented Mar 15, 2022

Describe the bug

Following the documentation for creating a "Falco Learning Environment" unfortunately does not work. After deploying Falco with helm, the pod logs show this error:

Tue Mar 15 15:25:42 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.

Workaround
I went on to try a handful of other virtual machine drivers for minikube to no avail. I resorted to the Kubernetes slack where I got help from @terylt.

As it turns out, there is a script that runs at startup to try to install the kernel module / eBPF probe needed to get Falco running in the relevant environment. The bash script does some guesswork to determine which operating system it is running on, then tries to decide the correct approach for installing the kernel module / eBPF probe. This does not currently work correctly for minikube.

To get the script to pass through the logic linked here (and hence correctly determine that it is in a minikube VM), the daemonset must be modified as follows, either after a helm template or with a kubectl edit after deployment:

      containers:
        - name: falco
...
          volumeMounts:
...
            - mountPath: /host/etc/VERSION
              name: etc-fs
              readOnly: true
...
      volumes:
...
        - name: etc-fs
          hostPath:
            path: /etc/VERSION

...
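As an alternative to editing the manifest by hand, the same change could be applied with a JSON patch. This is only a sketch: the function names are hypothetical, the DaemonSet name and namespace are passed in by the caller, and it assumes the falco container is the first container in the pod spec and that the volumes and volumeMounts lists already exist.

```shell
#!/usr/bin/env bash
# Sketch only: container index 0 and the existence of the volumes and
# volumeMounts lists are assumptions about the deployed DaemonSet.

# Print a JSON patch (RFC 6902) that appends the etc-fs volume and its mount.
build_version_patch() {
  cat <<'EOF'
[
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "etc-fs", "hostPath": {"path": "/etc/VERSION"}}},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
   "value": {"name": "etc-fs", "mountPath": "/host/etc/VERSION", "readOnly": true}}
]
EOF
}

# Apply the patch to the DaemonSet named in $1, in namespace $2 (default: "default").
patch_falco_daemonset() {
  local ds_name="$1" namespace="${2:-default}"
  kubectl -n "$namespace" patch daemonset "$ds_name" --type=json -p "$(build_version_patch)"
}
```

Calling `patch_falco_daemonset <daemonset-name>` would then roll the pods with the extra mount; check the name with `kubectl get daemonset` first.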

eBPF must also be enabled on the daemonset; otherwise Falco continues to fail. This can be done either by setting the env var on the falco pod in the manifest:

          env:
          - name: FALCO_BPF_PROBE

or by enabling eBPF in the helm values file:

ebpf:
  # Enable eBPF support for Falco
  enabled: true

Once these two steps have been taken, the pod should reach the READY state:

NAME                READY   STATUS    RESTARTS   AGE
falco-falco-blfrb   1/1     Running   0          24m

How to reproduce it
Follow the documentation for creating a learning environment with minikube

Expected behaviour

I feel others' thoughts might be mixed on this... but to me it doesn't seem unreasonable to ask the user to specify the environment they are deploying to (e.g. GKE, minikube, kind etc.) in the form of an env var or a command-line argument. That way, there is no need for a script to exist or be maintained when, inevitably, situations arise that break the mechanisms by which the script tries to work out the environment it is in.

Environment

  • Falco version:
    0.31.1
  • Cloud provider or hardware configuration:
  • OS:
    Minikube (v1.25.0 - commit: 3edf4801f38f3916c9ff96af4284df905a347c86)
  • Installation method:
    Helm on Kubernetes
@terylt
Contributor

terylt commented Mar 15, 2022

Thanks for filing the issue Tom!

Just to add to Tom's issue. I think the bug is in the falco-driver-loader script in the if statement here:

if [ -f "${HOST_ROOT}/etc/os-release" ]; then

Falco uses the /etc/VERSION file to detect that it is running in minikube. Unfortunately, it never gets to that check in the if statement above because, I think, minikube instances also have the os-release file, so it leaves that if statement without detecting that it's running in minikube. The old sysdig-probe-loader script dealt with this by breaking the /etc/VERSION check out into a separate if statement, which did work, but I'm not sure of the other implications of that.
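To illustrate the ordering problem, here is a minimal sketch of a detection function that checks /etc/VERSION in its own if statement before looking at os-release. It is not the actual falco-driver-loader code: the function name `detect_target` and its return strings are hypothetical, and only the two file paths come from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real falco-driver-loader logic.
# Prints "minikube" when <host_root>/etc/VERSION exists, even if
# <host_root>/etc/os-release exists as well.
detect_target() {
  local host_root="${1:-${HOST_ROOT:-}}"

  # Check the minikube marker first, in a separate if statement,
  # so the os-release branch cannot shadow it.
  if [ -f "${host_root}/etc/VERSION" ]; then
    echo "minikube"
    return 0
  fi

  if [ -f "${host_root}/etc/os-release" ]; then
    # Source os-release in a subshell so its variables do not leak out.
    ( . "${host_root}/etc/os-release" && echo "${ID:-unknown}" )
    return 0
  fi

  echo "unknown"
}
```

With both files present, `detect_target "$HOST_ROOT"` prints `minikube`; with only os-release present it prints that file's `ID` value.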

Also, I think minikube requires ebpf to be enabled, but it doesn't look like that is documented in the docs.

@leogr
Member

leogr commented Mar 15, 2022

Minikube ships its own Falco pre-built driver (see kubernetes/minikube#6560) since it is impossible to build the driver on the fly for Minikube because it doesn't provide a compiler or kernel-headers.

Unfortunately, the last update in Minikube was two years ago 👇 https://github.com/kubernetes/minikube/tree/master/deploy/iso/minikube-iso/package/falco-module

The driver version they ship should work up to Falco 0.30.x, but not with 0.31.1.
We should open a PR in minikube to fix it.

@juju4
Contributor

juju4 commented Mar 16, 2022

This does not affect only minikube. I have it on Ubuntu 18.04

# cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
# uname -a
Linux HOST 4.15.0-167-generic #175-Ubuntu SMP Wed Jan 5 01:56:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# journalctl -xe -u falco
Mar 16 09:54:19 HOST systemd[1]: Started Falco: Container Native Runtime Security.
-- Subject: Unit falco.service has finished start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit falco.service has finished starting up.
--
-- The start-up result is RESULT.
Mar 16 09:54:19 HOST falco[37102]: Falco version 0.31.1 (driver version b7eb0dd65226a8dc254d228c8d950d07bf3521d2)
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Falco version 0.31.1 (driver version b7eb0dd65226a8dc254d228c8d950d07bf3521d2)
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 09:54:19 HOST falco[37102]: Wed Mar 16 09:54:19 2022: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 09:54:19 HOST falco[37102]: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 09:54:19 HOST falco[37102]: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 09:54:20 HOST falco[37102]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 09:54:20 HOST falco[37102]: Wed Mar 16 09:54:20 2022: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 09:54:20 HOST falco[37102]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 09:54:20 HOST falco[37102]: Wed Mar 16 09:54:20 2022: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 09:54:21 HOST falco[37102]: Rules match ignored syscall: warning (ignored-evttype):
Mar 16 09:54:21 HOST falco[37102]:          loaded rules match the following events: access,brk,close,cpu_hotplug,drop,epoll_wait,eventfd,fcntl,fstat,fstat64,futex,getcwd,getdents,getd
Mar 16 09:54:21 HOST falco[37102]:          but these events are not returned unless running falco with -A
Mar 16 09:54:21 HOST falco[37102]: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.
Mar 16 09:54:21 HOST falco[37102]: Wed Mar 16 09:54:21 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.
Mar 16 09:54:21 HOST systemd[1]: falco.service: Main process exited, code=exited, status=1/FAILURE
Mar 16 09:54:21 HOST systemd[1]: falco.service: Failed with result 'exit-code'.
Mar 16 09:54:21 HOST systemd[1]: falco.service: Received 0B IP traffic, sent 0B IP traffic
# grep -C2 falco /var/log/apt/history.log

Start-Date: 2022-03-12  13:01:57
Upgrade: dotnet-runtime-3.1:amd64 (3.1.22-1, 3.1.23-1), dotnet-host:amd64 (6.0.2-1, 6.0.3-1), dotnet-hostfxr-3.1:amd64 (3.1.22-1, 3.1.23-1), sosreport:amd64 (4.1-1ubuntu0.18.04.3, 4.3-1ubuntu0.18.04.1), dotnet-runtime-deps-3.1:amd64 (3.1.22-1, 3.1.23-1), falco:amd64 (0.31.0, 0.31.1)
End-Date: 2022-03-12  13:02:40

Downgrading to 0.31.0 restores functionality

# apt-get install falco=0.31.0
# systemctl status falco
● falco.service - Falco: Container Native Runtime Security
   Loaded: loaded (/lib/systemd/system/falco.service; disabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system.control/falco.service.d
           └─50-CPUQuota.conf, 50-CPUShares.conf, 50-MemoryLimit.conf
   Active: active (running) since Wed 2022-03-16 10:34:02 UTC; 4s ago
 Main PID: 57182 (falco)
       IP: 0B in, 0B out
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/falco.service
           └─57182 /usr/bin/falco --pidfile=/var/run/falco.pid

Mar 16 10:34:02 HOST falco[57182]: Wed Mar 16 10:34:02 2022: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 10:34:02 HOST falco[57182]: Falco initialized with configuration file /etc/falco/falco.yaml
Mar 16 10:34:02 HOST falco[57182]: Loading rules from file /etc/falco/falco_rules.yaml:
Mar 16 10:34:03 HOST falco[57182]: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 10:34:03 HOST falco[57182]: Wed Mar 16 10:34:03 2022: Loading rules from file /etc/falco/falco_rules.local.yaml:
Mar 16 10:34:03 HOST falco[57182]: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 10:34:03 HOST falco[57182]: Wed Mar 16 10:34:03 2022: Loading rules from file /etc/falco/k8s_audit_rules.yaml:
Mar 16 10:34:04 HOST falco[57182]: Rules match ignored syscall: warning (ignored-evttype):
Mar 16 10:34:04 HOST falco[57182]:          loaded rules match the following events: access,brk,close,cpu_hotplug,drop,epoll_wait,eventfd,fcntl,fstat,fstat64,futex,getcwd,getdents,getd
Mar 16 10:34:04 HOST falco[57182]:          but these events are not returned unless running falco with -A

# apt-mark hold falco
falco set on hold.

@ChaosInTheCRD
Author

@leogr would it be preferable to just align all minikube installs to the ebpf probe and have the init script download that, as I have done in my workaround? I am unsure as to what the "most supported" method of setting up falco is wrt ebpf/kernel driver though.

@leogr
Member

leogr commented Mar 16, 2022

@ChaosInTheCRD ebpf and kmod have feature parity and are almost equivalent in performance. Specifically for Minikube, upgrading the driver version directly in their repo would be preferable, so everything will work seamlessly (driver incompatibility does not happen on each Falco release; it's a rare occurrence). If I find some spare time, I'll try to investigate more and eventually open a PR in Minikube to fix that.

However, your solution is a valid alternative. Indeed, after looking at it again I realized that /host/etc/VERSION must be mounted anyway, since it is consumed by falco-driver-loader. For this reason, I believe we have to fix the helm chart. Would you like to open a PR in https://github.com/falcosecurity/charts ?

Btw, I hope we will fix both cases. Having Minikube work with both the kmod and ebpf is for sure the best option.

@leogr
Member

leogr commented Mar 16, 2022

@juju4

This does not affect only minikube. I have it on Ubuntu 18.04

Your issue is slightly different. You should be able to install the new driver version by running:

falco-driver-loader --clean

Then

falco-driver-loader

Let me know if that works. Perhaps we will have to improve the documentation regarding that.

@juju4
Contributor

juju4 commented Mar 17, 2022

No, it didn't fix the issue, but it made me understand the problem.

# apt-mark unhold falco
Canceled hold on falco.
# apt-get upgrade falco
# systemctl restart falco
# systemctl status falco
 falco.service - Falco: Container Native Runtime Security
   Loaded: loaded (/lib/systemd/system/falco.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system.control/falco.service.d
           └─50-CPUQuota.conf, 50-CPUShares.conf, 50-MemoryLimit.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2022-03-16 22:31:08 UTC; 1s ago
  Process: 27908 ExecStart=/usr/bin/falco --pidfile=/var/run/falco.pid (code=exited, status=1/FAILURE)
 Main PID: 27908 (code=exited, status=1/FAILURE)
       IP: 0B in, 0B out

Mar 16 22:31:08 vps58732 systemd[1]: falco.service: Received 0B IP traffic, sent 0B IP traffic
root@vps58732:~# falco-driver-loader --clean
* Running falco-driver-loader for: falco version=0.31.1, driver version=b7eb0dd65226a8dc254d228c8d950d07bf3521d2
* Running falco-driver-loader with: driver=module, clean=yes
* Unloading falco module failed
* Removing falco failed
root@vps58732:~# falco-driver-loader 
* Running falco-driver-loader for: falco version=0.31.1, driver version=b7eb0dd65226a8dc254d228c8d950d07bf3521d2
* Running falco-driver-loader with: driver=module, compile=yes, download=yes
* Unloading falco module, if present 
* falco module still loaded, waited 5s (max wait 60s)
* falco module still loaded, waited 10s (max wait 60s)
^C

So this is a kernel module unloading/loading issue.
I found the cause: on this system, boot ends by disabling kernel module loading. After several modprobe calls, including one for falco, rc.local runs:

echo 1 > /proc/sys/kernel/modules_disabled

I rebooted to confirm, and the new version works afterwards.
Strangely, I had not seen this issue after past upgrades, and this setting is not new.
Sorry for the mix-up. Yes, it is a different issue.
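A quick way to spot this situation before an upgrade is to read the sysctl first. The sketch below is only illustrative: the function names are hypothetical, while the `/proc/sys/kernel/modules_disabled` path and the two `falco-driver-loader` invocations are the ones shown earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only the sysctl path and the
# falco-driver-loader commands come from the thread.

# Succeeds (exit 0) when module loading/unloading is locked on this host.
# An alternative file can be passed as $1, which is handy for testing.
modules_loading_disabled() {
  local sysctl_file="${1:-/proc/sys/kernel/modules_disabled}"
  [ -r "$sysctl_file" ] && [ "$(cat "$sysctl_file")" = "1" ]
}

# Reinstall the Falco driver only when the kernel still allows it.
reload_falco_driver() {
  if modules_loading_disabled; then
    echo "kernel.modules_disabled=1: reboot before upgrading the Falco driver" >&2
    return 1
  fi
  falco-driver-loader --clean && falco-driver-loader
}
```

Once `modules_disabled` has been set to 1 the kernel does not allow it to be reset, which matches the observation that only a reboot let the new driver load.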

@leogr
Member

leogr commented Mar 17, 2022

Oh yeah, there are cases where falco-driver-loader can't unload the driver. Consequently, it cannot install the new driver version. We should improve our documentation about that.

This issue happened with Falco 0.31.1 because we introduced a mechanism to detect incompatible driver versions, and this specific Falco version requires a new driver version. Combining these two factors produced the issue, which does not usually happen.

@eelkonio

eelkonio commented May 9, 2022

We also experience this issue: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting.

We build the driver through driverkit-builder, load the module successfully and start Falco. When using falco 0.31.1 it croaks with the above message. When using Falco 0.31.0 it works fine.

This happens on nodes where it replaces the old Falco instances (0.29.1) but also on clean nodes that did not have any modules loaded before this new version (0.31.1) started. Unloading modules cannot be the cause there. Version 0.31.0 does not show this problem.

@leogr
Member

leogr commented May 13, 2022

We build the driver through driverkit-builder, load the module successfully and start Falco. When using falco 0.31.1 it croaks with the above message. When using Falco 0.31.0 it works fine.

Note that 0.31.1 has a newer driver (ie. kernel module) version than 0.31.0.

This happens on nodes where it replaces the old Falco instances (0.29.1) but also on clean nodes that did not have any modules loaded before this new version (0.31.1) started. Unloading modules cannot be the cause there. Version 0.31.0 does not show this problem.

AFAIK, the problem arises when an old driver is already installed and loaded. Basically, when one previously installed an older version and then installs the 0.31.1, the old driver remains up and running and Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting. is returned.

The workaround is to uninstall the old driver manually before installing the 0.31.1.
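One way to carry out that manual step is sketched below. The helper names are hypothetical, and the sketch assumes the module is registered under the name `falco`, as the falco-driver-loader output earlier in this thread suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; assumes the kernel module is called "falco".

# Succeeds (exit 0) when a module named exactly "falco" is listed.
# An alternative modules file can be passed as $1, which is handy for testing.
falco_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  grep -q '^falco ' "$modules_file" 2>/dev/null
}

# Unload the old driver, if present, before installing the new Falco version.
remove_old_falco_driver() {
  if falco_module_loaded; then
    # rmmod fails while a running Falco process still holds the module,
    # so stop Falco before calling this.
    rmmod falco
  fi
}
```

Running `remove_old_falco_driver` as root after stopping the Falco service, and before the upgrade, should leave no stale module behind; `falco-driver-loader --clean` is the equivalent route mentioned earlier in the thread.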

PS

Falco 0.32.0 (not yet released) will come with a fix that forces the driver uninstallation when upgrading to a newer version.

@leogr
Member

leogr commented May 26, 2022

/milestone 0.32.0

@poiana poiana added this to the 0.32.0 milestone May 26, 2022
@jasondellaluce
Contributor

/remove-milestone 0.32.0
/milestone 0.33.0

@poiana poiana modified the milestones: 0.32.0, 0.33.0 Jun 6, 2022
@leogr
Member

leogr commented Jun 7, 2022

/remove-milestone 0.32.0 /milestone 0.33.0

Hey @jasondellaluce

This issue should be actually fixed by 0.32.0

I know @alacuku had the same issue and is testing. Has 0.32.0 worked for you on minikube? Please let us know :)

@alacuku
Member

alacuku commented Jun 7, 2022

Hi @leogr, here are my findings on Falco and Minikube.

The issue is still present even with Falco 0.32.0. That's because the latest version of minikube v1.25.2 ships with the 85c88952b018fdbce2464222c3303229f5bfcfad version of Falco kernel module. It works fine with Falco 0.31.0 but not with later versions of Falco.
Minikube developers have already bumped version of Falco to 0.31.1 (kubernetes/minikube@69fb8c2) but we need to wait for the next release of Minikube for that.

There are two options in order to use the latest version of Falco with Minikube:

  1. We start to offer prebuilt driver modules for the Minikube kernels;
  2. The end users build their own Minikube iso image with the latest Falco driver module.

@jrabbit

jrabbit commented Jun 8, 2022

Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

@leogr
Member

leogr commented Jun 13, 2022

Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

Hey @jrabbit
Could you share more detail pls?

@leogr
Member

leogr commented Jun 13, 2022

Meanwhile, I've opened a PR in minikube to bump Falco to 0.32.0 👉 kubernetes/minikube#14329

@jrabbit

jrabbit commented Jun 13, 2022

Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

Hey @jrabbit Could you share more detail pls?

So we're on a full 1.19 K8s cluster so it might be a different bug but we're having trouble getting the script or manually to unload the falco kernel mod. Strangely it seemed like falco kept spawning binaries (which then access the module, blocking rmmod) when the pods were unscheduled? (Maybe this is a k8s quirk i'm not used to?). Let me validate if the nodes that succeeded were ones that were freshly spun up (and thus wouldn't have falco kernel mods to remove) and get back to you w/ that.

e: So the nodes that do have working pods aren't new, the others are in restart loops with errors in dkms and Mon Jun 13 15:54:57 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting. at the end. May help to know the nodes are kops provisioned and have similar os package state. Also installed via helm w/ latest which may complicate things?

@leogr
Member

leogr commented Jun 16, 2022

Seeing this still with 0.32.0 and Ubuntu 20.04.2 LTS on 5.4.0-1038-aws #40-Ubuntu kernels. It seems to have worked on a few of our nodes but not in any easily observed pattern.

Hey @jrabbit Could you share more detail pls?

So we're on a full 1.19 K8s cluster so it might be a different bug but we're having trouble getting the script or manually to unload the falco kernel mod. Strangely it seemed like falco kept spawning binaries (which then access the module, blocking rmmod) when the pods were unscheduled? (Maybe this is a k8s quirk i'm not used to?). Let me validate if the nodes that succeeded were ones that were freshly spun up (and thus wouldn't have falco kernel mods to remove) and get back to you w/ that.

Not sure what is going on. The module is installed on the host, so it is still present after pods get unscheduled. The bug was that 0.31.1 was not able to upgrade the module. The 0.32.0 fixed the issue.

e: So the nodes that do have working pods aren't new, the others are in restart loops with errors in dkms and Mon Jun 13 15:54:57 2022: Runtime error: Kernel module does not support PPM_IOCTL_GET_API_VERSION. Exiting. at the end. May help to know the nodes are kops provisioned and have similar os package state. Also installed via helm w/ latest which may complicate things?

For pods in the restart loop, I guess for some reason the driver is not found on our DBG and the falco-driver-loader script can't build it on the fly. Could you provide some logs?

Anyway, I think yours is a different problem. It would be better to open a dedicated issue.

@chukmunnlee

Got falco working with @ChaosInTheCRD's workaround, but I also had to delete the SKIP_DRIVER_LOADER environment variable in the falco container.

@leogr
Member

leogr commented Sep 16, 2022

cc @alacuku
Should this be fixed now? Should we have to update the documentation somewhere?

@alacuku
Member

alacuku commented Sep 16, 2022

cc @alacuku Should this be fixed now? Should we have to update the documentation somewhere?

Yes, it should be fixed now. I'm waiting for 0.33.0 falco release in order to update the docs and eventually close the issue.

@leogr
Member

leogr commented Sep 19, 2022

cc @alacuku Should this be fixed now? Should we have to update the documentation somewhere?

Yes, it should be fixed now. I'm waiting for 0.33.0 falco release in order to update the docs and eventually close the issue.

@alacuku Thank you! 🙏

Please put Fixes https://github.com/falcosecurity/falco/issues/1941 in the falco-website PR you will open, so we both track this and automatically close it once you are done with the docs.
