[metrics] Fix overhead_program metrics for return probes #3074

Merged
kkourt merged 1 commit into main from pr/apapag/fix_overhead_program_seconds_total_ret
Nov 5, 2024

tpapagian
Member

Let's assume the following example:

$ cat pol.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "file-monitoring-mmap"
spec:
  kprobes:
  - call: "security_mmap_file"
    syscall: false
    return: true
    args:
    - index: 0
      type: "file" # (struct file *) used for getting the path
    - index: 1
      type: "uint32" # the prot flags PROT_READ(0x01), PROT_WRITE(0x02), PROT_EXEC(0x04)
    - index: 2
      type: "nop" # the mmap flags (i.e. MAP_SHARED, ...)
    returnArg:
      index: 0
      type: "int"
    returnArgAction: "Post"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/" # filenames to filter for
$ sudo ./tetragon --btf /sys/kernel/btf/vmlinux  --bpf-lib ./bpf/objs/ --metrics-server ':2112' --tracing-policy ./pol.yaml  --disable-kprobe-multi

After that, if we try to get the metrics from another terminal we get the following errors:

$ curl http://localhost:2112/metrics
An error has occurred while serving metrics:

2 error(s) occurred:
* collected metric "tetragon_overhead_program_seconds_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values
* collected metric "tetragon_overhead_program_runs_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values

The issue here is that we get two metrics with the same labels. This happens because we also need the retprobe (i.e. returnArg), and it has the same name as the kprobe.
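
The collision can be reproduced with a small model of how a metrics registry detects duplicates. This is only a sketch, not Tetragon's or the Prometheus client's actual code: `series_key` and `Registry` are hypothetical names, and the only assumption is that a series is identified by its metric name plus its full set of label pairs.

```python
from typing import Dict, Tuple

Labels = Dict[str, str]
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Labels) -> SeriesKey:
    """Identity of a series: metric name plus sorted label pairs (assumed model)."""
    return name, tuple(sorted(labels.items()))


class Registry:
    """Hypothetical registry that rejects two samples with the same identity."""

    def __init__(self) -> None:
        self._samples: Dict[SeriesKey, float] = {}

    def collect(self, name: str, labels: Labels, value: float) -> None:
        key = series_key(name, labels)
        if key in self._samples:
            raise ValueError(
                f"{name} was collected before with the same name and label values"
            )
        self._samples[key] = value


# Labels copied from the error output; in this model the kprobe and the
# retprobe programs both report exactly this set.
labels = {
    "attach": "security_mmap_file",
    "policy": "file-monitoring-mmap",
    "policy_namespace": "",
    "sensor": "generic_kprobe",
}

registry = Registry()
registry.collect("tetragon_overhead_program_seconds_total", labels, 0.0)  # kprobe
try:
    registry.collect("tetragon_overhead_program_seconds_total", labels, 0.0)  # retprobe
    collided = False
except ValueError:
    collided = True
```

With nothing in the label set to tell the two programs apart, the second `collect` call is rejected, which mirrors the "was collected before" error reported by the metrics endpoint.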

To fix that, we need to add another label for the section that we use to attach. This patch adds that label; with it, the metrics from the previous example become:

tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_kprobe",sensor="generic_kprobe"} 0
tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_retkprobe",sensor="generic_kprobe"} 0

This reports both the attach function (i.e. security_mmap_file) and the program that we use to attach (i.e. kprobe/generic_kprobe and kprobe/generic_retkprobe).
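
One consequence for consumers of these metrics is that each attach function can now have more than one series. The sketch below parses the two sample lines shown above, checks that their label sets are distinct, and sums them back into one value per attach function. The parser is a hypothetical helper written for this example; it assumes label values contain no escaped quotes and is not a full parser for the Prometheus text format.

```python
import re
from collections import defaultdict

# Simplified patterns: good enough for the sample lines in this PR only.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    return match.group("name"), labels, float(match.group("value"))


EXPOSITION = """
tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_kprobe",sensor="generic_kprobe"} 0
tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_retkprobe",sensor="generic_kprobe"} 0
"""

samples = [parse_sample(line) for line in EXPOSITION.splitlines() if line.strip()]

# With the section label, every series has a distinct identity.
identities = {(name, tuple(sorted(labels.items()))) for name, labels, _ in samples}

# Aggregate across sections to get one total per attach function.
per_attach = defaultdict(float)
for name, labels, value in samples:
    per_attach[(name, labels["attach"])] += value
```

In PromQL, a comparable aggregation would be `sum without (section) (tetragon_overhead_program_seconds_total)`.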

@tpapagian added the release-note/bug label ("This PR fixes an issue in a previous release of Tetragon.") on Nov 4, 2024
@tpapagian requested a review from a team as a code owner on November 4, 2024 19:52
@tpapagian requested a review from tixxdz on November 4, 2024 19:52
Member

@mtardy mtardy left a comment

lgtm

Let's assume the following example:

$ cat pol.yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "file-monitoring-mmap"
spec:
  kprobes:
  - call: "security_mmap_file"
    syscall: false
    return: true
    args:
    - index: 0
      type: "file" # (struct file *) used for getting the path
    - index: 1
      type: "uint32" # the prot flags PROT_READ(0x01), PROT_WRITE(0x02), PROT_EXEC(0x04)
    - index: 2
      type: "nop" # the mmap flags (i.e. MAP_SHARED, ...)
    returnArg:
      index: 0
      type: "int"
    returnArgAction: "Post"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/" # filenames to filter for
$ sudo ./tetragon --btf /sys/kernel/btf/vmlinux  --bpf-lib ./bpf/objs/ --metrics-server ':2112' --tracing-policy ./pol.yaml  --disable-kprobe-multi

After that, if we try to get the metrics from another terminal we get
the following errors:

$ curl http://localhost:2112/metrics
An error has occurred while serving metrics:

2 error(s) occurred:
* collected metric "tetragon_overhead_program_seconds_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values
* collected metric "tetragon_overhead_program_runs_total" { label:{name:"attach"  value:"security_mmap_file"}  label:{name:"policy"  value:"file-monitoring-mmap"}  label:{name:"policy_namespace"  value:""}  label:{name:"sensor"  value:"generic_kprobe"}  counter:{value:0}} was collected before with the same name and label values

The issue here is that we get two metrics with the same labels. This
happens because we also need the retprobe (i.e. returnArg), and it has
the same name as the kprobe.

To fix that, we need to add another label for the section that we use
to attach. This patch adds that label; with it, the metrics from the
previous example become:

tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_kprobe",sensor="generic_kprobe"} 0
tetragon_overhead_program_seconds_total{attach="security_mmap_file",policy="file-monitoring-mmap",policy_namespace="",section="kprobe/generic_retkprobe",sensor="generic_kprobe"} 0

This reports both the attach function (i.e. security_mmap_file) and the
program that we use to attach (i.e. kprobe/generic_kprobe and
kprobe/generic_retkprobe).

Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian force-pushed the pr/apapag/fix_overhead_program_seconds_total_ret branch from d779b55 to fd2fba3 on November 4, 2024 19:58

netlify bot commented Nov 4, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit fd2fba3
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/6729275461cfec000957f70a
😎 Deploy Preview https://deploy-preview-3074--tetragon.netlify.app

@kkourt merged commit 0e789ea into main on Nov 5, 2024
46 checks passed
@kkourt deleted the pr/apapag/fix_overhead_program_seconds_total_ret branch on November 5, 2024 07:57