
Crash loop with invalid memory address or nil pointer dereference #1195

Closed
bakito opened this issue Jul 7, 2023 · 9 comments
Labels
kind/bug Something isn't working

Comments

@bakito

bakito commented Jul 7, 2023

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The tetragon pod is crashing frequently with an invalid memory address or nil pointer dereference

Tetragon Version

v0.9.0

Kernel Version

Linux bisdevsrv578 5.14.0-284.18.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 31 10:39:18 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

k3s v1.26.6+k3s1

Bugtool

No response

Relevant log output

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x18cb19e]

goroutine 75 [running]:
github.com/hashicorp/golang-lru/v2/simplelru.(*lruList[...]).remove(...)
	/go/src/github.com/cilium/tetragon/vendor/github.com/hashicorp/golang-lru/v2/simplelru/list.go:89
github.com/hashicorp/golang-lru/v2/simplelru.(*LRU[...]).removeElement(0xc003118cc0?, 0xc0046c34d0?)
	/go/src/github.com/cilium/tetragon/vendor/github.com/hashicorp/golang-lru/v2/simplelru/lru.go:159 +0x1e
github.com/hashicorp/golang-lru/v2/simplelru.(*LRU[...]).Remove(0xc003126480, {0xc0046c34d0, 0xc00313c360?})
	/go/src/github.com/cilium/tetragon/vendor/github.com/hashicorp/golang-lru/v2/simplelru/lru.go:95 +0x50
github.com/hashicorp/golang-lru/v2.(*Cache[...]).Remove(0xc00313c2a0, {0xc0046c34d0, 0x30})
	/go/src/github.com/cilium/tetragon/vendor/github.com/hashicorp/golang-lru/v2/lru.go:169 +0x5f
github.com/cilium/tetragon/pkg/process.(*Cache).remove(0xc000ab4fa0?, 0xc000ab4f54?)
	/go/src/github.com/cilium/tetragon/pkg/process/cache.go:172 +0x2e
github.com/cilium/tetragon/pkg/process.(*Cache).cacheGarbageCollector.func1()
	/go/src/github.com/cilium/tetragon/pkg/process/cache.go:76 +0x33b
created by github.com/cilium/tetragon/pkg/process.(*Cache).cacheGarbageCollector
	/go/src/github.com/cilium/tetragon/pkg/process/cache.go:42 +0xed

Anything else?

There are also continuously logged error messages; I'm not sure whether they are related.

level=warning msg="k8s.io/client-go/informers/factory.go:134: failed to list *v1.Pod: resourceVersion: Invalid value: \"\\x00\\x00\\x00\\x00\\x00\": strconv.ParseUint: parsing \"\\x00\\x00\\x00\\x00\\x00\": invalid syntax" subsys=klog
level=info msg="k8s.io/client-go/informers/factory.go:134: failed to list *v1.Pod: resourceVersion: Invalid value: \"\\x00\\x00\\x00\\x00\\x00\": strconv.ParseUint: parsing \"\\x00\\x00\\x00\\x00\\x00\": invalid syntax" subsys=klog
time="2023-07-07T06:29:37Z" level=error msg="Kubernetes API error" error="k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Pod: failed to list *v1.Pod: resourceVersion: Invalid value: \"\\x00\\x00\\x00\\x00\\x00\": strconv.ParseUint: parsing \"\\x00\\x00\\x00\\x00\\x00\": invalid syntax"

Code of Conduct

  • I agree to follow this project's Code of Conduct
@bakito bakito added the kind/bug Something isn't working label Jul 7, 2023
@mtardy
Member

mtardy commented Jul 7, 2023

Hey 👋 thanks for opening this issue. Were you just running Tetragon on that infrastructure with no Tracing Policy loaded? Would reproducing your setup be enough to reproduce the issue?

@bakito
Author

bakito commented Jul 7, 2023

Yes, I just installed Tetragon via Helm with default values.
To start with, there are currently no tracing policies defined.
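
For reference, the install was roughly the following (a sketch of the standard Tetragon Helm install; the kube-system namespace and the tetragon release name are assumptions here):

# add the Cilium Helm repo and install the Tetragon chart with default values
helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon -n kube-system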

@mtardy
Member

mtardy commented Jul 7, 2023

Yes, I just installed Tetragon via Helm with default values.

To start with, there are currently no tracing policies defined.

I see it was running on k3s. Did you spawn a local cluster? Is there a service we could use to easily reproduce your infra? Maybe you used something related to Rancher?

@bakito
Author

bakito commented Jul 7, 2023

It's an on-prem single-node k3s cluster that is not publicly available.
The cluster is set up with:

  • Cilium 1.13.4 as the network plugin (also with default settings)
  • ingress-nginx 4.7.1

k3s config.yaml

disable:
- traefik
disable-network-policy: true
flannel-backend: none

k3s start command:
/usr/local/bin/k3s server --config /etc/rancher/k3s/config.yaml
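
To reproduce the setup, it should boil down to roughly the following (a sketch; the exact install flags and namespaces are from memory and may differ slightly from what I actually ran):

# install k3s as a single-node server with the config.yaml shown above
curl -sfL https://get.k3s.io | sh -s - server --config /etc/rancher/k3s/config.yaml

# install Cilium 1.13.4 as the CNI, default values
helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium --version 1.13.4 -n kube-system

# install ingress-nginx, chart version 4.7.1
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx --version 4.7.1 -n ingress-nginx --create-namespace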

@kkourt
Contributor

kkourt commented Jul 7, 2023

Hi!

Thanks for the report.

I believe the issue is fixed on our current main branch. The PR with the fix is #1090.

Could you verify that the fix works for you?
You can do that using the tetragon.image.override and tetragonOperator.image.override Helm values. You can either try the latest images (quay.io/cilium/tetragon-ci:latest and quay.io/cilium/tetragon-operator-ci:latest) or our 0.10 pre-release images that include the fix (quay.io/cilium/tetragon:v0.10.0-pre.2 and quay.io/cilium/tetragon-operator:v0.10.0-pre.2).
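
For example, a minimal values override could look something like this (a sketch; the release name and namespace are whatever you used for the original install):

# values-override.yaml (sketch)
tetragon:
  image:
    override: "quay.io/cilium/tetragon:v0.10.0-pre.2"
tetragonOperator:
  image:
    override: "quay.io/cilium/tetragon-operator:v0.10.0-pre.2"

# then upgrade the release in place
helm upgrade tetragon cilium/tetragon -n kube-system -f values-override.yaml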

Thanks!

@bakito
Author

bakito commented Jul 7, 2023

Thank you for the update.
I'll try with quay.io/cilium/tetragon:v0.10.0-pre.2 and let it run over the weekend.

@bakito
Author

bakito commented Jul 10, 2023

@kkourt I have not had any pod restarts since upgrading to tetragon:v0.10.0-pre.2.

It seems the issue is fixed.

The other warnings have also disappeared.

@mtardy
Member

mtardy commented Jul 10, 2023

Thanks for the feedback. I'll close this issue as resolved for now; happy that it's working for you!

@mtardy mtardy closed this as completed Jul 10, 2023
@kkourt
Contributor

kkourt commented Jul 11, 2023

@bakito Thanks!
