Runtime error on 5.4.x #159

Closed
xj1988 opened this issue Dec 11, 2024 · 9 comments · Fixed by #183
Labels
bug Something isn't working

Comments


xj1988 commented Dec 11, 2024

[bs@VM-1-46-tencentos /opt/kyanos]$ ./kyanos watch http

⣻ 🦜 Kyanos Loading...
Kyanos exited.

..............................
..............................
..............................
🍩 Kyanos starting... 1ns
🍎 Loaded eBPF maps & programs. 184.972835ms

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7a222e]

goroutine 1 [running]:
kyanos/bpf.(*AgentObjects).Close(0x7f5a877cf108?)
/home/runner/work/kyanos/kyanos/bpf/agent_x86_bpfel.go:327 +0xe
kyanos/bpf/loader.(*BPF).Close(0xc00004e0d0)
/home/runner/work/kyanos/kyanos/bpf/loader/loader.go:33 +0x25
kyanos/agent.SetupAgent.func4()
/home/runner/work/kyanos/kyanos/agent/agent.go:116 +0x17
kyanos/agent.SetupAgent({0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, {0x17f2880, ...}, ...})
/home/runner/work/kyanos/kyanos/agent/agent.go:149 +0x8c3
kyanos/cmd.startAgent()
/home/runner/work/kyanos/kyanos/cmd/common.go:99 +0x5dd
kyanos/cmd.init.func1(0xc000348f08, {0x14f5323?, 0x4?, 0x14f51c3?})
/home/runner/work/kyanos/kyanos/cmd/http.go:32 +0x26f
github.com/spf13/cobra.(*Command).execute(0xc000348f08, {0x24cc800, 0x0, 0x0})
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:989 +0xa91
github.com/spf13/cobra.(*Command).ExecuteC(0x2185e60)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
/home/runner/go/pkg/mod/github.com/spf13/[email protected]/command.go:1041
kyanos/cmd.Execute()
/home/runner/work/kyanos/kyanos/cmd/root.go:114 +0x1a
main.main()
/home/runner/work/kyanos/kyanos/main.go:6 +0xf

After running the command, no data was captured, and the panic shown above was printed on exit.
The kernel version is:
[bs@VM-1-46-tencentos /opt/kyanos]$ uname -a
Linux VM-1-46-tencentos 5.4.119-19.0009.44 #1 SMP Tue May 7 20:09:33 CST 2024 x86_64 x86_64 x86_64 GNU/Linux


dosubot bot commented Dec 11, 2024

Hey @xj1988! I'm here to help you with any bugs, questions, or contributions you may have. Let's squash those bugs together! 🤖

The error you're encountering, a "nil pointer dereference," typically occurs when the code attempts to access a memory location that hasn't been initialized. In the context of the Kyanos application, this issue is happening in the Close method of the AgentObjects struct [1].

Here are a few steps you can take to troubleshoot and potentially resolve this issue:

  1. Initialization Check: Ensure that all components of the AgentObjects struct are properly initialized before the Close method is called. This includes verifying that the AgentPrograms and AgentMaps are not nil.

  2. Error Handling: Review the code paths leading to the Close method to ensure that any errors during initialization or loading of eBPF programs and maps are properly handled and logged. This can help identify if the initialization failed silently.

  3. Kernel Compatibility: Since you're running on a 5.4.x kernel, ensure that the eBPF programs and maps are compatible with this kernel version. The SetupAgent function checks for kernel compatibility, so verify that this check is passing [2].

  4. Debugging: Add additional logging before the Close method is called to check the state of the AgentObjects struct. This can help identify if any fields are unexpectedly nil.

By following these steps, you should be able to identify the root cause of the nil pointer dereference and address it accordingly.
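As a rough illustration of the initialization check described above, here is a minimal Go sketch of a Close method that tolerates a nil receiver and nil members. The `objects` and `loader` types are hypothetical stand-ins, not the actual kyanos `AgentObjects` or `BPF` types, and the sketch assumes the panic comes from calling Close on objects that were never loaded.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// objects is a hypothetical stand-in for a generated eBPF objects struct.
type objects struct {
	closers []io.Closer
}

// Close releases every held resource. It is safe to call on a nil receiver,
// which is the situation when loading failed before the struct was set.
func (o *objects) Close() error {
	if o == nil {
		return nil
	}
	var errs []error
	for _, c := range o.closers {
		if c == nil {
			continue // skip members that were never initialized
		}
		if err := c.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when no member failed to close
}

// loader is a hypothetical wrapper that owns the objects.
type loader struct {
	objs *objects
}

// Close delegates to the objects and also tolerates a nil loader.
func (l *loader) Close() error {
	if l == nil {
		return nil
	}
	return l.objs.Close()
}

func main() {
	l := &loader{} // simulates a failed load: objs was never assigned
	fmt.Println("close error:", l.Close())
}
```

With guards like these, a deferred cleanup can run unconditionally after a failed load without turning the original error into a segfault.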


@dosubot dosubot bot added the bug Something isn't working label Dec 11, 2024
Owner

hengyoush commented Dec 11, 2024

Thank you for your feedback!

There are two problems here:

  1. When you run kyanos watch http, no data is captured.
  2. When you exit, kyanos panics.

The panic looks like a memory issue, and I will take a look. The first problem is strange, though: I tested kyanos on your kernel version (TencentOS, 5.4.119-19.0009.44) and it works fine.
[image: demo]

While kyanos watch http is running (don't exit), does it display the following table:
[image]
or any other error log?

If the table is displayed, open another shell and run curl http://www.baidu.com. The table should then show a record; if it doesn't, that is a bug.

@hengyoush hengyoush added this to v1.5.0 Dec 11, 2024
@hengyoush hengyoush moved this to Todo in v1.5.0 Dec 11, 2024
Author

xj1988 commented Dec 11, 2024

1. After running kyanos watch http, it stays stuck here:

[bs@VM-1-46-tencentos /opt/kyanos]$ ./kyanos watch http

⣻ 🦜 Kyanos Loading...
⣟ 🦜 Kyanos Loading...

..............................
..............................
..............................
🍩 Kyanos starting... 1ns
🍎 Loaded eBPF maps & programs. 160.340442ms

Press ctrl+c to exit

2. After pressing ctrl+c, the table appears:

⣷ Events received: 0/100

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ id StartTime Connection Proto TotalTime ReqSize RespSize Net/Internal ReadSocketTime │
│─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────│
│ │
│ │
│ │
│ │
│ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
↑/k up • ↓/j down

3. Manually running curl http://www.baidu.com does not show anything in the table.

4. Pressing ctrl+c again produces the panic from my previous comment.

Owner

hengyoush commented Dec 11, 2024

Can you run this command: ./kyanos watch http --debug --debug-output, then send me the terminal output and the log file /tmp/kyanos_xxx.log?
Thanks!

Author

xj1988 commented Dec 11, 2024

Console output after running ./kyanos watch http --debug --debug-output:
[image]

Contents of the log file:
[image]

Owner

hengyoush commented Dec 11, 2024

Oh, you need to add sudo before the command, like sudo ./kyanos watch http, because kyanos needs root privileges. @xj1988

@hengyoush hengyoush moved this from Todo to In Progress in v1.5.0 Dec 11, 2024
@hengyoush hengyoush moved this from In Progress to Done in v1.5.0 Dec 11, 2024
@hengyoush hengyoush closed this as completed by moving to Done in v1.5.0 Dec 11, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in v1.5.0 Dec 11, 2024
@hengyoush hengyoush moved this from Done to In Progress in v1.5.0 Dec 11, 2024
Author

xj1988 commented Dec 12, 2024

Thanks, it works now.

@hengyoush hengyoush moved this from In Progress to Done in v1.5.0 Dec 12, 2024
Author

xj1988 commented Dec 12, 2024

I have two other questions (PS: the IP addresses have been anonymized).

1.
⢿ Events received: 100/100

┌──────────────────────────────────────────────────────────────────────────────
│ id StartTime Connection Proto TotalTime ReqSize RespSize Net/Internal ReadSocketTi
│──────────────────────────────────────────────────────────────────────────────
│ 1 15:03:40.607 172.9.2.46:50802 => 151.241.97.138:80 HTTP 403.90 717 7 403.55 529.18
│ 2 15:03:40.979 172.9.2.46:50716 => 151.241.97.138:80 HTTP 995.19 757 7 - -
│ 3 15:03:41.081 172.9.2.46:51378 => 151.241.97.138:80 HTTP 837.93 781 1154 837.87 879.65
│ 4 15:03:41.687 172.9.2.46:50746 => 151.241.97.138:80 HTTP 365.10 753 7 365.06 438.59
│ 5 15:03:40.404 172.9.2.46:51078 => 151.241.97.138:80 HTTP 2091.59 753 1434 2091.50 593.90
│ 6 15:03:42.389 172.9.2.46:50746 => 151.241.97.138:80 HTTP 815.60 701 2947 815.30 956.53
└──────────────────────────────────────────────────────────────────────────────
↑/k up • ↓/j down

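For the requests where Net/Internal and ReadSocketTime show a dash, my application logs confirm that the network requests were made. So what does the dash mean here?

2.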
╭───────────────────────────────╮
│ Record Detail: 1 (Total: 100)
╰───────────────────────────────╯
+-------------------+ +---------------------+
| Process(pid:6493) | | eth0(used:1175.35ms)|
| --->| |
+-------------------+ +---------------------+
|
|
|
v
+-------------------------+ +------------------------+ +---------------------+
| Process(used:1112.93ms) | | Socket(used:1112.55ms) | | eth0(used:-476.11ms)|
| |<--- | |<---| |
+-------------------------+ +------------------------+ +---------------------+

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

[conn] [pid=6493][local addr]=172.9.2.46:50936 [remote addr]=151.241.97.138:80 [side]=client [ssl]=false
[total duration] = 699.685(ms)(start=2024-12-12 15:04:06.407, end=2024-12-12 15:04:07.107)

Why is the time for the response to reach the eth0 NIC negative here?

Owner

hengyoush commented Dec 12, 2024

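  1. "-" means some trace data is missing, so the time spent here can't be calculated. This usually happens with short-lived connections, where the connection is closed before kyanos captures the trace data.
    PS: what Net/Internal and ReadSocketTime mean:
    Net/Internal: for a request initiated locally, this is the network time; when acting as a server receiving an external request, this is the internal processing time of the local process.
    ReadSocketTime: for a request initiated locally, this is the time spent reading the response from the kernel socket buffer; when acting as a server receiving an external request, this is the time spent reading the request from the kernel socket buffer.
  2. This is a bug. Please download the v1.4.1 release (https://github.com/hengyoush/kyanos/releases/tag/v1.4.1), which hopefully fixes this problem.
    @xj1988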

@hengyoush hengyoush reopened this Dec 12, 2024
@hengyoush hengyoush closed this as completed by moving to Done in v1.5.0 Dec 12, 2024
hengyoush added a commit that referenced this issue Dec 16, 2024
hengyoush added a commit that referenced this issue Dec 16, 2024
fix: fix #159 stuck when load bpf failed
hengyoush added a commit that referenced this issue Jan 6, 2025