Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64: Fix bt command show wrong stacktrace on ramdump source #183

Closed
wants to merge 1 commit into from

Conversation

happybevis
Copy link

If we use crash to parse ramdump(Qcom phone device) rathen than vmcore. Start command should be like: crash vmlinux --kaslr=xxx DDRCS0_0.BIN@0x0000000080000000,... --machdep vabits_actual=39 Then We will see bt command show misleading backtrace information below:

crash> bt 16930
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
#1 [ffffffc034c43850] _kvm_nvhe$d.2314 at 6be732e004cf05a0
#2 [ffffffc034c438b0] _kvm_nvhe$d.2314 at 86c54c6004ceff80
#3 [ffffffc034c43950] _kvm_nvhe$d.2314 at 55d6f96003a7b120
#4 [ffffffc034c439f0] _kvm_nvhe$d.2314 at 9ccec46003a80a64
#5 [ffffffc034c43ac0] _kvm_nvhe$d.2314 at 8cf41e6003a945c4
#6 [ffffffc034c43b10] _kvm_nvhe$d.2314 at a8f181e00372c818
#7 [ffffffc034c43b40] _kvm_nvhe$d.2314 at 6dedde600372c0d0
#8 [ffffffc034c43b90] _kvm_nvhe$d.2314 at 62cc07e00373d0ac
#9 [ffffffc034c43c00] _kvm_nvhe$d.2314 at 72fb1de00373bedc
...
PC: 00000073f5294840 LR: 00000070d8f39ba4 SP: 00000070d4afd5d0
X29: 00000070d4afd600 X28: b4000071efcda7f0 X27: 00000070d4afe000
X26: 0000000000000000 X25: 00000070d9616000 X24: 0000000000000000
X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000000000
X20: b40000728fd27520 X19: b40000728fd27550 X18: 000000702daba000
X17: 00000073f5294820 X16: 00000070d940f9d8 X15: 00000000000000bf
X14: 0000000000000000 X13: 00000070d8ad2fac X12: b40000718fce5040
X11: 0000000000000000 X10: 0000000000000070 X9: 0000000000000001
X8: 0000000000000062 X7: 0000000000000020 X6: 0000000000000000
X5: 0000000000000000 X4: 0000000000000000 X3: 0000000000000000
X2: 0000000000000002 X1: 0000000000000080 X0: b40000728fd27550
ORIG_X0: b40000728fd27550 SYSCALLNO: ffffffff PSTATE: 40001000

By checking the raw data below, will see the lr (fp+8) data show the pointer which already been replaced by PAC prefix.

crash> bt -f
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
ffffffc034c437f0: ffffffc034c43850 6be732e004cf05a4
ffffffc034c43800: ffffffe006186108 a0ed07e004cf09c4
ffffffc034c43810: ffffff8a1a340000 ffffff8a8d343c00
ffffffc034c43820: ffffff89b3eada00 ffffff8b780db540
ffffffc034c43830: ffffff89b3eada00 0000000000000000
ffffffc034c43840: 0000000000000004 712b828118484a00
#1 [ffffffc034c43850] _kvm_nvhe$d.2314 at 6be732e004cf05a0
ffffffc034c43850: ffffffc034c438b0 86c54c6004ceff84
ffffffc034c43860: 000000708070f000 ffffffc034c43938
ffffffc034c43870: ffffff88bd822878 ffffff89b3eada00
...

So we check the CONFIG_ARM64_PTR_AUTH and CONFIG_ARM64_PTR_AUTH_KERNEL to double check if pac mechanism been enabled on this ramdump. Then we use vabits to figure it out.
Fix then show the right backtrace below:
crash> bt 16930
PID: 16930 TASK: ffffff89b3eada00 CPU: 2 COMMAND: "Firebase Backgr"
#0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
#1 [ffffffc034c43850] __schedule at ffffffe004cf05a0
#2 [ffffffc034c438b0] preempt_schedule_common at ffffffe004ceff80
#3 [ffffffc034c43950] unmap_page_range at ffffffe003a7b120
#4 [ffffffc034c439f0] unmap_vmas at ffffffe003a80a64
#5 [ffffffc034c43ac0] exit_mmap at ffffffe003a945c4
#6 [ffffffc034c43b10] __mmput at ffffffe00372c818
#7 [ffffffc034c43b40] mmput at ffffffe00372c0d0
#8 [ffffffc034c43b90] exit_mm at ffffffe00373d0ac
#9 [ffffffc034c43c00] do_exit at ffffffe00373bedc
PC: 00000073f5294840 LR: 00000070d8f39ba4 SP: 00000070d4afd5d0
X29: 00000070d4afd600 X28: b4000071efcda7f0 X27: 00000070d4afe000
X26: 0000000000000000 X25: 00000070d9616000 X24: 0000000000000000
X23: 0000000000000000 X22: 0000000000000000 X21: 0000000000000000
X20: b40000728fd27520 X19: b40000728fd27550 X18: 000000702daba000
X17: 00000073f5294820 X16: 00000070d940f9d8 X15: 00000000000000bf
X14: 0000000000000000 X13: 00000070d8ad2fac X12: b40000718fce5040
X11: 0000000000000000 X10: 0000000000000070 X9: 0000000000000001
X8: 0000000000000062 X7: 0000000000000020 X6: 0000000000000000
X5: 0000000000000000 X4: 0000000000000000 X3: 0000000000000000
X2: 0000000000000002 X1: 0000000000000080 X0: b40000728fd27550
ORIG_X0: b40000728fd27550 SYSCALLNO: ffffffff PSTATE: 40001000

Let's use GENMASK to replace the pac pointer to fix it. gki related commit url here:
https://lore.kernel.org/all/[email protected]/

If we use crash to parse ramdump(Qcom phone device) rathen than vmcore.
Start command should be like: crash vmlinux --kaslr=xxx DDRCS0_0.BIN@0x0000000080000000,... --machdep vabits_actual=39
Then We will see bt command show misleading backtrace information below:

crash> bt 16930
PID: 16930    TASK: ffffff89b3eada00  CPU: 2    COMMAND: "Firebase Backgr"
 #0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
 crash-utility#1 [ffffffc034c43850] __kvm_nvhe_$d.2314 at 6be732e004cf05a0
 crash-utility#2 [ffffffc034c438b0] __kvm_nvhe_$d.2314 at 86c54c6004ceff80
 crash-utility#3 [ffffffc034c43950] __kvm_nvhe_$d.2314 at 55d6f96003a7b120
 crash-utility#4 [ffffffc034c439f0] __kvm_nvhe_$d.2314 at 9ccec46003a80a64
 crash-utility#5 [ffffffc034c43ac0] __kvm_nvhe_$d.2314 at 8cf41e6003a945c4
 crash-utility#6 [ffffffc034c43b10] __kvm_nvhe_$d.2314 at a8f181e00372c818
 crash-utility#7 [ffffffc034c43b40] __kvm_nvhe_$d.2314 at 6dedde600372c0d0
 crash-utility#8 [ffffffc034c43b90] __kvm_nvhe_$d.2314 at 62cc07e00373d0ac
 crash-utility#9 [ffffffc034c43c00] __kvm_nvhe_$d.2314 at 72fb1de00373bedc
...
     PC: 00000073f5294840   LR: 00000070d8f39ba4   SP: 00000070d4afd5d0
    X29: 00000070d4afd600  X28: b4000071efcda7f0  X27: 00000070d4afe000
    X26: 0000000000000000  X25: 00000070d9616000  X24: 0000000000000000
    X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000000000
    X20: b40000728fd27520  X19: b40000728fd27550  X18: 000000702daba000
    X17: 00000073f5294820  X16: 00000070d940f9d8  X15: 00000000000000bf
    X14: 0000000000000000  X13: 00000070d8ad2fac  X12: b40000718fce5040
    X11: 0000000000000000  X10: 0000000000000070   X9: 0000000000000001
     X8: 0000000000000062   X7: 0000000000000020   X6: 0000000000000000
     X5: 0000000000000000   X4: 0000000000000000   X3: 0000000000000000
     X2: 0000000000000002   X1: 0000000000000080   X0: b40000728fd27550
    ORIG_X0: b40000728fd27550  SYSCALLNO: ffffffff  PSTATE: 40001000

By checking the raw data below, will see the lr (fp+8) data show the pointer which already been replaced by PAC prefix.

crash> bt -f
PID: 16930    TASK: ffffff89b3eada00  CPU: 2    COMMAND: "Firebase Backgr"
 #0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
    ffffffc034c437f0: ffffffc034c43850 6be732e004cf05a4
    ffffffc034c43800: ffffffe006186108 a0ed07e004cf09c4
    ffffffc034c43810: ffffff8a1a340000 ffffff8a8d343c00
    ffffffc034c43820: ffffff89b3eada00 ffffff8b780db540
    ffffffc034c43830: ffffff89b3eada00 0000000000000000
    ffffffc034c43840: 0000000000000004 712b828118484a00
 crash-utility#1 [ffffffc034c43850] __kvm_nvhe_$d.2314 at 6be732e004cf05a0
    ffffffc034c43850: ffffffc034c438b0 86c54c6004ceff84
    ffffffc034c43860: 000000708070f000 ffffffc034c43938
    ffffffc034c43870: ffffff88bd822878 ffffff89b3eada00
...

So we check the CONFIG_ARM64_PTR_AUTH and CONFIG_ARM64_PTR_AUTH_KERNEL to double check if pac mechanism been enabled on this ramdump.
Then we use vabits to figure it out.
Fix then show the right backtrace below:
crash> bt 16930
PID: 16930    TASK: ffffff89b3eada00  CPU: 2    COMMAND: "Firebase Backgr"
 #0 [ffffffc034c437f0] __switch_to at ffffffe0036832d4
 crash-utility#1 [ffffffc034c43850] __schedule at ffffffe004cf05a0
 crash-utility#2 [ffffffc034c438b0] preempt_schedule_common at ffffffe004ceff80
 crash-utility#3 [ffffffc034c43950] unmap_page_range at ffffffe003a7b120
 crash-utility#4 [ffffffc034c439f0] unmap_vmas at ffffffe003a80a64
 crash-utility#5 [ffffffc034c43ac0] exit_mmap at ffffffe003a945c4
 crash-utility#6 [ffffffc034c43b10] __mmput at ffffffe00372c818
 crash-utility#7 [ffffffc034c43b40] mmput at ffffffe00372c0d0
 crash-utility#8 [ffffffc034c43b90] exit_mm at ffffffe00373d0ac
 crash-utility#9 [ffffffc034c43c00] do_exit at ffffffe00373bedc
     PC: 00000073f5294840   LR: 00000070d8f39ba4   SP: 00000070d4afd5d0
    X29: 00000070d4afd600  X28: b4000071efcda7f0  X27: 00000070d4afe000
    X26: 0000000000000000  X25: 00000070d9616000  X24: 0000000000000000
    X23: 0000000000000000  X22: 0000000000000000  X21: 0000000000000000
    X20: b40000728fd27520  X19: b40000728fd27550  X18: 000000702daba000
    X17: 00000073f5294820  X16: 00000070d940f9d8  X15: 00000000000000bf
    X14: 0000000000000000  X13: 00000070d8ad2fac  X12: b40000718fce5040
    X11: 0000000000000000  X10: 0000000000000070   X9: 0000000000000001
     X8: 0000000000000062   X7: 0000000000000020   X6: 0000000000000000
     X5: 0000000000000000   X4: 0000000000000000   X3: 0000000000000000
     X2: 0000000000000002   X1: 0000000000000080   X0: b40000728fd27550
    ORIG_X0: b40000728fd27550  SYSCALLNO: ffffffff  PSTATE: 40001000

Let's use GENMASK to replace the pac pointer to fix it.
gki related commit url here:
https://lore.kernel.org/all/[email protected]/

Signed-off-by: bevis_chen <[email protected]>
@happybevis happybevis changed the title [PATCH] arm64: Fix bt command show wrong stacktrace on ramdump source arm64: Fix bt command show wrong stacktrace on ramdump source Jul 3, 2024
@happybevis happybevis closed this Jul 3, 2024
@happybevis happybevis deleted the patch-tree branch July 3, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant