Skip to content

Add static to fmt string to get it put into rodata#1422

Merged
christos68k merged 1 commit into
open-telemetry:mainfrom
gnurizen:print-fmt-rodata
May 15, 2026
Merged

Add static to fmt string to get it put into rodata#1422
christos68k merged 1 commit into
open-telemetry:mainfrom
gnurizen:print-fmt-rodata

Conversation

@gnurizen
Copy link
Copy Markdown
Contributor

Clang will only put DEBUG_PRINT fmt strings into rodata if they
are "static const char[]", we were missing the static. So instead
clang generates code to construct the fmt string on the stack which
is wasteful of instructions and stack space.

Before:

;     DEBUG_PRINT("beam: no PerCPURecord found");
      10:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      12:       61 11 00 00 00 00 00 00 w1 = *(u32 *)(r1 + 0x0)
;     DEBUG_PRINT("beam: no PerCPURecord found");
      13:       15 01 9d 03 00 00 00 00 if r1 == 0x0 goto +0x39d <LBB0_107>
      14:       b7 01 00 00 75 6e 64 00 r1 = 0x646e75
;     DEBUG_PRINT("beam: no PerCPURecord found");
      15:       63 1a 78 ff 00 00 00 00 *(u32 *)(r10 - 0x88) = w1
      16:       18 01 00 00 65 63 6f 72 00 00 00 00 64 20 66 6f r1 = 0x6f662064726f6365 ll
      18:       7b 1a 70 ff 00 00 00 00 *(u64 *)(r10 - 0x90) = r1
      19:       18 01 00 00 20 50 65 72 00 00 00 00 43 50 55 52 r1 = 0x5255504372655020 ll
      21:       7b 1a 68 ff 00 00 00 00 *(u64 *)(r10 - 0x98) = r1
      22:       18 01 00 00 62 65 61 6d 00 00 00 00 3a 20 6e 6f r1 = 0x6f6e203a6d616562 ll
      24:       7b 1a 60 ff 00 00 00 00 *(u64 *)(r10 - 0xa0) = r1
      25:       bf a1 00 00 00 00 00 00 r1 = r10
      26:       07 01 00 00 60 ff ff ff r1 += -0xa0
      27:       b7 02 00 00 1c 00 00 00 r2 = 0x1c
      28:       85 00 00 00 06 00 00 00 call 0x6
      29:       05 00 8d 03 00 00 00 00 goto +0x38d <LBB0_107>

After:

;     DEBUG_PRINT("beam: no PerCPURecord found");
      10:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      12:       61 11 00 00 00 00 00 00 w1 = *(u32 *)(r1 + 0x0)
;     DEBUG_PRINT("beam: no PerCPURecord found");
      13:       15 01 11 02 00 00 00 00 if r1 == 0x0 goto +0x211 <LBB0_100>
      14:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      16:       b7 02 00 00 1c 00 00 00 r2 = 0x1c
      17:       85 00 00 00 06 00 00 00 call 0x6
      18:       05 00 0c 02 00 00 00 00 goto +0x20c <LBB0_100>

Clang will only put DEBUG_PRINT fmt strings into rodata if they
are "static const char[]", we were missing the static. So instead
clang generates code to construct the fmt string on the stack which
is wasteful of instructions and stack space.

Before:

```
;     DEBUG_PRINT("beam: no PerCPURecord found");
      10:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      12:       61 11 00 00 00 00 00 00 w1 = *(u32 *)(r1 + 0x0)
;     DEBUG_PRINT("beam: no PerCPURecord found");
      13:       15 01 9d 03 00 00 00 00 if r1 == 0x0 goto +0x39d <LBB0_107>
      14:       b7 01 00 00 75 6e 64 00 r1 = 0x646e75
;     DEBUG_PRINT("beam: no PerCPURecord found");
      15:       63 1a 78 ff 00 00 00 00 *(u32 *)(r10 - 0x88) = w1
      16:       18 01 00 00 65 63 6f 72 00 00 00 00 64 20 66 6f r1 = 0x6f662064726f6365 ll
      18:       7b 1a 70 ff 00 00 00 00 *(u64 *)(r10 - 0x90) = r1
      19:       18 01 00 00 20 50 65 72 00 00 00 00 43 50 55 52 r1 = 0x5255504372655020 ll
      21:       7b 1a 68 ff 00 00 00 00 *(u64 *)(r10 - 0x98) = r1
      22:       18 01 00 00 62 65 61 6d 00 00 00 00 3a 20 6e 6f r1 = 0x6f6e203a6d616562 ll
      24:       7b 1a 60 ff 00 00 00 00 *(u64 *)(r10 - 0xa0) = r1
      25:       bf a1 00 00 00 00 00 00 r1 = r10
      26:       07 01 00 00 60 ff ff ff r1 += -0xa0
      27:       b7 02 00 00 1c 00 00 00 r2 = 0x1c
      28:       85 00 00 00 06 00 00 00 call 0x6
      29:       05 00 8d 03 00 00 00 00 goto +0x38d <LBB0_107>
```

After:

```
;     DEBUG_PRINT("beam: no PerCPURecord found");
      10:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      12:       61 11 00 00 00 00 00 00 w1 = *(u32 *)(r1 + 0x0)
;     DEBUG_PRINT("beam: no PerCPURecord found");
      13:       15 01 11 02 00 00 00 00 if r1 == 0x0 goto +0x211 <LBB0_100>
      14:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
      16:       b7 02 00 00 1c 00 00 00 r2 = 0x1c
      17:       85 00 00 00 06 00 00 00 call 0x6
      18:       05 00 0c 02 00 00 00 00 goto +0x20c <LBB0_100>
```
@gnurizen
Copy link
Copy Markdown
Contributor Author

gnurizen commented May 15, 2026

Not sure if there was a reason why we were avoiding rodata for fmt strings, un-happy accident I suppose.

Instruction count: before vs. after static const char[] fmt strings

  ┌─────────────────────────────────────┬────────┬───────┬──────┬────────┐
  │               Program               │ Before │ After │  Δ   │   Δ%   │
  ├─────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ kprobe/unwind_beam                  │    987 │   576 │ −411 │ −41.6% │
  ├─────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ perf_event/unwind_beam              │    987 │   576 │ −411 │ −41.6% │
  ├─────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ kprobe/unwind_dotnet                   │   1642 │   931 │ −711 │ −43.3% │
  ├────────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ kprobe/unwind_dotnet10                 │   1538 │   904 │ −634 │ −41.2% │
  ├────────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ perf_event/unwind_dotnet               │   1642 │   931 │ −711 │ −43.3% │
  ├────────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ perf_event/unwind_dotnet10             │   1538 │   904 │ −634 │ −41.2% │
  ├────────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ kprobe/generic                         │   1142 │   725 │ −417 │ −36.5% │
  ├────────────────────────────────────────┼────────┼───────┼──────┼────────┤
  │ kprobe/go_labels                       │    665 │   371 │   −294 │ −44.2% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/go_labels                   │    665 │   371 │   −294 │ −44.2% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_hotspot                  │   2009 │  1225 │   −784 │ −39.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_hotspot              │   2009 │  1225 │   −784 │ −39.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ tracepoint/integration/sched_switch    │    140 │   128 │    −12 │  −8.6% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_stop                     │   1648 │   988 │   −660 │ −40.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_stop                 │   1648 │   988 │   −660 │ −40.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_native                   │   1525 │   979 │   −546 │ −35.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/native_tracer_entry         │   1235 │   781 │   −454 │ −36.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_native               │   1525 │   979 │   −546 │ −35.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/finish_task_switch              │   1182 │   752 │   −430 │ −36.4% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/dummy                           │      6 │     6 │      0 │   0.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ tracepoint/sched/sched_switch          │     53 │    36 │    −17 │ −32.1% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_perl                     │   1613 │   975 │   −638 │ −39.6% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_perl                 │   1613 │   975 │   −638 │ −39.6% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_php                      │    957 │   547 │   −410 │ −42.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_php                  │    957 │   547 │   −410 │ −42.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_python                   │   1525 │   961 │   −564 │ −37.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_python               │   1525 │   961 │   −564 │ −37.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_ruby                     │   1575 │  1043 │   −532 │ −33.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_ruby                 │   1575 │  1043 │   −532 │ −33.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ tracepoint/sched/sched_process_free/v2 │    278 │   170 │   −108 │ −38.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ tracepoint/sched/sched_process_free/v1 │    278 │   170 │   −108 │ −38.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ tracepoint/syscalls/sys_enter_bpf      │     52 │    39 │    −13 │ −25.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ raw_tracepoint/sys_enter               │     62 │    44 │    −18 │ −29.0% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ kprobe/unwind_v8                       │   1657 │  1030 │   −627 │ −37.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ perf_event/unwind_v8                   │   1657 │  1030 │   −627 │ −37.8% │
  ├────────────────────────────────────────┼────────┼───────┼────────┼────────┤
  │ Total                                  │  39110 │ 23911 │ −15199 │ −38.9% │
  └────────────────────────────────────────┴────────┴───────┴────────┴────────┘

@gnurizen gnurizen marked this pull request as ready for review May 15, 2026 11:18
@gnurizen gnurizen requested review from a team as code owners May 15, 2026 11:18
Copy link
Copy Markdown
Member

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! 👍

@christos68k christos68k merged commit 7796e25 into open-telemetry:main May 15, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants