diff --git a/README.md b/README.md index 96ad5b98ad83..6fc2abb1e2a2 100644 --- a/README.md +++ b/README.md @@ -136,6 +136,7 @@ pair of .c and .py files, and some are directories of files. - tools/[mysqld_qslower](tools/mysqld_qslower.py): Trace MySQL server queries slower than a threshold. [Examples](tools/mysqld_qslower_example.txt). - tools/[ppchcalls](tools/ppchcalls.py): Summarize ppc hcall counts and latencies. [Examples](tools/ppchcalls_example.txt). - tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt). +- tools/[softirqslower](tools/softirqslower.py): Trace slow soft IRQ (interrupt). [Examples](tools/softirqslower_example.txt). - tools/[syscount](tools/syscount.py): Summarize syscall counts and latencies. [Examples](tools/syscount_example.txt). ##### CPU and Scheduler Tools diff --git a/man/man8/softirqslower.8 b/man/man8/softirqslower.8 new file mode 100644 index 000000000000..77c6f84f2399 --- /dev/null +++ b/man/man8/softirqslower.8 @@ -0,0 +1,133 @@ +.TH SOFTIRQSLOWER 8 "2025-07-08" "BCC Tools" "Linux Performance Analysis" +.SH NAME +softirqslower \- Trace softirq handlers with latency exceeding a threshold +.SH SYNOPSIS +.B softirqslower.py +.RI [min_us] +.br +.B softirqslower.py +.RB [ \-c +.IR CPU ] +.RI [min_us] +.SH DESCRIPTION +The +.B softirqslower +tool traces softirq events that exceed a specified latency threshold. This tool helps diagnose +interrupt handling bottlenecks by measuring two critical latency dimensions: + +.IP \(bu 2 +\fBirq(hard) to softirq\fR: Time delay between hardware interrupt completion and softirq execution start +.IP \(bu 2 +\fBsoftirq runtime\fR: Actual execution duration of softirq handlers +.PP +Developed as part of the BCC (BPF Compiler Collection) tools, it's particularly useful for network +stack optimization, scheduler tuning, and real-time systems analysis. + +.SH OPTIONS +.TP +.B min_us +Minimum latency threshold to trace (in microseconds). Default: 10,000 μs (10 ms) +.TP +.BR \-c ", " \-\-cpu " " \fICPU +Trace only events on the specified CPU core + +.SH OUTPUT FORMAT +The tool outputs events in the following table format: +.PP +TIME STAGE SOFTIRQ LAT(us) CPU COMM +.PP +With fields defined as: +.TP +.B TIME +Event timestamp (HH:MM:SS format) +.TP +.B STAGE +Latency measurement type: +.RS +.TP 12 +.B irq(hard) to softirq +Time from hardware interrupt completion to softirq dispatch +.TP +.B softirq runtime +SoftIRQ handler execution duration +.RE +.TP +.B SOFTIRQ +SoftIRQ category (case-sensitive). Common values: +.RS +.TP 12 +.B NET_RX +Network reception processing +.TP +.B NET_TX +Network transmission processing +.TP +.B TIMER +Timer callbacks +.TP +.B SCHED +Scheduler operations +.TP +.B RCU +Read-Copy-Update synchronization +.TP +.B TASKLET +Deferred task execution +.TP +.B HRTIMER +High-resolution timers +.TP +.B BLOCK +Block device operations +.RE +.TP +.B LAT(us) +Measured latency in microseconds +.TP +.B CPU +CPU core where softirq was handled (0-based numbering) +.TP +.B COMM +Process context handling the softirq: +.RS +.TP 12 +.B swapper/N +Idle thread for CPU N +.TP +.B ksoftirqd/N +Softirq daemon for CPU N +.TP +.B +User-space process +.RE + +.SH EXAMPLES +.TP +Trace softirqs exceeding 10μs latency: +.B softirqslower 10 +.TP +Monitor only CPU core 1: +.B softirqslower \-c 1 +.TP +Trace network-related softirq delays (>50μs): +.B softirqslower 50 | grep -E 'net_rx|net_tx' +.TP +Capture RCU delays longer than 100μs to file: +.B softirqslower 100 | grep "rcu" > rcu_latency.log + +.SH SIGNALS +.B Ctrl+C +Stop tracing and exit gracefully + +.SH AUTHOR +Chenyue Zhou + +.SH SEE ALSO +.BR runqslower (8), +.BR hardirqs (8), +.BR softirqs (8), +.BR trace (8), +.BR funclatency (8), + +.SH REPORTING BUGS +BCC Tools Issue Tracker: https://github.com/iovisor/bcc/issues diff --git a/snap/snapcraft.yaml b/snap/snapcraft.yaml index d6c87e3e2dc1..0b88ee86285c 100644 --- a/snap/snapcraft.yaml +++ b/snap/snapcraft.yaml @@ -274,6 +274,8 @@ apps: command: bcc-wrapper sofdsnoop softirqs: command: bcc-wrapper softirqs + softirqslower: + command: bcc-wrapper softirqslower solisten: command: bcc-wrapper solisten sslsniff: diff --git a/tests/python/test_tools_smoke.py b/tests/python/test_tools_smoke.py index e0212ebff8cf..6fab54b78d0e 100755 --- a/tests/python/test_tools_smoke.py +++ b/tests/python/test_tools_smoke.py @@ -317,6 +317,10 @@ def test_softirqs(self): self.run_with_duration("softirqs.py 1 1") pass + @skipUnless(kernel_version_ge(4,7), "requires kernel >= 4.7") + def test_softirqslower(self): + self.run_with_int("softirqslower.py") + @skipUnless(kernel_version_ge(4,4), "requires kernel >= 4.4") def test_solisten(self): self.run_with_int("solisten.py") diff --git a/tools/softirqslower.py b/tools/softirqslower.py new file mode 100644 index 000000000000..986322825f75 --- /dev/null +++ b/tools/softirqslower.py @@ -0,0 +1,203 @@ +#!/usr/bin/env python +# @lint-avoid-python-3-compatibility-imports +# +# softirqlat Trace slow soft IRQ (interrupt). +# For Linux, uses BCC, eBPF. +# +# USAGE: softirqslower [-h] [-c CPU] [min_us] +# +# Copyright (c) 2025 Chenyue Zhou. +# Licensed under the Apache License, Version 2.0 (the "License") +# +# 08-Jul-2025 Chenyue Zhou Created this. + +from __future__ import print_function +from bcc import BPF +from time import sleep, strftime +import argparse +import sys + +# arguments +examples = """examples: + ./softirqslower # trace softirq latency higher than 10000 us (default) + ./softirqslower 100000 # trace softirq latency higher than 100000 us + ./softirqslower -c 1 # trace softirq latency on CPU 1 only +""" + +parser = argparse.ArgumentParser( + description="Trace slow soft IRQ (interrupt).", + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=examples) +parser.add_argument("min_us", nargs="?", default="10000", + help="minimum softirq latency to trace, in us (default 10000)") +parser.add_argument("-c", "--cpu", type=int, help="trace this CPU only") +parser.add_argument("--ebpf", action="store_true", help=argparse.SUPPRESS) +args = parser.parse_args() + +bpf_text = """ +#include +#include +#include + +enum { + SOFTIRQ_RAISE, + SOFTIRQ_ENTRY, + SOFTIRQ_EXIT, + SOFTIRQ_MAX_STAGE, +}; + +struct counter { + u64 ts; + u8 not_first; +}; + +struct data { + u32 reserved; + u32 stage; + u32 vec; + u32 cpu; + u64 delta_ns; + char task[TASK_COMM_LEN]; +}; + +BPF_PERCPU_ARRAY(raise, struct counter, NR_SOFTIRQS); +BPF_PERCPU_ARRAY(entry, struct counter, NR_SOFTIRQS); +BPF_PERF_OUTPUT(softirq_events); + +static __always_inline void event_collect(void *ctx, u32 stage, u32 vec, + u32 cpu, u64 delta_ns) +{ + struct data dt = { + .stage = stage, + .vec = vec, + .cpu = cpu, + .delta_ns = delta_ns, + }; + bpf_get_current_comm(&dt.task, sizeof(dt.task)); + softirq_events.perf_submit(ctx, &dt, sizeof(dt)); +} + +RAW_TRACEPOINT_PROBE(softirq_raise) +{ + u32 cpu = bpf_get_smp_processor_id(); + u32 vec = (u32)(ctx->args[0]); + + FILTER_CPU + + struct counter *data = raise.lookup(&vec); + if (!data) + return 0; + + if (data->ts) { + // TODO record event + return 0; + } + + data->not_first = 1; + data->ts = bpf_ktime_get_ns(); + + return 0; +} + +RAW_TRACEPOINT_PROBE(softirq_entry) +{ + u32 cpu = bpf_get_smp_processor_id(); + u32 vec = (u32)(ctx->args[0]); + + FILTER_CPU + + struct counter *data = raise.lookup(&vec); + if (!data) + return 0; + + if ((data->not_first) && !(data->ts)) { + // TODO record miss event + return 0; + } + + u64 cur_ts = bpf_ktime_get_ns(); + u64 delta_ns = cur_ts - data->ts; + data->ts = 0; + + if (DELTA_FILTER) { + event_collect(ctx, SOFTIRQ_ENTRY, vec, cpu, delta_ns); + } + data = entry.lookup(&vec); + if (!data) + return 0; + + data->not_first = 1; + data->ts = cur_ts; + + return 0; +} + +RAW_TRACEPOINT_PROBE(softirq_exit) +{ + u32 cpu = bpf_get_smp_processor_id(); + u32 vec = (u32)(ctx->args[0]); + + FILTER_CPU + + struct counter *data = entry.lookup(&vec); + if (!data) + return 0; + + if ((data->not_first) && !(data->ts)) { + // TODO record miss event + return 0; + } + + u64 cur_ts = bpf_ktime_get_ns(); + u64 delta_ns = cur_ts - data->ts; + data->ts = 0; + + if (DELTA_FILTER) { + event_collect(ctx, SOFTIRQ_EXIT, vec, cpu, delta_ns); + } + + return 0; +} +""" + +def vec_to_name(vec): + return ["hi", "timer", "net_tx", "net_rx", "block", "irq_poll", "tasklet", + "sched", "hrtimer", "rcu"][vec] + +def stage_to_name(stage): + return ["raise softirq", "irq(hard) to softirq", "softirq runtime"][stage] + +def print_event(cpu, data, size): + event = b["softirq_events"].event(data) + print("%-8s %-20s %-8s %-14d %-6d %-6s" % (strftime("%H:%M:%S"), + stage_to_name(event.stage), vec_to_name(event.vec), + event.delta_ns / 1000, event.cpu, event.task.decode("utf-8", + "replace"))) + +if __name__ == "__main__": + if args.cpu is not None: + bpf_text = bpf_text.replace("FILTER_CPU", + 'if (cpu != %d) { return 0; }' % int(args.cpu)) + else: + bpf_text = bpf_text.replace("FILTER_CPU", "") + + bpf_text = bpf_text.replace("DELTA_FILTER", "delta_ns >= %d" % \ + (int(args.min_us) * 1000)) + + if args.ebpf: + print(bpf_text) + exit() + + b = BPF(text=bpf_text) + b["softirq_events"].open_perf_buffer(print_event, page_cnt=64) + + print("Tracing softirq latency higher than %d us... Hit Ctrl-C to end." % \ + int(args.min_us)) + + print("%-8s %-20s %-8s %-14s %-6s %-6s" % ("TIME", "STAGE", "SOFTIRQ", + "LAT(us)", "CPU", "COMM")) + while (1): + try: + b.perf_buffer_poll() + except KeyboardInterrupt: + exit() diff --git a/tools/softirqslower_example.txt b/tools/softirqslower_example.txt new file mode 100644 index 000000000000..2006540b06de --- /dev/null +++ b/tools/softirqslower_example.txt @@ -0,0 +1,58 @@ +Demonstrations of softirqslower, the Linux eBPF/bcc version. + +The softirqslower tool traces two critical latency dimensions of softirq handling: + +1. Softirq execution duration - Measures the actual processing time consumed by +softirq handlers (from entry to completion) +2. Wakeup-to-execution latency - Tracks the delay between softirq wakeup +triggers (via raise_softirq()) and handler execution start + +For example: + +# softirqslower + +Tracing softirq latency higher than 10000 us... Hit Ctrl-C to end. + +TIME STAGE SOFTIRQ LAT(us) CPU COMM +14:41:43 irq(hard) to softirq net_rx 50223 0 cat +14:41:43 irq(hard) to softirq tasklet 50374 0 cat +14:41:43 irq(hard) to softirq timer 15935 0 cat +14:41:43 irq(hard) to softirq net_rx 15901 0 cat +14:41:43 irq(hard) to softirq net_rx 10421 0 cat +14:41:43 irq(hard) to softirq net_rx 21794 0 cat +14:41:43 irq(hard) to softirq tasklet 21984 0 cat +14:42:11 irq(hard) to softirq net_rx 95140 0 cat +14:42:11 irq(hard) to softirq tasklet 95280 0 cat +14:42:11 irq(hard) to softirq timer 50834 0 cat +[...] + +The trace results reveal softirq events with measured latencies beyond the +10,000μs threshold. + +The 'softirq runtime' metric indicates the actual execution duration of softirq +handlers. This reveals which softirq types (NET_RX, TASKLET, etc.) are consuming +excessive processing time. + +The 'hardirq-to-softirq latency' metric measures the delay between when a +softirq is triggered (from hardware interrupt context) until its execution +begins. Elevated values here suggest either: Preemption by higher-priority +kernel tasks/threads or others + +USAGE message: + +# softirqslower -h +usage: softirqslower.py [-h] [-c CPU] [min_us] + +Trace slow soft IRQ (interrupt). + +positional arguments: + min_us minimum softirq latency to trace, in us (default 10000) + +optional arguments: + -h, --help show this help message and exit + -c CPU, --cpu CPU trace this CPU only + +examples: + ./softirqslower # trace softirq latency higher than 10000 us (default) + ./softirqslower 100000 # trace softirq latency higher than 100000 us + ./softirqslower -c 1 # trace softirq latency on CPU 1 only