Commit a7eec49

pping: Add support for ICMP echo messages
Allow pping to passively monitor RTT for ICMP echo request/reply flows. Use the echo identifier as ports, and echo sequence as packet identifier.

Additionally, add protocol to standard output format in order to be able to distinguish between TCP and ICMP flows.

Potential concerns with this commit:
- Is starting to approach verifier limit again (850k instructions)
- ppviz format does not include protocol, so cannot distinguish between TCP and ICMP flows
- Cannot detect when ICMP flows close, so they will take up a flow-state entry until cleaned out by userspace
- Userspace cleanup has a very long timeout of 5 minutes

Signed-off-by: Simon Sundberg <[email protected]>
1 parent 029c05b commit a7eec49

4 files changed, +125 -49 lines changed


pping/README.md

Lines changed: 29 additions & 22 deletions
@@ -6,27 +6,34 @@ TC-BPF (on egress) for the packet capture logic.
 ## Simple description
 Passive Ping (PPing) is a simple tool for passively measuring per-flow RTTs. It
 can be used on endhosts as well as any (BPF-capable Linux) device which can see
-both directions of the traffic (ex router or middlebox). Currently it only works
-for TCP traffic which uses the TCP timestamp option, but could be extended to
-also work with for example TCP seq/ACK numbers, the QUIC spinbit and ICMP
-echo-reply messages. See the [TODO-list](./TODO.md) for more potential features
-(which may or may not ever get implemented).
+both directions of the traffic (ex router or middlebox). Currently it works for
+TCP traffic which uses the TCP timestamp option and ICMP echo messages, but
+could be extended to also work with for example TCP seq/ACK numbers, the QUIC
+spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
+features (which may or may not ever get implemented).
 
 The fundamental logic of pping is to timestamp a pseudo-unique identifier for
 outgoing packets, and then look for matches in the incoming packets. If a match
 is found, the RTT is simply calculated as the time difference between the
 current time and the stored timestamp.
 
 This tool, just as Kathie's original pping implementation, uses TCP timestamps
-as identifiers. For outgoing packets, the TSval (which is a timestamp in and off
-itself) is timestamped. Incoming packets are then parsed for the TSecr, which
-are the echoed TSval values from the receiver. The TCP timestamps are not
-necessarily unique for every packet (they have a limited update frequency,
-appears to be 1000 Hz for modern Linux systems), so only the first instance of
-an identifier is timestamped, and matched against the first incoming packet with
-the identifier. The mechanism to ensure only the first packet is timestamped and
-matched differs from the one in Kathie's pping, and is further described in
-[SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
+as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
+timestamp in and of itself) is timestamped. Incoming packets are then parsed
+for the TSecr, which are the echoed TSval values from the receiver. The TCP
+timestamps are not necessarily unique for every packet (they have a limited
+update frequency, appears to be 1000 Hz for modern Linux systems), so only the
+first instance of an identifier is timestamped, and matched against the first
+incoming packet with the identifier. The mechanism to ensure only the first
+packet is timestamped and matched differs from the one in Kathie's pping, and is
+further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
+
+For ICMP echo, it uses the echo identifier as port numbers, and echo sequence
+number as identifier to match against. Linux systems will typically use different
+echo identifiers for different instances of ping, and thus each ping instance
+will be recognized as a separate flow. Windows systems typically use a static
+echo identifier, and thus all instances of ping originating from a particular
+Windows host and the same target host will be considered a single flow.
 
 ## Output formats
 pping currently supports 3 different formats, *standard*, *ppviz* and *json*. In
@@ -41,12 +48,12 @@ single line per event.
 
 An example of the format is provided below:
 ```shell
-16:00:46.142279766 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
-16:00:46.147705205 5.425439 ms 5.425439 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:47.148905125 5.261430 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:48.151666385 5.972284 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:49.152489316 6.017589 ms 5.261430 ms 10.11.1.1:5201+10.11.1.2:59528
-16:00:49.878508114 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
+16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src
+16:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:49.152489316 6.017589 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
+16:00:49.878508114 TCP 10.11.1.1:5201+10.11.1.2:59528 closing due to RST from dest
 ```
 
 ### ppviz format
@@ -196,8 +203,8 @@ these identifiers.
 
 This issue could be avoided entirely by requiring that new-id > old-id instead
 of simply checking that new-id != old-id, as TCP timestamps should monotonically
-increase. That may however not be a suitable solution if/when we add support for
-other types of identifiers.
+increase. That may however not be a suitable solution for other types of
+identifiers.
 
 #### Rate-limiting new timestamps
 In the tc/egress program packets to timestamp are sampled by using a per-flow
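The timestamp-and-match logic described in the README can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct packet_id`, `struct ts_table`, `record_egress` and `match_ingress` are hypothetical names, and a fixed-size array stands in for the BPF hash map that pping actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* Hypothetical, simplified stand-in for pping's flow + identifier key. */
struct packet_id {
	uint32_t saddr, daddr; /* IPv4 addresses */
	uint16_t sport, dport; /* ports, or the echo identifier for ICMP */
	uint32_t identifier;   /* TSval/TSecr for TCP, echo sequence for ICMP */
};

struct ts_entry {
	struct packet_id key;
	uint64_t timestamp_ns;
	bool used;
};

/* A linear table stands in for the BPF hash map (assumption). */
struct ts_table {
	struct ts_entry slots[TABLE_SIZE];
};

static bool id_equal(const struct packet_id *a, const struct packet_id *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->identifier == b->identifier;
}

/* Egress: timestamp only the first packet seen with a given identifier. */
static bool record_egress(struct ts_table *t, const struct packet_id *id,
			  uint64_t now_ns)
{
	struct ts_entry *free_slot = NULL;

	for (size_t i = 0; i < TABLE_SIZE; i++) {
		struct ts_entry *e = &t->slots[i];

		if (e->used && id_equal(&e->key, id))
			return false; /* identifier already timestamped */
		if (!e->used && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return false; /* table full */

	free_slot->key = *id;
	free_slot->timestamp_ns = now_ns;
	free_slot->used = true;
	return true;
}

/* Ingress: look up the reversed flow and compute the RTT on a match. */
static bool match_ingress(struct ts_table *t, const struct packet_id *reply,
			  uint64_t now_ns, uint64_t *rtt_ns)
{
	/* The reply travels in the opposite direction, so swap the endpoints. */
	struct packet_id key = {
		.saddr = reply->daddr, .daddr = reply->saddr,
		.sport = reply->dport, .dport = reply->sport,
		.identifier = reply->identifier,
	};

	for (size_t i = 0; i < TABLE_SIZE; i++) {
		struct ts_entry *e = &t->slots[i];

		if (e->used && id_equal(&e->key, &key)) {
			*rtt_ns = now_ns - e->timestamp_ns;
			e->used = false; /* match only the first reply */
			return true;
		}
	}
	return false;
}
```

Removing the entry on a match is one simple way to ensure only the first reply is used. The mechanism pping itself uses is the one described in SAMPLING_DESIGN.md, which this sketch does not attempt to reproduce.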

pping/TODO.md

Lines changed: 3 additions & 2 deletions
@@ -14,14 +14,15 @@
 - If one only considers SEQ/ACK (and don't check for SACK
 options), could result in ex. delay from retransmission being
 included in RTT
-- [ ] ICMP (ex Echo/Reply)
+- [x] ICMP (ex Echo/Reply)
 - [ ] QUIC (based on spinbit)
+- [ ] DNS queries
 
 ## General pping
 - [x] Add sampling so that RTT is not calculated for every packet
 (with unique value) for large flows
 - [ ] Allow short bursts to bypass sampling in order to handle
-delayed ACKs
+delayed ACKs, reordered or lost packets etc.
 - [x] Keep some per-flow state
 - Will likely be needed for the sampling
 - [ ] Could potentially include keeping track of average RTT, which

pping/pping.c

Lines changed: 16 additions & 14 deletions
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 static const char *__doc__ =
-	"Passive Ping - monitor flow RTT based on TCP timestamps";
+	"Passive Ping - monitor flow RTT based on header inspection";
 
 #include <bpf/bpf.h>
 #include <bpf/libbpf.h>
@@ -43,16 +43,16 @@ static const char *__doc__ =
 	(1 * NS_PER_SECOND) // Update offset between CLOCK_MONOTONIC and CLOCK_REALTIME once per second
 
 /*
- * BPF implementation of pping using libbpf
- * Uses TC-BPF for egress and XDP for ingress
- * - On egrees, packets are parsed for TCP TSval,
- *   if found added to hashmap using flow+TSval as key,
- *   and current time as value
- * - On ingress, packets are parsed for TCP TSecr,
- *   if found looksup hashmap using reverse-flow+TSecr as key,
- *   and calculates RTT as different between now map value
- * - Calculated RTTs are pushed to userspace
- *   (together with the related flow) and printed out
+ * BPF implementation of pping using libbpf.
+ * Uses TC-BPF for egress and XDP for ingress.
+ * - On egress, packets are parsed for an identifier,
+ *   if found added to hashmap using flow+identifier as key,
+ *   and current time as value.
+ * - On ingress, packets are parsed for reply identifier,
+ *   if found looks up hashmap using reverse-flow+identifier as key,
+ *   and calculates RTT as difference between now and stored timestamp.
+ * - Calculated RTTs are pushed to userspace
+ *   (together with the related flow) and printed out.
  */
 
 // Structure to contain arguments for clean_map (for passing to pthread_create)
@@ -555,16 +555,17 @@ static void print_event_standard(void *ctx, int cpu, void *data,
 
 	if (e->event_type == EVENT_TYPE_RTT) {
 		print_ns_datetime(stdout, e->rtt_event.timestamp);
-		printf(" %llu.%06llu ms %llu.%06llu ms ",
+		printf(" %llu.%06llu ms %llu.%06llu ms %s ",
 		       e->rtt_event.rtt / NS_PER_MS,
 		       e->rtt_event.rtt % NS_PER_MS,
 		       e->rtt_event.min_rtt / NS_PER_MS,
-		       e->rtt_event.min_rtt % NS_PER_MS);
+		       e->rtt_event.min_rtt % NS_PER_MS,
+		       proto_to_str(e->rtt_event.flow.proto));
 		print_flow_ppvizformat(stdout, &e->rtt_event.flow);
 		printf("\n");
 	} else if (e->event_type == EVENT_TYPE_FLOW) {
 		print_ns_datetime(stdout, e->flow_event.timestamp);
-		printf(" ");
+		printf(" %s ", proto_to_str(e->flow_event.flow.proto));
 		print_flow_ppvizformat(stdout, &e->flow_event.flow);
 		printf(" %s due to %s from %s\n",
 		       flowevent_to_str(e->flow_event.event_info.event),
@@ -578,6 +579,7 @@ static void print_event_ppviz(void *ctx, int cpu, void *data, __u32 data_size)
 	const struct rtt_event *e = data;
 	__u64 time = convert_monotonic_to_realtime(e->timestamp);
 
+	// ppviz format does not support flow events
 	if (e->event_type != EVENT_TYPE_RTT)
 		return;
 
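The standard-format RTT line prints nanosecond values as milliseconds with six decimals by splitting each value into a quotient and remainder of `NS_PER_MS`, followed by the protocol label. The sketch below reproduces that formatting in isolation. `proto_label` is a hypothetical stand-in for `proto_to_str`, whose definition is not part of this diff, and the labels it returns for ICMP are assumptions (the README example only shows `TCP`).

```c
#include <stdint.h>
#include <stdio.h>

#define NS_PER_MS 1000000ULL

/*
 * Hypothetical stand-in for proto_to_str(). The numbers are the IANA
 * protocol numbers; the ICMP label strings are assumptions.
 */
static const char *proto_label(uint8_t proto)
{
	switch (proto) {
	case 6:
		return "TCP";
	case 1:
		return "ICMP";
	case 58:
		return "ICMPv6";
	default:
		return "unknown";
	}
}

/* Formats "<rtt> ms <min_rtt> ms <proto>" into buf, like the RTT event line. */
static int format_rtt_fields(char *buf, size_t len, uint64_t rtt_ns,
			     uint64_t min_rtt_ns, uint8_t proto)
{
	return snprintf(buf, len, "%llu.%06llu ms %llu.%06llu ms %s",
			(unsigned long long)(rtt_ns / NS_PER_MS),
			(unsigned long long)(rtt_ns % NS_PER_MS),
			(unsigned long long)(min_rtt_ns / NS_PER_MS),
			(unsigned long long)(min_rtt_ns % NS_PER_MS),
			proto_label(proto));
}
```

The `%06llu` width matters for sub-millisecond remainders: a remainder of 61430 ns has to print as `061430`, otherwise 0.061430 ms would read as 0.61430 ms.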

pping/pping_kern.c

Lines changed: 77 additions & 11 deletions
@@ -7,6 +7,8 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/tcp.h>
+#include <linux/icmp.h>
+#include <linux/icmpv6.h>
 #include <stdbool.h>
 
 // overwrite xdp/parsing_helpers.h value to avoid hitting verifier limit
@@ -181,6 +183,64 @@ static int parse_tcp_identifier(struct parsing_context *ctx, __be16 *sport,
 	return 0;
 }
 
+/*
+ * Attempts to fetch an identifier for an ICMPv6 header, based on the echo
+ * request/reply sequence number.
+ * If successful, identifier will be set to the echo sequence number, both
+ * sport and dport will be set to the echo identifier, and 0 will be returned.
+ * On failure, -1 will be returned.
+ * Note: Will store the 16-bit echo sequence number in network byte order in
+ * the 32-bit identifier.
+ */
+static int parse_icmp6_identifier(struct parsing_context *ctx, __u16 *sport,
+				  __u16 *dport, struct flow_event_info *fei,
+				  __u32 *identifier)
+{
+	struct icmp6hdr *icmp6h;
+
+	if (parse_icmp6hdr(&ctx->nh, ctx->data_end, &icmp6h) < 0)
+		return -1;
+
+	if (ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REQUEST)
+		return -1;
+	if (!ctx->is_egress && icmp6h->icmp6_type != ICMPV6_ECHO_REPLY)
+		return -1;
+	if (icmp6h->icmp6_code != 0)
+		return -1;
+
+	fei->event = FLOW_EVENT_NONE;
+	*sport = icmp6h->icmp6_identifier;
+	*dport = *sport;
+	*identifier = icmp6h->icmp6_sequence;
+	return 0;
+}
+
+/*
+ * Same as parse_icmp6_identifier, but for an ICMP(v4) header instead.
+ */
+static int parse_icmp_identifier(struct parsing_context *ctx, __u16 *sport,
+				 __u16 *dport, struct flow_event_info *fei,
+				 __u32 *identifier)
+{
+	struct icmphdr *icmph;
+
+	if (parse_icmphdr(&ctx->nh, ctx->data_end, &icmph) < 0)
+		return -1;
+
+	if (ctx->is_egress && icmph->type != ICMP_ECHO)
+		return -1;
+	if (!ctx->is_egress && icmph->type != ICMP_ECHOREPLY)
+		return -1;
+	if (icmph->code != 0)
+		return -1;
+
+	fei->event = FLOW_EVENT_NONE;
+	*sport = icmph->un.echo.id;
+	*dport = *sport;
+	*identifier = icmph->un.echo.sequence;
+	return 0;
+}
+
 /*
  * Attempts to parse the packet limited by the data and data_end pointers,
  * to retrieve a protocol dependent packet identifier. If sucessful, the
@@ -224,15 +284,21 @@ static int parse_packet_identifier(struct parsing_context *ctx,
 		return -1;
 	}
 
-	// Add new protocols here
-	if (p_id->flow.proto == IPPROTO_TCP) {
-		err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port,
-					   fei, &p_id->identifier);
-		if (err)
-			return -1;
-	} else {
-		return -1;
-	}
+	// Parse identifier from suitable protocol
+	if (p_id->flow.proto == IPPROTO_TCP)
+		err = parse_tcp_identifier(ctx, &saddr->port, &daddr->port, fei,
+					   &p_id->identifier);
+	else if (p_id->flow.proto == IPPROTO_ICMPV6 &&
+		 p_id->flow.ipv == AF_INET6)
+		err = parse_icmp6_identifier(ctx, &saddr->port, &daddr->port,
+					     fei, &p_id->identifier);
+	else if (p_id->flow.proto == IPPROTO_ICMP && p_id->flow.ipv == AF_INET)
+		err = parse_icmp_identifier(ctx, &saddr->port, &daddr->port,
+					    fei, &p_id->identifier);
+	else
+		return -1; // No matching protocol
+	if (err)
+		return -1; // Failed parsing protocol
 
 	// Sucessfully parsed packet identifier - fill in IP-addresses and return
 	if (p_id->flow.ipv == AF_INET) {
@@ -266,7 +332,7 @@ static void fill_flow_event(struct flow_event *fe, __u64 timestamp,
 {
 	fe->event_type = EVENT_TYPE_FLOW;
 	fe->timestamp = timestamp;
-	__builtin_memcpy(&fe->flow, flow, sizeof(struct network_tuple));
+	fe->flow = *flow;
 	fe->source = source;
 	fe->reserved = 0; // Make sure it's initilized
 }
@@ -402,7 +468,7 @@ int pping_ingress(struct xdp_md *ctx)
 	re.rec_bytes = f_state->rec_bytes;
 
 	// Push event to perf-buffer
-	__builtin_memcpy(&re.flow, &p_id.flow, sizeof(struct network_tuple));
+	re.flow = p_id.flow;
 	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &re, sizeof(re));
 
 validflow_out:
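The ICMP parsers added in this file map an echo header onto the same port/identifier shape used for TCP. A userspace sketch of that mapping follows, assuming a simplified header layout: `struct echo_hdr`, `struct echo_ident` and `echo_to_ident` are hypothetical names, and the type constants carry the ICMPv4 echo request/reply values (8 and 0) rather than being taken from `<linux/icmp.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECHO_REQUEST 8 /* ICMPv4 echo request type */
#define ECHO_REPLY   0 /* ICMPv4 echo reply type */

/* Hypothetical, simplified view of an ICMPv4 echo header. */
struct echo_hdr {
	uint8_t type;
	uint8_t code;
	uint16_t checksum;
	uint16_t id;       /* echo identifier, network byte order */
	uint16_t sequence; /* echo sequence, network byte order */
};

/* Hypothetical container for the values the parser hands back. */
struct echo_ident {
	uint16_t sport, dport;
	uint32_t identifier;
};

/*
 * Only echo requests are accepted on egress and only echo replies on
 * ingress, mirroring the direction checks in the diff above.
 * Returns 0 on success and -1 if the header should be ignored.
 */
static int echo_to_ident(const struct echo_hdr *h, bool is_egress,
			 struct echo_ident *out)
{
	if (is_egress && h->type != ECHO_REQUEST)
		return -1;
	if (!is_egress && h->type != ECHO_REPLY)
		return -1;
	if (h->code != 0)
		return -1;

	/* The echo identifier doubles as both "ports" of the flow. */
	out->sport = h->id;
	out->dport = h->id;
	/* The 16-bit sequence is stored as-is in the 32-bit identifier. */
	out->identifier = h->sequence;
	return 0;
}
```

Because source and destination port are set to the same value, swapping them for the reverse-flow lookup leaves the port part of the key unchanged, so a reply with the same echo identifier and sequence maps back to the entry stored for its request.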
