@@ -13,20 +13,21 @@ spinbit and DNS queries. See the [TODO-list](./TODO.md) for more potential
1313features (which may or may not ever get implemented).
1414
1515The fundamental logic of pping is to timestamp a pseudo-unique identifier for
16- outgoing packets, and then look for matches in the incoming packets. If a match
17- is found, the RTT is simply calculated as the time difference between the
18- current time and the stored timestamp.
16+ packets, and then look for matches in the reply packets. If a match is found,
17+ the RTT is simply calculated as the time difference between the current time and
18+ the stored timestamp.
1919
2020This tool, just as Kathie's original pping implementation, uses TCP timestamps
21- as identifiers for TCP traffic. For outgoing packets, the TSval (which is a
22- timestamp in and of itself) is timestamped. Incoming packets are then parsed
23- for the TSecr, which is the echoed TSval value from the receiver. The TCP
24- timestamps are not necessarily unique for every packet (they have a limited
25- update frequency, which appears to be 1000 Hz for modern Linux systems), so only the
26- first instance of an identifier is timestamped, and matched against the first
27- incoming packet with the identifier. The mechanism to ensure only the first
28- packet is timestamped and matched differs from the one in Kathie's pping, and is
29- further described in [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
21+ as identifiers for TCP traffic. The TSval (which is a timestamp in and of
22+ itself) is used as an identifier and timestamped. Reply packets in the reverse
23+ flow are then parsed for the TSecr, which is the echoed TSval value from the
24+ receiver. The TCP timestamps are not necessarily unique for every packet (they
25+ have a limited update frequency, which appears to be 1000 Hz for modern Linux
26+ systems), so only the first instance of an identifier is timestamped, and
27+ matched against the first incoming packet with a matching reply identifier. The
28+ mechanism to ensure only the first packet is timestamped and matched differs
29+ from the one in Kathie's pping, and is further described in
30+ [SAMPLING_DESIGN](./SAMPLING_DESIGN.md).
3031
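The matching logic described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the BPF code from `pping_kern.c`: the names `ts_table`, `ts_store()` and `ts_match()` and the table size are hypothetical, a single flow is assumed (the real tool keys entries on the flow as well as the identifier), and a direct-mapped array stands in for the BPF hash-map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TS_SLOTS 1024 /* hypothetical table size, must be a power of two */

/* One stored timestamp, keyed by a packet identifier (e.g. a TSval). */
struct ts_entry {
	bool used;
	uint32_t id;      /* identifier that was timestamped */
	uint64_t sent_ns; /* time the first packet with this id was seen */
};

/* Hypothetical stand-in for the timestamp map of a single flow. */
struct ts_table {
	struct ts_entry slot[TS_SLOTS];
	bool has_last;
	uint32_t last_id; /* last identifier seen in the flow */
};

static void ts_init(struct ts_table *t)
{
	memset(t, 0, sizeof(*t));
}

/* Timestamp a packet, but only for the first instance of an identifier.
 * Returns true if a new timestamp entry was stored. */
static bool ts_store(struct ts_table *t, uint32_t id, uint64_t now_ns)
{
	assert(t != NULL);
	if (t->has_last && t->last_id == id)
		return false; /* identifier has not changed since last packet */
	t->has_last = true;
	t->last_id = id;

	struct ts_entry *e = &t->slot[id & (TS_SLOTS - 1)];
	if (e->used && e->id == id)
		return false; /* an entry for this identifier already exists */
	/* A colliding older entry with another id is simply overwritten. */
	e->used = true;
	e->id = id;
	e->sent_ns = now_ns;
	return true;
}

/* Match a reply identifier (e.g. a TSecr) against the stored timestamps.
 * On a match the RTT is written to *rtt_ns and the entry is deleted, so
 * only the first reply carrying this identifier yields an RTT. */
static bool ts_match(struct ts_table *t, uint32_t reply_id, uint64_t now_ns,
		     uint64_t *rtt_ns)
{
	assert(t != NULL && rtt_ns != NULL);
	struct ts_entry *e = &t->slot[reply_id & (TS_SLOTS - 1)];
	if (!e->used || e->id != reply_id)
		return false;
	*rtt_ns = now_ns - e->sent_ns;
	e->used = false;
	return true;
}
```

In this sketch, two packets carrying the same TSval produce a single timestamp entry, and a second reply echoing that value finds no entry and therefore reports no RTT.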
3132For ICMP echo, it uses the echo identifier as port numbers, and echo sequence
3233number as identifier to match against. Linux systems will typically use different
@@ -48,7 +49,7 @@ single line per event.
4849
4950An example of the format is provided below:
5051``` shell 
51- 16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from src 
52+ 16:00:46.142279766 TCP 10.11.1.1:5201+10.11.1.2:59528 opening due to SYN-ACK from dest 
525316:00:46.147705205 5.425439 ms 5.425439 ms TCP 10.11.1.1:5201+10.11.1.2:59528
535416:00:47.148905125 5.261430 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
545516:00:48.151666385 5.972284 ms 5.261430 ms TCP 10.11.1.1:5201+10.11.1.2:59528
@@ -96,7 +97,7 @@ An example of a (pretty-printed) flow-event is provided below:
9697    "protocol": "TCP",
9798    "flow_event": "opening",
9899    "reason": "SYN-ACK",
99-     "triggered_by": "src"
100+     "triggered_by": "dest"
100101}
101102``` 
102103
@@ -114,7 +115,8 @@ An example of a (pretty-printed) RTT-event is provided below:
114115    "sent_packets": 9393,
115116    "sent_bytes": 492457296,
116117    "rec_packets": 5922,
117-     "rec_bytes": 37
118+     "rec_bytes": 37,
119+     "match_on_egress": false
118120}
119121``` 
120122
@@ -123,36 +125,33 @@ An example of a (pretty-printed) RTT-event is provided below:
123125
124126### Files:  
125127- **pping.c:** Userspace program that loads and attaches the BPF programs, pulls
126-   the perf-buffer `rtt_events` to print out RTT messages and periodically cleans
128+   the perf-buffer `events` to print out RTT messages and periodically cleans
127129  up the hash-maps from old entries. Also passes user options to the BPF
128130  programs by setting a "global variable" (stored in the programs' .rodata
129131  section).
130- - **pping_kern.c:** Contains the BPF programs that are loaded on tc (egress) and
131-   XDP (ingress), as well as several common functions, a global constant `config`
132-   (set from userspace) and map definitions. The tc program `pping_egress()`
133-   parses outgoing packets for identifiers. If an identifier is found and the
134-   sampling strategy allows it, a timestamp for the packet is created in
135-   `packet_ts`. The XDP program `pping_ingress()` parses incoming packets for an
136-   identifier. If found, it looks up the `packet_ts` map for a match on the
137-   reverse flow (to match source/dest on egress). If there is a match, it
138-   calculates the RTT from the stored timestamp and deletes the entry. The
139-   calculated RTT (together with the flow-tuple) is pushed to the perf-buffer
140-   `events`. Both `pping_egress()` and `pping_ingress()` can also push flow-events
141-   to the `events` buffer.
132+ - **pping_kern.c:** Contains the BPF programs that are loaded on egress (tc) and
133+   ingress (XDP or tc), as well as several common functions, a global constant
134+   `config` (set from userspace) and map definitions. Essentially the same pping
135+   program is loaded on both ingress and egress. All packets are parsed for both
136+   an identifier that can be used to create a timestamp entry in `packet_ts`, and a
137+   reply identifier that can be used to match the packet with a previously
138+   timestamped one in the reverse flow. If a match is found, an RTT is calculated
139+   and an RTT-event is pushed to userspace through the perf-buffer `events`. For
140+   each packet with a valid identifier, the program also keeps track of and
141+   updates the state of the flow and the reverse flow, stored in the `flow_state` map.
142142- **pping.h:** Common header file included by `pping.c` and
143143  `pping_kern.c`. Contains some common structs used by both (they are part of the
144144  maps).
145145
146146### BPF Maps:  
147147- **flow_state:** A hash-map storing some basic state for each flow, such as the
148148  last seen identifier for the flow and when the last timestamp entry for the
149-   flow was created. Entries are created by `pping_egress()`, and can be updated
150-   or deleted by both `pping_egress()` and `pping_ingress()`. Leftover entries
151-   are eventually removed by `pping.c`.
149+   flow was created. Entries are created, updated and deleted by the BPF pping
150+   programs. Leftover entries are eventually removed by userspace (`pping.c`).
152151- **packet_ts:** A hash-map storing a timestamp for a specific packet
153-   identifier. Entries are created by `pping_egress()` and removed by
154-   `pping_ingress()` if a match is found. Leftover entries are eventually removed
155-   by `pping.c`.
152+   identifier. Entries are created by the BPF pping program if a valid identifier
153+   is found, and removed if a match is found. Leftover entries are eventually
154+   removed by userspace (`pping.c`).
156155- **events:** A perf-buffer used by the BPF programs to push flow or RTT events
157156  to `pping.c`, which continuously polls the map and prints them out.
158157
@@ -222,9 +221,9 @@ additional map space and report some additional RTT(s) more than expected
222221(however the reported RTTs should still be correct).
223222
224223If the packets have the same identifier, they must first have managed to bypass
225- the previous check for unique identifiers (see [previous point](#Tracking last
226- seen identifier)), and only one of them will be able to successfully store a
227- timestamp entry.
224+ the previous check for unique identifiers (see [previous
225+ point](#tracking-last-seen-identifier)), and only one of them will be able to
226+ successfully store a timestamp entry.
228227
229228#### Matching against stored timestamps  
230229The XDP/ingress program could potentially match multiple concurrent packets with
@@ -246,8 +245,8 @@ if this is the lowest RTT seen so far for the flow. If multiple RTTs are
246245calculated concurrently, then several could pass this check concurrently and
247246there may be a lost update. It should only be possible for multiple RTTs to be
248247calculated concurrently in case either the [timestamp rate-limit was
249- bypassed](#Rate-limiting new timestamps) or [multiple packets managed to match
250- against the same timestamp](#Matching against stored timestamps).
248+ bypassed](#rate-limiting-new-timestamps) or [multiple packets managed to match
249+ against the same timestamp](#matching-against-stored-timestamps).
251250
252251It's worth noting that with sampling the reported minimum-RTT is only an
253252estimate anyways (may never calculate RTT for packet with the true minimum