Decouple the sending of probes from the latency reporting in the NodeLatencyMonitor #6570

antoninbas · 2024-07-29T18:42:40Z

At the moment, the NodeLatencyMonitor in the Agent reports latency measurements immediately after sending ICMP probes:

Lines 444 to 451 in 1907856

    case <-tickerCh:
        // Try to send pingAll signal
        m.pingAll(ipv4Socket, ipv6Socket)
        // We do not delete IPs from nodeIPLatencyMap as part of the Node delete event handler
        // to avoid consistency issues and because it would not be sufficient to avoid stale entries completely.
        // This means that we have to periodically invoke DeleteStaleNodeIPs to avoid stale entries in the map.
        m.latencyStore.DeleteStaleNodeIPs()
        m.report()

I believe this is not ideal, because when outputting the NodeLatencyStats, the values for the lastRecvTime and lastSendTime
fields can be a bit confusing / misleading:

kubectl get nodelatencystats/kind-worker -o yaml

apiVersion: stats.antrea.io/v1alpha1
kind: NodeLatencyStats
metadata:
  creationTimestamp: null
  name: kind-worker
peerNodeLatencyStats:
- nodeName: kind-control-plane
  targetIPLatencyStats:
  - lastMeasuredRTTNanoseconds: 5837000
    lastRecvTime: "2024-07-26T22:40:03Z"
    lastSendTime: "2024-07-26T22:40:33Z"
    targetIP: 10.10.0.1
- nodeName: kind-worker2
  targetIPLatencyStats:
  - lastMeasuredRTTNanoseconds: 4704000
    lastRecvTime: "2024-07-26T22:40:03Z"
    lastSendTime: "2024-07-26T22:40:33Z"
    targetIP: 10.10.2.1

We are "always" going to have lastRecvTime < lastSendTime, because we update NodeLatencyStats right after sending a new probe (before the response has had a chance to be received). Ideally, especially with very low inter-Node latency like we have here (a few ms), we would observe timestamps which are very close to each other / identical most of the time. This can be achieved by giving the NodeLatencyMonitor enough time to receive / process the response before calling m.report().

Another advantage of decoupling the sending of probes from the latency reporting would be the ability to enforce a minimum time interval between two consecutive reports. At the moment it is possible for someone to set pingIntervalSeconds to 1s (minimum supported value in the NodeLatencyMonitor CRD). In turn, this would cause m.report() to be invoked every second. That may be a bit too frequent for a monitoring tool, especially for a large cluster. So we could consider enforcing a minimum interval of 10s (even though that would mean that values of pingIntervalSeconds under 10s are not very useful).


github-actions · 2024-10-28T00:05:14Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days

faheem047 · 2024-11-16T12:37:17Z

@antoninbas Kindly check: I have raised a PR for this issue. I have tried to decouple the PingTicker and the ReportTicker. Your guidance on this would be highly appreciated.

github-actions · 2025-02-15T00:04:51Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment, or this will be closed in 90 days. You can add a label "lifecycle/frozen" to skip stale checking.

antoninbas added the good first issue Good for newcomers label Jul 29, 2024

Yushmanth-reddy mentioned this issue Aug 13, 2024

Added delay before reporting and enforced a minimum interval of 10s #6608

Closed

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 28, 2024

antoninbas removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 28, 2024

faheem047 linked a pull request Nov 19, 2024 that will close this issue

[LatencyMonitor] Decouple sending of ICMP probes and latency reporting #6812

Open

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 15, 2025

luolanzone removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2025
