Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ospfd: Improve OSPF neighbor retransmission list granularity and pacing #16128

Merged

Conversation

aceelindem
Copy link
Collaborator

@aceelindem aceelindem commented May 31, 2024

The current OSPF neighbor retransmission operates on a single per-neighbor periodic timer that sends all LSAs on the list when it expires. Additionally, since it skips the first retransmission of received LSAs so that at least the retransmission interval (resulting in a delay of between the retransmission interval and twice the interval. In environments where the links are lossy on P2MP networks with "delay-reflood" configured (which relies on neighbor retransmission in partial meshs), the implementation is sub-optimal (to say the least).

This commit reimplements OSPF neighbor retransmission as follows:

  1. A new data structure making use the typesafe.h doubly-link list to implement an OSPF temporal list where each node includes a timestamp.
  2. The existing neighbor LS retransmission LSDB data structure is augmented with a pointer to the list node on the temporal list to faciliate O(1) removal when the LSA is acknowledged.
  3. The neighbor LS retransmission timer is set to the expiration timer of the LSA at the top of the list.
  4. When the timer expires, LSAs are retransmitted that within the window of the current time and a small delta (50 milli-secs). The LSAs that are retransmitted are given an updated retransmission time and moved to the end of the list.
  5. Neighbor and interface LSA retransmission counters are added to provide insight into the lossiness of the links. However, these will increment quickly on non-fully meshed P2MP networks with "delay-reflood" configured.
  6. Added a topotest to exercise the implementation on a non-fully meshed P2MP network with "delay-reflood" configured. The alternative was to use existing mechanisms to introduce packet loss but these seem less deterministic in a topotest.

ospfd/ospf_flood.h Outdated Show resolved Hide resolved
@aceelindem aceelindem force-pushed the aceelindem/ospf-ls-retrans-improve branch 4 times, most recently from 829f4df to 2b53be0 Compare June 3, 2024 20:14
Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits in addition to other comments...

ospfd/ospf_vty.c Outdated Show resolved Hide resolved
ospfd/ospf_neighbor.c Outdated Show resolved Hide resolved
ospfd/ospf_flood.h Outdated Show resolved Hide resolved
@aceelindem aceelindem force-pushed the aceelindem/ospf-ls-retrans-improve branch 3 times, most recently from dea592e to 2c1a581 Compare June 7, 2024 20:34
@aceelindem
Copy link
Collaborator Author

CI:rerun

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed the wrong button ...

@riw777
Copy link
Member

riw777 commented Jun 11, 2024

not building on docker, but it looks like it's in bfd rather than here ... let's try again

ci:rerun

@aceelindem aceelindem force-pushed the aceelindem/ospf-ls-retrans-improve branch 2 times, most recently from 476f653 to b248c3a Compare June 14, 2024 15:09
Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@aceelindem aceelindem force-pushed the aceelindem/ospf-ls-retrans-improve branch from b248c3a to 1808e7a Compare June 18, 2024 17:11
@aceelindem
Copy link
Collaborator Author

CI:rerun

2 similar comments
@aceelindem
Copy link
Collaborator Author

CI:rerun

@aceelindem
Copy link
Collaborator Author

CI:rerun

…ision

The current OSPF neighbor retransmission operates on a single per-neighbor
periodic timer that sends all LSAs on the list when it expires.
Additionally, since it skips the first retransmission of received LSAs so
that at least the retransmission interval (resulting in a delay of between
the retransmission interval and twice the interval. In environments where
the links are lossy on P2MP networks with "delay-reflood" configured (which
relies on neighbor retransmission in partial meshs), the implementation
is sub-optimal (to say the least).

This commit reimplements OSPF neighbor retransmission as follows:

   1. A new data structure making use the application managed
      typesafe.h doubly linked list implements an OSPF LSA
      list where each node includes a timestamp.
   2. The existing neighbor LS retransmission LSDB data structure
      is augmented with a pointer to the list node on the LSA
      list to faciliate O(1) removal when the LSA is acknowledged.
   3. The neighbor LS retransmission timer is set to the expiration
      timer of the LSA at the top of the list.
   4. When the timer expires, LSAs are retransmitted that within
      the window of the current time and a small delta (50 milli-secs
      default). The LSAs that are retransmited are given an updated
      retransmission time and moved to the end of the LSA list.
   5. Configuration is added to set the "retransmission-window" to a
      value other than 50 milliseconds.
   6. Neighbor and interface LSA retransmission counters are added
      to provide insight into the lossiness of the links. However,
      these will increment quickly on non-fully meshed P2MP networks
      with "delay-reflood" configured.
   7. Added a topotest to exercise the implementation on a non-fully
      meshed P2MP network with "delay-reflood" configured. The
      alternative was to use existing mechanisms to instroduce loss
      but these seem less determistic in a topotest.

Signed-off-by: Acee Lindem <[email protected]>
@aceelindem aceelindem force-pushed the aceelindem/ospf-ls-retrans-improve branch from 1808e7a to c494702 Compare June 20, 2024 15:35
@aceelindem
Copy link
Collaborator Author

CI:rerun

@aceelindem
Copy link
Collaborator Author

CI:rerun

The definition of insanity...

@aceelindem
Copy link
Collaborator Author

CI:rerun

@aceelindem
Copy link
Collaborator Author

CI:rerun

3 similar comments
@aceelindem
Copy link
Collaborator Author

CI:rerun

@aceelindem
Copy link
Collaborator Author

CI:rerun

@aceelindem
Copy link
Collaborator Author

CI:rerun

@riw777 riw777 merged commit 87c2f10 into FRRouting:master Jun 24, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants