Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: OSPFD encapsulation #43

Open
wants to merge 1,357 commits into
base: ospf-unnumbered
Choose a base branch
from
Open

Question: OSPFD encapsulation #43

wants to merge 1,357 commits into from

Conversation

xspbrx
Copy link

@xspbrx xspbrx commented Jul 5, 2017

Hi,

Where is the encapsulation code for OSPFD?

Thank you

Dinesh Dutt and others added 30 commits September 13, 2016 10:36
IPv6 multipath is broken in BGP if nexthop contains only global address.
IPv6 always uses both nextop IPv6 address and ifIndex in sending routes down to
zebra. In cases where only the global IPv6 address is present in the nexthop
information, the existing code doesn't set the ifIndex. An example of such a
case is when a route-map isused with "set ipv6 next-hop" and only global
address is specified. This code causes the ifIndex to be determined and
set thereby fixing the multipath programming.

Signed-off-by: Dinesh G Dutt <[email protected]>
Reviewed-by: Shrijeet Mukherjee <[email protected]>
0. Introduction

This is the design specification for next hop tracking feature in
Quagga.

1. Background

Recursive routes are of the form:

   p/m --> n
  [Ex: 1.1.0.0/16 --> 2.2.2.2]

where 'n' itself is resolved through another route as follows:

   p2/m --> h, interface
  [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0]

Usually, BGP routes are recursive in nature and BGP nexthops get
resolved through an IGP route. IGP usually adds its routes pointing to
an interface (these are called non-recursive routes).

When BGP receives a recursive route from a peer, it needs to validate
the nexthop. The path is marked valid or invalid based on the
reachability status of the nexthop.  Nexthop validation is also
important for BGP decision process as the metric to reach the nexthop
is a parameter to best path selection process.

As it goes with routing, this is a dynamic process. Route to the
nexthop can change. The nexthop can become unreachable or
reachable. In the current BGP implementation, the nexthop validation
is done periodically in the scanner run. The default scanner run
interval is one minute. Every minute, the scanner task walks the
entire BGP table. It checks the validity of each nexthop with Zebra
(the routing table manager) through a request and response message
exchange between BGP and Zebra process. BGP process is blocked for
that duration. The mechanism has two major drawbacks:

(1) The scanner task runs to completion. That can potentially starve
    the other tasks for long periods of time, based on the BGP table
    size and number of nexthops.

(2) Convergence around routing changes that affect the nexthops can be
    long (around a minute with the default intervals). The interval
    can be shortened to achieve faster reaction time, but it makes the
    first problem worse, with the scanner task consuming most of the
    CPU resources.

"Next hop tracking" feature makes this process event-driven. It
eliminates periodic nexthop validation and introduces an asynchronous
communication path between BGP and Zebra for route change notifications
that can then be acted upon.

2. Goal

Stating the obvious, the main goal is to remove the two limitations we
discussed in the previous section. The goals, in a constructive tone,
are the following:

- fairness: the scanner run should not consume an unjustly high amount
  of CPU time. This should give an overall good performance and
  response time to other events (route changes, session events,
  IO/user interface).

- convergence: BGP must react to nexthop changes instantly and provide
  sub-second convergence. This may involve diverting the routes from
  one nexthop to another.

3. Overview of the changes

The changes are in both BGP and Zebra modules.  The short summary is
the following:

- Zebra implements a registration mechanism by which clients can
   register for next hop notification. Consequently, it maintains a
   separate table, per (VRF, AF) pair, of next hops and interested
   client-list per next hop.

- When the main routing table changes in Zebra, it evaluates the next
   hop table: for each next hop, it checks if the route table
   modifications have changed its state. If so, it notifies the
   interested clients.

- BGP is one such client. It registers the next hops corresponding to
   all of its received routes/paths. It also threads the paths against
   each nexthop structure.

- When BGP receives a next hop notification from Zebra, it walks the
   corresponding path list. It makes them valid or invalid depending
   on the next hop notification. It then re-computes best path for the
   corresponding destination. This may result in re-announcing those
   destinations to peers.

4. Design

4.1. Modules

The core design introduces an "nht" (next hop tracking) module in BGP
and "rnh" (recursive nexthop) module in Zebra. The "nht" module
provides the following APIs:

bgp_find_or_add_nexthop() : find or add a nexthop in BGP nexthop table
bgp_find_nexthop() : find a nexthop in BGP nexthop table
bgp_parse_nexthop_update() : parse a nexthop update message coming
                              from zebra

The "rnh" module provides the following APIs:

zebra_add_rnh() : add a recursive nexthop
zebra_delete_rnh() : delete a recursive nexthop
zebra_lookup_rnh() : lookup a recursive nexthop

zebra_add_rnh_client() : register a client for nexthop notifications
                         against a recursive nexthop

zebra_remove_rnh_client(): remove the client registration for a
                            recursive nexthop

zebra_evaluate_rnh_table(): (re)evaluate the recursive nexthop table
                            (most probably because the main routing
                            table has changed).

zebra_cleanup_rnh_client(): Cleanup a client from the "rnh" module
                            data structures (most probably because the
                            client is going away).

4.2. Control flow

The next hop registration control flow is the following:

<====      BGP Process       ====>|<====      Zebra Process      ====>
                                  |
receive module     nht module     |  zserv module        rnh module
----------------------------------------------------------------------
              |                   |                  |
bgp_update_   |                   |                  |
      main()  | bgp_find_or_add_  |                  |
              |        nexthop()  |                  |
              |                   |                  |
              |                   | zserv_nexthop_   |
              |                   |       register() |
              |                   |                  | zebra_add_rnh()
              |                   |                  |

The next hop notification control flow is the following:

<====     Zebra Process    ====>|<====      BGP Process       ====>
                                |
rib module         rnh module   |     zebra module        nht module
----------------------------------------------------------------------
              |                 |                   |
meta_queue_   |                 |                   |
    process() | zebra_evaluate_ |                   |
              |     rnh_table() |                   |
              |                 |                   |
              |                 | bgp_read_nexthop_ |
              |                 |          update() |
              |                 |                   | bgp_parse_
              |                 |                   | nexthop_update()
              |                 |                   |

4.3. zclient message format

ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are
encoded in the following way:

/*
 *     0                   1                   2                   3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |     AF                        |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix                                           .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .                                                               .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |     AF                        |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix                                           .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 */

ZEBRA_NEXTHOP_UPDATE message is encoded as follows:

/*
 *     0                   1                   2                   3
 *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |     AF                        |  prefix len   |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      Nexthop prefix getting resolved                          .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |        metric                                                 |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * |  #nexthops    |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * | nexthop type  |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      resolving Nexthop details                                .
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .                                                               .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * | nexthop type  |
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 * .      resolving Nexthop details                                .
 * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 */

4.4. BGP data structure

Legend:

/\   struct bgp_node: a BGP destination/route/prefix
\/

[ ]  struct bgp_info: a BGP path (e.g. route received from a peer)

 _
(_)  struct bgp_nexthop_cache: a BGP nexthop

   /\         NULL
   \/--+        ^
       |        :
       +--[ ]--[ ]--[ ]--> NULL
   /\           :
   \/--+        :
       |        :
       +--[ ]--[ ]--> NULL
                :
  _             :
 (_).............

4.5. Zebra data structure

rnh table:

           O
          / \
         O   O
            / \
           O   O

        struct rnh
        {
          u_char flags;
          struct rib *state;
          struct list *client_list;
          struct route_node *node;
        };

5. User interface changes

quagga# show ip nht
3.3.3.3
 resolved via kernel
 via 11.0.0.6, swp1
 Client list: bgp(fd 12)
11.0.0.10
 resolved via connected
 is directly connected, swp2
 Client list: bgp(fd 12)
11.0.0.18
 resolved via connected
 is directly connected, swp4
 Client list: bgp(fd 12)
11.11.11.11
 resolved via kernel
 via 10.0.1.2, eth0
 Client list: bgp(fd 12)

quagga# show ip bgp nexthop
Current BGP nexthop cache:
 3.3.3.3 valid [IGP metric 0], #paths 3
  Last update: Wed Oct 16 04:43:49 2013

 11.0.0.10 valid [IGP metric 1], #paths 1
  Last update: Wed Oct 16 04:43:51 2013

 11.0.0.18 valid [IGP metric 1], #paths 2
  Last update: Wed Oct 16 04:43:47 2013

 11.11.11.11 valid [IGP metric 0], #paths 1
  Last update: Wed Oct 16 04:43:47 2013

quagga# show ipv6 nht
quagga# show ip bgp nexthop detail

quagga# debug bgp nht
quagga# debug zebra nht

6. Sample test cases

     r2----r3
    /  \  /
  r1----r4

- Verify that a change in IGP cost triggers NHT
  + shutdown the r1-r4 and r2-r4 links
  + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back
    up
  + We should be back to the original nexthop via r4 now
- Verify that a NH becoming unreachable triggers NHT
  + Shutdown all links to r4
- Verify that a NH becoming reachable triggers NHT
  + no shut all links to r4

7. Future work

- route-policy for next hop validation (e.g. ignore default route)
- damping for rapid next hop changes
- prioritized handling of nexthop changes ((un)reachability vs. metric
  changes)
- handling recursion loop, e.g.
   11.11.11.11/32 -> 12.12.12.12
   12.12.12.12/32 -> 11.11.11.11
   11.0.0.0/8 -> <interface>
- better statistics
Addresses upstream comments.

"show ip bgp nexthop detail" couldn't display multiple NHs due to a bug.
Fix that.

Fix reference counts for the nexthop cache entries

Signed-off-by: Pradosh Mohapatra <[email protected]>
Signed-off-by: Daniel Walton <[email protected]>
Signed-off-by: Dinesh Dutt <[email protected]>
Signed-off-by: Donald Sharp <[email protected]>
Signed-off-by: Vivek Venkatraman <[email protected]>

Fix reference counts for the nexthop cache entries.

Signed-off-by: Vivek Venkatraman <[email protected]>

Edited-by: Paul Jakma <[email protected]>
- Fix nexthop_ipv6_add defs in rib.h not having been modified with rib_ prefix.
- Remove rib_lookup_and_pushup, appears not to be used except for
  !HAVE_NETLINK && HAVE_STRUCT_IFALIASREQ case of ioctl.c::if_set_prefix,
  so it's not being used at all on platform with most testing of RIB.
…ddresses are removed.

When you flap an interface repeatedly, you can get into situations where
the code has not quite finished cleaning up before the next event happens.
Gracefully prevent a NULL dereference.

Signed-off-by: Pradosh Mohapatra <[email protected]>
In the absence of this patch, attempting to type "address-family ipv6 unicast"
would result in an "Ambiguous command" error and in the case of
"address-family ipv6 multicast", the command would silently fail, without the
prompt dropping into the address-family mode.

The cause is how the parse tree is constructed for ipv6 address family. There
was an error in extract.pl.in script and in vtysh.c files which assumed that
there was only address family ipv6 command, without unicast or multicast and
so the command was failing.

Signed-off-by: vivek <[email protected]>
'outq' field in 'show ip bgp sum' displays the number of formatted packets
to a peer. Since the route announcement follows an input-buffered pattern
(i.e. adj-rib-out is a separate queue of routes per peer and packets are
formatted from the routes at the time of TCP write), the outq field doesn't
show any interesting data worth watching.

The patch is to display the adj-rib-out queue depth instead.

signed-off-by: [email protected]
reviewed-by: [email protected]
This patch adds support for matching on local preference in
BGP route-map.

Signed-off-by: Dinesh Dutt <[email protected]>
…reflectors.

By default, attribute modification via route-map policy out is ignored on
reflected routes. This patch provides an option to allow this modification
to occur. Once enabled, it affects all reflected routes.

Signed-off-by: Dinesh G Dutt <[email protected]>
Do not allow a program outside Quagga to delete a Quagga route from the kernel.
To delete a Quagga route, do it inside Quagga.

Signed-off-by: James Li <[email protected]>
Allow the user to actually turn off the set metric command
in ospf6d and rip.

Signed-off-by: Daniel Walton <[email protected]>
Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Dinesh Dutt <[email protected]>
Reviewed-by: Daniel Walton <[email protected]>
* bgp_aspath.c: (aspath_prepend) aspath_delete_confed_seq may result
  in as2 being updated, and seg2 becoming invalid. E.g. if the first
  segment of of as2 is confeds. However, code there after unconditionally
  reads from seg2.

  Reset seg2, and re-do the empty check on it.

  Caught by valgrinding tools/aspathtest.
* bgp_aspath.h: Add BGP_AS_IS_PRIVATE macro.
* bgp_aspath.c: (aspath_highest) use said macro to also ensure 4-byte private
  AS range is ignored in calculating highest public ASN.
  (aspath_private_as_check) consolidate to use said macro.

Note: Extracted from 'bgpd: Add replace-as option to remove-private-as'
by [email protected].
AS_PATH comparison is broken if CONFED_AS_SEQ are present.
This patch fixes this issue

Signed-off-by: Daniel Walton <[email protected]>
There were various failures in ANVL's aggregation tests, this
patch fixes those issues found

Signed-off-by: Daniel Walton <[email protected]>
…rtisement of a prefix

Add these commands to bgp:

clear ip bgp prefix A.B.C.D/M
clear bgp ipv6 (unicast|multicast) prefix X:X::X:X/M

These two commands forces a bestpath calculation to happen again if
necessary to re-advertise the prefix.

Signed-off-by: Daniel Walton <[email protected]>
When showing a prefix in bgp allow user to specify output
based upon the bestpath chosen, multipath information of all
information about a prefix(the default)

Signed-off-by: Daniel Walton <[email protected]>
Allow the user to enter the 'clear ip ospf interface [IFNAME]' command
this resets the connection between ospf and any peers out the
specified interface.

Signed-off-by: Vipin Kumar <[email protected]>
This is a fix to make sure router-LSA is updated when neighbor's interface
ID change is received in hello packet.

Signed-off-by: Vipin Kumar <[email protected]>
Clear interface commands for ospfv3.  Allow the user
to clear all peers out the specified interface.

Signed-off-by: Vipin Kumar <[email protected]>
SYMPTOM:

If some of the ospfv3 commands like 'show ipv6 ospf6 route' are executed
with ospf6d daemon running but before having any ospfv3 configuration, then
ospf6d crash is seen.

ISSUE:

There are a few show commands, which are (unlike others) not checking if
ospf6 instance is initialized already.

FIX:

Add the missing checks, by using OSPF6_CMD_CHECK_RUNNING() in the commands
where its needed and not yet used.

Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Pradosh Mohapatra <[email protected]>
             Dinesh Dutt <[email protected]>
When turning off redistribution in ospf6, allow the
user to specify the full form of the command entered.

Signed-off-by: Daniel Walton <[email protected]>
If a BGP path has an unreachable nexthop display that path as invalid

Signed-off-by: Daniel Walton <[email protected]>
ISSUE:

LSAcks (for directed acks) are being sent to neighbor's unicast address.

RFC 2328 says:

"The IP destination address for the packet is selected as
 follows.  On physical point-to-point networks, the IP
 destination is always set to the address AllSPFRouters"

Fix is to unconditionally set the destination address for LSAcks over
point-to-point links as AllSPFRouters. Quagga OSPF already has similar
change for OSPF DBD, LSUpdate and LSrequest packets.

Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Daniel Walton <[email protected]>
Reviewed-by: Dinesh G Dutt <[email protected]>
OSPF silently ignores 'no ip ospf hello-interval X' and 'no ip ospf hello-interval X'

Signed-off-by: Daniel Walton <[email protected]>
Reviewed-by: Dinesh G Dutt <[email protected]>
ANVL test 17.5. The current implementation wouldn't start sending LSReq unti
the DB Desc packets have all been received (no M bit). This caused the test
choke up. RFC 2328 allows for sending LSReq on receiving the first DbDesc
packet as long as the nbr state is Exchange. This patch fixes that.

Signed-off-by: Dinesh Dutt <[email protected]>

Edited-by: Paul Jakma <[email protected]> to start the sending of LsReq
from the nsm_negotiation_done FSM transition function for ExStart->Exchange,
rather than tacking the call to ospf_ls_req_send to the bottom of the DD
desc processing function.
As per the RFC, when the NU bit is set, prefix should be ignored.
However, the code is currently ignoring prefix with LA bit too.

Fixing that part.

In future, we should also set LA bit for the loopback addresses. Not doing this
part right away, as quagga wont be backward compatible with its own previous
releases. Maybe after a release or so, we should start setting LA bit too.

Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Daniel Walton <[email protected]>
When a route_node has multiple ospf6_routes under it (common subnet case),
then the current implementation has an issue in adjusting the route_node->info
on a ospf6_route_remove() call.

The main reason is that it ends up using exact match to determine if the next
ospf6_route belongs to the same route_node or not. Fixing that part to use
rnode (the existing back-pointer to the route_node) from the ospf6_route to
determine that.

Also fixing some of the walks to turn them safe so that the route deletion is
fine.

Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Vivek Venkatraman <[email protected]>
This is to avoid a crash triggered by process termination
when ospf6d daemon is running and 'router ospf6' config has not been done yet.

Signed-off-by: Vipin Kumar <[email protected]>
Reviewed-by: Daniel Walton <[email protected]>
pjakma and others added 29 commits March 9, 2017 18:03
* routeserver.texi: The full-mesh v RS topology diagrammes are fairly obvious
  and don't deserve so much space.  As smaller wrap-floats they might be
  acceptable, but there seems no way to access that TeX feature from
  Texinfo.  So nuke them.

  Centre the 2 other figures. Tweak size to avoid bbox overflow messages.
* Makefile.am: The Quagga specific PDF dependency is gone.  The
  overfull boxes that caused texi2dvi to return fail code are gone.  So
  the custom rule can go.
* overfull hboxes cause texi2dvi to return fail, even if the PDf is written.
  We hacked around this by running it with '...  || true', but that
  sucks for buildboting the docs.

  Fix all the overfull hboxes.

* ospfd.texi: Long command definitions can cause hbox overruns in the
  columnar command definitions index.  This leads to strange errors
  about "Missing number, treated as zero." when building the index -
  very hard to figure out.

  'show ip ospf database ...' was the culprit.  Use a distinct deffnx
  alias for each option instead of trying to stuff them into 1 line.
…er tweaks

* master.cfg: Add a "build-docs" builder to test generation of HTML and PDF
  docs into the commit checks.  With nightly=true property, upload generated
  docs to a static dir on the master.  Run from a NightlyScheduler.

  Add the properties from the internal worker config to the buildbot
  BuildSlave properties, so they're visible in the web UI. Filter through
  a workers2publicprops helper, to whitelist allowed props.
was accidentally not implemented earlier
to triggers refresh of IKE SAs immediately on this command
RFC2332 states that prefix length MUST be 0xff for unique bindings.
However, it seems at least some Cisco firmwares use host prefix
length instead (which on wire level makes sense). Relax the handling
of prefix length to treat all value longer than address length as
0xff. Additionally treat 0x00 the same way too, this is required
by the RFC. This also fixes the prefix length address family to be
checked against protocol address.
" % Ambiguous command."
Fixes bug #950

Signed-off-by: Balaji Gurudoss <[email protected]>
Fix to resolve ripv2 update process from skipping over non multicast interfaces when sending updates.

Reported by: Christian Hammers <[email protected]>
Cisco has a bug that it rejects packets with zero hop count.
Use one to avoid potential forwarding of registration requests.
Fixes bug #955.

Removed the installation of vrf specific tag and distance command.

Reported by: goodman <[email protected]>
If you have an AS_PATH with more entries than
what can be written into a single AS_SEGMENT_MAX
it needs to be broken up.  The code that noticed
that the AS_PATH needs to be broken up was not
correctly calculating the size of the resulting
message.  This patch addresses this issue.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.