Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ospfd: Solved crash in RI parsing with OSPF TE #15674

Merged
merged 4 commits into from
May 24, 2024

Conversation

odd22
Copy link
Member

@odd22 odd22 commented Apr 3, 2024

Iggy Frankovic discovered another ospfd crash when perfomring fuzzing of OSPF LSA packets. The crash occurs in ospf_te_parse_ri() function when attemping to read Segment Routing subTLVs. The original code doesn't check if the size of the SR subTLVs have the correct length. In presence of erronous LSA, this will cause a buffer overflow and ospfd crash.

This patch introduces new verification of the subTLVs size for Router Information TLV.

Copy link
Collaborator

@aceelindem aceelindem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See a couple questions on RFC 8665 parsing.

ospfd/ospf_te.c Outdated
@@ -2456,6 +2456,8 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa)

switch (ntohs(tlvh->type)) {
case RI_SR_TLV_SR_ALGORITHM:
if (TLV_BODY_SIZE(tlvh) != ALGORITHM_COUNT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number of algorithms is variable - right? Why must there be 4? Shouldn't this be "< 1"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Catch. In fact I need that the size is comprise between 1 and 4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just update the code to check that Algorithm size is in the interval [1 - 4]. Normally, 2 should be sufficient according to RFC8665 as only 2 algorithms have been defines (SPF and Strict SPF). This should be modified when Flex algo will be added latter.

@@ -2456,6 +2456,8 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa)

switch (ntohs(tlvh->type)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't all these checks break out if the size isn't the minimum, i.e., "< minimum" rather than "!=".

Copy link
Member Author

@odd22 odd22 Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes and no. Yes when subTLV has variable size like you mention about the Algorithm, and no when subTLVs have fixed size. I'll check with SRGB, SRLB and MSD in which category they felt and change code accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SRGB and SRLB are fixed size today (RFC 8665) but could be extended with Sub-TLVs in future RFCs. This seems unlikely but I may be overlooking something. MSD allows for multiple MSD types today (RFC 8476). What I would do if it could be done without too much effort is just ignore values that aren't understood. However, I don't feel that strongly.

Copy link
Member Author

@odd22 odd22 Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per RFC8665,

Only a single SID/Label Sub-TLV MAY be advertised in the SID/Label Range TLV. If more than one SID/Label Sub-TLV is present, the SID/Label Range TLV MUST be ignored.

Thus, the size should not be exceeded 12 bytes. If there is more than one SID/Label Sub-TLV, the code MUST ignore the SRGB (or SRLB). Thus, but checking that the TLV size is exactly 12, we ensure that there is only one SID/Label Sub-TLV and that the SRGB or SRLB is acceptable. Regarding the size of the SID/Label sub-TLVs size, it could be 3 or 4 depending if the SID is an MPLS label (3 bytes) or an index (4 bytes). But, in any case, due to the fact that OSPF impose a 4 bytes alignment of TLV, there is an extra byte when SID is an MPLS label. Thus, in all case, from the SRGB or SRLB TLV point of view, the SID/Label SubTLV has already the same size and conduct to a TLV of 12 bytes. Again, if there new RFCs that extend SRGB / SRLB, we'll modify the code accordingly. For the moment, the code is supporting strictly RFC8665, not more.

Regarding MSD, as per RFC8476:

When multiple Node MSD TLVs are received from a given router, the
receiver MUST use the first occurrence of the TLV in the Router
Information (RI) LSA.

In addition, as per RFC8491, only one Node MSD Type has been defined. Thus, only one Node MSD is pertinent and supported. So, checking that there is one Node MSD and not more is correct. Again, if new RFC defin new MSD Type in the registry, we'll enhance the code. However, I changed the code to check that Node MSD size is not lower than the expected size as you purpose.

@odd22
Copy link
Member Author

odd22 commented Apr 5, 2024

Update the code accordingly to the review and add a new commit for Extended Link LSA parser

Copy link
Collaborator

@aceelindem aceelindem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment on the remote interface address.

@@ -2744,6 +2753,13 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa)

/* Initialize TLV browsing */
len = TLV_BODY_SIZE(&ext->header) - EXT_TLV_LINK_SIZE;
i = lsa->size - (OSPF_LSA_HEADER_SIZE + TLV_HDR_SIZE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is ok since RFC 3630 indicates in section 2.4.2:

Only one Link TLV shall be carried in each LSA, allowing for fine
granularity changes in topology.

You might want to add a comment stating this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2753,6 +2769,8 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa)

switch (ntohs(tlvh->type)) {
case EXT_SUBTLV_ADJ_SID:
if (TLV_BODY_SIZE(tlvh) != EXT_SUBTLV_ADJ_SID_SIZE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - this will work since TLV_BODY_SIZE() rounds up to the 4 byte boundary so it doesn't matter whether a local label or offset is advertised as specified in section 5 of RFC 8665.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That's the purpose of the TLV_BODY_SIZE() macro. In fact, all TLV_ macros are dealing with the 4 bytes alignment.

@@ -2808,6 +2828,8 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa)

break;
case EXT_SUBTLV_RMT_ITF_ADDR:
if (TLV_BODY_SIZE(tlvh) != EXT_SUBTLV_RMT_ITF_ADDR_SIZE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 3630 allows for multiple remote addresses in section 2.5.4 of RFC 3630.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is not part of the RFC3630. In fact, this subTLV is a Cisco proprietary one using the experimental type 32768. From what I could observed on Cisco router, I saw only one remote IP address. If you have some Cisco internal documents that explain this TLV, please share them to us.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this is the OSPF Extended Link Opaque LSA. What confused me is that this is a general purpose LSA rather than a TE LSA.

ospfd/ospf_te.c Outdated
@@ -2456,6 +2456,9 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa)

switch (ntohs(tlvh->type)) {
case RI_SR_TLV_SR_ALGORITHM:
if (TLV_BODY_SIZE(tlvh) < 1
|| TLV_BODY_SIZE(tlvh) > ALGORITHM_COUNT)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you remove this check and only retain the first ALGORITHM_COUNT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the goal is to check that there is a valid size to avoid a problem in case of malformed LSA, DDoS ... that Iggy detected with fuzzing. Thus, I prefer to keep the control to let the code more robust.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that RFC 8665 section 3.1 doesn't limit the number of algorithms advertised to 4. What you could do is parse all that are advertised and only retain the algorithms that are known to the implementation since those are the only ones that would make a difference. See https://www.iana.org/assignments/igp-parameters/igp-parameters.xhtml#igp-algorithm-types

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no comments further than @aceelindem 's ...

Copy link
Collaborator

@aceelindem aceelindem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead with this - while I would have allowed for the full extent of the sub-TLVs as allowed in the standards, I agree that the probability of these being advertised today is minimal.

@odd22
Copy link
Member Author

odd22 commented Apr 9, 2024

Let's go ahead with this - while I would have allowed for the full extent of the sub-TLVs as allowed in the standards, I agree that the probability of these being advertised today is minimal.

The main goal of this PR is to enhance the original code to prevent ospfd crash against malformed LSA. Not to enhance or add new RFCs or features. We could discuss offline the list of improvement we need to add to ospfd.

@ton31337
Copy link
Member

Could you fix frrbot styling issues before merging?

@odd22
Copy link
Member Author

odd22 commented Apr 16, 2024

Could you fix frrbot styling issues before merging?

Done

@odd22
Copy link
Member Author

odd22 commented May 22, 2024

ci:rerun

@odd22
Copy link
Member Author

odd22 commented May 22, 2024

ci:run

@aceelindem
Copy link
Collaborator

Got the error this time. The problem comes from a wrong TLV size verification for TE database which output the ospf_te_parse_pref() and ospf_te_parse_link() with error. Unfortunately, within ospf opaque processing, if a registered function terminate with error, others registered functions in the queue are not call (see lines 1043 - 1045 in ospf_opaque.c). Thus, Opaque LSA for Segment Routing are not processed and thus, tests failed as MPLS table is not fulfill.

Hope the tests pass this time.

So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing?

@aceelindem
Copy link
Collaborator

Got the error this time. The problem comes from a wrong TLV size verification for TE database which output the ospf_te_parse_pref() and ospf_te_parse_link() with error. Unfortunately, within ospf opaque processing, if a registered function terminate with error, others registered functions in the queue are not call (see lines 1043 - 1045 in ospf_opaque.c). Thus, Opaque LSA for Segment Routing are not processed and thus, tests failed as MPLS table is not fulfill.
Hope the tests pass this time.

So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing?

Was this causing the topotest failures?

if (IPV4_NET0(attr.standard.local.s_addr) && !attr.standard.local_id) {
ote_debug(" |- Found no TE Link local address/ID. Abort!");
return -1;
}

Pdoijode and others added 4 commits May 23, 2024 10:46
Currently zebra does not deny the routes if `ip protocol <proto> route-map
FOO`
commmand is configured with reference to an undefined route-map (FOO in
this case).
However, on FRR restart, in zebra_route_map_check() routes get denied
if route-map name is available but the route-map is not defined. This
change was introduced in fd303a4.

Fix:
When `ip protocol <proto> route-map FOO` CLI is configured with reference to an
undefined route-map FOO, let the processing in ip_protocol_rm_add() and
ip_protocol_rm_del() go through so that zebra can deny the routes instead
of simply returning. This will result in consistent behavior.

Testing Done:

Before fix:
```
spine-1# configure
spine-1(config)# ip protocol bgp route-map rmap7

root@spine-1:mgmt:/var/home/cumulus# vtysh -c "show run" | grep rmap7
ip protocol bgp route-map rmap7
root@spine-1:mgmt:/var/home/cumulus#

spine-1(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
       Z - FRR,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C>* 27.0.0.1/32 is directly connected, lo, 02:27:45
B>* 27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1, weight 1, 02:27:35
B>* 27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2, weight 1, 02:27:40
B>* 27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3, weight 1, 02:27:40
B>* 27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4, weight 1, 02:27:40
```

After fix:
```
spine-1(config)# ip protocol bgp route-map route-map67
spine-1(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric,
       Z - FRR,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

C>* 27.0.0.1/32 is directly connected, lo, 00:35:03
B   27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1 inactive, weight 1, 00:34:58
B   27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2 inactive, weight 1, 00:34:57
B   27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3 inactive, weight 1, 00:34:57
B   27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4 inactive, weight 1, 00:34:58
spine-1(config)#

root@spine-1:mgmt:/var/home/cumulus# ip route show
root@spine-1:mgmt:/var/home/cumulus#
```

Signed-off-by: Pooja Jagadeesh Doijode <[email protected]>
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF
LSA packets. The crash occurs in ospf_te_parse_ri() function when attemping to
read Segment Routing subTLVs. The original code doesn't check if the size of
the SR subTLVs have the correct length. In presence of erronous LSA, this will
cause a buffer overflow and ospfd crash.

This patch introduces new verification of the subTLVs size for Router
Information TLV.

Co-authored-by: Iggy Frankovic <[email protected]>
Signed-off-by: Olivier Dugeon <[email protected]>
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF
LSA packets. The crash occurs in ospf_te_parse_ext_link() function when
attemping to read Segment Routing Adjacency SID subTLVs. The original code
doesn't check if the size of the Extended Link TLVs and subTLVs have the correct
length. In presence of erronous LSA, this will cause a buffer overflow and ospfd
crashes.

This patch introduces new verification of the subTLVs size for Extended Link
TLVs and subTLVs. Similar check has been also introduced for the Extended
Prefix TLV.

Co-authored-by: Iggy Frankovic <[email protected]>
Signed-off-by: Olivier Dugeon <[email protected]>
During fuzzing, Iggy Frankovic discovered that get_edge() function in ospf_te.c
could return null pointer, in particular when the link_id or advertised router
IP addresses are fuzzed. As the null pointer returned by get_edge() function is
not handlei by calling functions, this could cause ospfd crash.

This patch introduces new verification of returned pointer by get_edge()
function and stop the processing in case of null pointer. In addition, link ID
and advertiser router ID are validated before calling ls_find_edge_by_key() to
avoid the creation of a new edge with an invalid key.

CVE-2024-34088

Co-authored-by: Iggy Frankovic <[email protected]>
Signed-off-by: Olivier Dugeon <[email protected]>
@odd22
Copy link
Member Author

odd22 commented May 24, 2024

So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing?

The previous versions of my PR were wrong and cause systematic crashed. It is not intermittent.

@odd22
Copy link
Member Author

odd22 commented May 24, 2024

So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing?

Was this causing the topotest failures?

if (IPV4_NET0(attr.standard.local.s_addr) && !attr.standard.local_id) {
ote_debug(" |- Found no TE Link local address/ID. Abort!");
return -1;
}

Let me try to explain how Opaque LSA is handle. Within ospfd/ospf_opaque.c there is a mechanism to register specific functions for a given type of Opaque LSA. The current ospfd code handle 4 different Opaque LSA: TE, Router Information, Extended Prefix and Extended Link. All of them have a different LSA-ID starting respectively by 1.x.x.x, 4.x.x.x, 7.x.x.x and 8.x.x.x. For each of them, specifics functions are registered in ospf_opaque handler. Look at the beginning of ospf_te.c, ospf_ri.c and ospf_ext.c.

Now, within ospf opaque processing, if a registered function terminated with error, others registered functions in the queue are not call (see lines 1043 - 1045 in ospf_opaque.c). Regarding the current code and how configuration file is parsed, TE and RI functions are registered before Extended ones. Thus, if a TE or RI function failed during opaque handler, the Extended registered functions are not called which cause tests failure, in particular tests which uses Segment Routing.

Thus, if we would change how opaque registered functions are handled in case of error, we need to discuss pro/cons of the current system and produce another PR.

Hope these explanation are clear.

@aceelindem
Copy link
Collaborator

So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing?

Was this causing the topotest failures?

if (IPV4_NET0(attr.standard.local.s_addr) && !attr.standard.local_id) { ote_debug(" |- Found no TE Link local address/ID. Abort!"); return -1; }

Let me try to explain how Opaque LSA is handle. Within ospfd/ospf_opaque.c there is a mechanism to register specific functions for a given type of Opaque LSA. The current ospfd code handle 4 different Opaque LSA: TE, Router Information, Extended Prefix and Extended Link. All of them have a different LSA-ID starting respectively by 1.x.x.x, 4.x.x.x, 7.x.x.x and 8.x.x.x. For each of them, specifics functions are registered in ospf_opaque handler. Look at the beginning of ospf_te.c, ospf_ri.c and ospf_ext.c.

Now, within ospf opaque processing, if a registered function terminated with error, others registered functions in the queue are not call (see lines 1043 - 1045 in ospf_opaque.c). Regarding the current code and how configuration file is parsed, TE and RI functions are registered before Extended ones. Thus, if a TE or RI function failed during opaque handler, the Extended registered functions are not called which cause tests failure, in particular tests which uses Segment Routing.

Thus, if we would change how opaque registered functions are handled in case of error, we need to discuss pro/cons of the current system and produce another PR.

Hope these explanation are clear.

Yes - I understand and I agree the OSPF opaque infra-structure shouldn't be so fragile.

@Jafaral
Copy link
Member

Jafaral commented May 24, 2024

@Mergifyio backport stable/8.4 stable/8.5 stable/9.0 stable/9.1 stable/10.0

Copy link

mergify bot commented May 24, 2024

backport stable/8.4 stable/8.5 stable/9.0 stable/9.1 stable/10.0

✅ Backports have been created

@Jafaral Jafaral merged commit 52703f0 into FRRouting:master May 24, 2024
11 checks passed
donaldsharp added a commit that referenced this pull request May 25, 2024
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
donaldsharp added a commit that referenced this pull request May 25, 2024
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
donaldsharp added a commit that referenced this pull request May 25, 2024
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
donaldsharp added a commit that referenced this pull request May 25, 2024
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
donaldsharp added a commit that referenced this pull request May 29, 2024
…4_8.5

ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants