-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ospfd: Solved crash in RI parsing with OSPF TE #15674
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See a couple questions on RFC 8665 parsing.
ospfd/ospf_te.c
Outdated
@@ -2456,6 +2456,8 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
switch (ntohs(tlvh->type)) { | |||
case RI_SR_TLV_SR_ALGORITHM: | |||
if (TLV_BODY_SIZE(tlvh) != ALGORITHM_COUNT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number of algorithms is variable - right? Why must there be 4? Shouldn't this be "< 1"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Catch. In fact I need that the size is comprise between 1 and 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just update the code to check that Algorithm size is in the interval [1 - 4]. Normally, 2 should be sufficient according to RFC8665 as only 2 algorithms have been defines (SPF and Strict SPF). This should be modified when Flex algo will be added latter.
@@ -2456,6 +2456,8 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
switch (ntohs(tlvh->type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't all these checks break out if the size isn't the minimum, i.e., "< minimum" rather than "!=".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes and no. Yes when subTLV has variable size like you mention about the Algorithm, and no when subTLVs have fixed size. I'll check with SRGB, SRLB and MSD in which category they felt and change code accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SRGB and SRLB are fixed size today (RFC 8665) but could be extended with Sub-TLVs in future RFCs. This seems unlikely but I may be overlooking something. MSD allows for multiple MSD types today (RFC 8476). What I would do if it could be done without too much effort is just ignore values that aren't understood. However, I don't feel that strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per RFC8665,
Only a single SID/Label Sub-TLV MAY be advertised in the SID/Label Range TLV. If more than one SID/Label Sub-TLV is present, the SID/Label Range TLV MUST be ignored.
Thus, the size should not be exceeded 12 bytes. If there is more than one SID/Label Sub-TLV, the code MUST ignore the SRGB (or SRLB). Thus, but checking that the TLV size is exactly 12, we ensure that there is only one SID/Label Sub-TLV and that the SRGB or SRLB is acceptable. Regarding the size of the SID/Label sub-TLVs size, it could be 3 or 4 depending if the SID is an MPLS label (3 bytes) or an index (4 bytes). But, in any case, due to the fact that OSPF impose a 4 bytes alignment of TLV, there is an extra byte when SID is an MPLS label. Thus, in all case, from the SRGB or SRLB TLV point of view, the SID/Label SubTLV has already the same size and conduct to a TLV of 12 bytes. Again, if there new RFCs that extend SRGB / SRLB, we'll modify the code accordingly. For the moment, the code is supporting strictly RFC8665, not more.
Regarding MSD, as per RFC8476:
When multiple Node MSD TLVs are received from a given router, the
receiver MUST use the first occurrence of the TLV in the Router
Information (RI) LSA.
In addition, as per RFC8491, only one Node MSD Type has been defined. Thus, only one Node MSD is pertinent and supported. So, checking that there is one Node MSD and not more is correct. Again, if new RFC defin new MSD Type in the registry, we'll enhance the code. However, I changed the code to check that Node MSD size is not lower than the expected size as you purpose.
Update the code accordingly to the review and add a new commit for Extended Link LSA parser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment on the remote interface address.
@@ -2744,6 +2753,13 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
/* Initialize TLV browsing */ | |||
len = TLV_BODY_SIZE(&ext->header) - EXT_TLV_LINK_SIZE; | |||
i = lsa->size - (OSPF_LSA_HEADER_SIZE + TLV_HDR_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is ok since RFC 3630 indicates in section 2.4.2:
Only one Link TLV shall be carried in each LSA, allowing for fine
granularity changes in topology.
You might want to add a comment stating this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2753,6 +2769,8 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
switch (ntohs(tlvh->type)) { | |||
case EXT_SUBTLV_ADJ_SID: | |||
if (TLV_BODY_SIZE(tlvh) != EXT_SUBTLV_ADJ_SID_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - this will work since TLV_BODY_SIZE() rounds up to the 4 byte boundary so it doesn't matter whether a local label or offset is advertised as specified in section 5 of RFC 8665.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. That's the purpose of the TLV_BODY_SIZE() macro. In fact, all TLV_ macros are dealing with the 4 bytes alignment.
@@ -2808,6 +2828,8 @@ static int ospf_te_parse_ext_link(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
break; | |||
case EXT_SUBTLV_RMT_ITF_ADDR: | |||
if (TLV_BODY_SIZE(tlvh) != EXT_SUBTLV_RMT_ITF_ADDR_SIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC 3630 allows for multiple remote addresses in section 2.5.4 of RFC 3630.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is not part of the RFC3630. In fact, this subTLV is a Cisco proprietary one using the experimental type 32768. From what I could observed on Cisco router, I saw only one remote IP address. If you have some Cisco internal documents that explain this TLV, please share them to us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this is the OSPF Extended Link Opaque LSA. What confused me is that this is a general purpose LSA rather than a TE LSA.
ospfd/ospf_te.c
Outdated
@@ -2456,6 +2456,9 @@ static int ospf_te_parse_ri(struct ls_ted *ted, struct ospf_lsa *lsa) | |||
|
|||
switch (ntohs(tlvh->type)) { | |||
case RI_SR_TLV_SR_ALGORITHM: | |||
if (TLV_BODY_SIZE(tlvh) < 1 | |||
|| TLV_BODY_SIZE(tlvh) > ALGORITHM_COUNT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you remove this check and only retain the first ALGORITHM_COUNT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the goal is to check that there is a valid size to avoid a problem in case of malformed LSA, DDoS ... that Iggy detected with fuzzing. Thus, I prefer to keep the control to let the code more robust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that RFC 8665 section 3.1 doesn't limit the number of algorithms advertised to 4. What you could do is parse all that are advertised and only retain the algorithms that are known to the implementation since those are the only ones that would make a difference. See https://www.iana.org/assignments/igp-parameters/igp-parameters.xhtml#igp-algorithm-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no comments further than @aceelindem 's ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's go ahead with this - while I would have allowed for the full extent of the sub-TLVs as allowed in the standards, I agree that the probability of these being advertised today is minimal.
The main goal of this PR is to enhance the original code to prevent ospfd crash against malformed LSA. Not to enhance or add new RFCs or features. We could discuss offline the list of improvement we need to add to ospfd. |
Could you fix frrbot styling issues before merging? |
Done |
ci:rerun |
ci:run |
So, this was an intermittant failure? The changes in the 3 recent commits all add more checking - which change fixed the failing topotests? I can't find any checking that was corrected (only added validation). What am I missing? |
Was this causing the topotest failures? if (IPV4_NET0(attr.standard.local.s_addr) && !attr.standard.local_id) { |
Currently zebra does not deny the routes if `ip protocol <proto> route-map FOO` commmand is configured with reference to an undefined route-map (FOO in this case). However, on FRR restart, in zebra_route_map_check() routes get denied if route-map name is available but the route-map is not defined. This change was introduced in fd303a4. Fix: When `ip protocol <proto> route-map FOO` CLI is configured with reference to an undefined route-map FOO, let the processing in ip_protocol_rm_add() and ip_protocol_rm_del() go through so that zebra can deny the routes instead of simply returning. This will result in consistent behavior. Testing Done: Before fix: ``` spine-1# configure spine-1(config)# ip protocol bgp route-map rmap7 root@spine-1:mgmt:/var/home/cumulus# vtysh -c "show run" | grep rmap7 ip protocol bgp route-map rmap7 root@spine-1:mgmt:/var/home/cumulus# spine-1(config)# do show ip route Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric, Z - FRR, > - selected route, * - FIB route, q - queued, r - rejected, b - backup t - trapped, o - offload failure C>* 27.0.0.1/32 is directly connected, lo, 02:27:45 B>* 27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1, weight 1, 02:27:35 B>* 27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2, weight 1, 02:27:40 B>* 27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3, weight 1, 02:27:40 B>* 27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4, weight 1, 02:27:40 ``` After fix: ``` spine-1(config)# ip protocol bgp route-map route-map67 spine-1(config)# do show ip route Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP, T - Table, A - Babel, D - SHARP, F - PBR, f - OpenFabric, Z - FRR, > - selected route, * - FIB route, q - queued, r - rejected, b - backup t - trapped, o - offload failure C>* 27.0.0.1/32 is directly connected, lo, 00:35:03 B 27.0.0.3/32 [20/0] via fe80::202:ff:fe00:21, downlink_1 inactive, weight 1, 00:34:58 B 27.0.0.4/32 [20/0] via fe80::202:ff:fe00:29, downlink_2 inactive, weight 1, 00:34:57 B 27.0.0.5/32 [20/0] via fe80::202:ff:fe00:31, downlink_3 inactive, weight 1, 00:34:57 B 27.0.0.6/32 [20/0] via fe80::202:ff:fe00:39, downlink_4 inactive, weight 1, 00:34:58 spine-1(config)# root@spine-1:mgmt:/var/home/cumulus# ip route show root@spine-1:mgmt:/var/home/cumulus# ``` Signed-off-by: Pooja Jagadeesh Doijode <[email protected]>
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF LSA packets. The crash occurs in ospf_te_parse_ri() function when attemping to read Segment Routing subTLVs. The original code doesn't check if the size of the SR subTLVs have the correct length. In presence of erronous LSA, this will cause a buffer overflow and ospfd crash. This patch introduces new verification of the subTLVs size for Router Information TLV. Co-authored-by: Iggy Frankovic <[email protected]> Signed-off-by: Olivier Dugeon <[email protected]>
Iggy Frankovic discovered another ospfd crash when performing fuzzing of OSPF LSA packets. The crash occurs in ospf_te_parse_ext_link() function when attemping to read Segment Routing Adjacency SID subTLVs. The original code doesn't check if the size of the Extended Link TLVs and subTLVs have the correct length. In presence of erronous LSA, this will cause a buffer overflow and ospfd crashes. This patch introduces new verification of the subTLVs size for Extended Link TLVs and subTLVs. Similar check has been also introduced for the Extended Prefix TLV. Co-authored-by: Iggy Frankovic <[email protected]> Signed-off-by: Olivier Dugeon <[email protected]>
During fuzzing, Iggy Frankovic discovered that get_edge() function in ospf_te.c could return null pointer, in particular when the link_id or advertised router IP addresses are fuzzed. As the null pointer returned by get_edge() function is not handlei by calling functions, this could cause ospfd crash. This patch introduces new verification of returned pointer by get_edge() function and stop the processing in case of null pointer. In addition, link ID and advertiser router ID are validated before calling ls_find_edge_by_key() to avoid the creation of a new edge with an invalid key. CVE-2024-34088 Co-authored-by: Iggy Frankovic <[email protected]> Signed-off-by: Olivier Dugeon <[email protected]>
The previous versions of my PR were wrong and cause systematic crashed. It is not intermittent. |
Was this causing the topotest failures? if (IPV4_NET0(attr.standard.local.s_addr) && !attr.standard.local_id) { Let me try to explain how Opaque LSA is handle. Within ospfd/ospf_opaque.c there is a mechanism to register specific functions for a given type of Opaque LSA. The current ospfd code handle 4 different Opaque LSA: TE, Router Information, Extended Prefix and Extended Link. All of them have a different LSA-ID starting respectively by 1.x.x.x, 4.x.x.x, 7.x.x.x and 8.x.x.x. For each of them, specifics functions are registered in ospf_opaque handler. Look at the beginning of ospf_te.c, ospf_ri.c and ospf_ext.c. Now, within ospf opaque processing, if a registered function terminated with error, others registered functions in the queue are not call (see lines 1043 - 1045 in ospf_opaque.c). Regarding the current code and how configuration file is parsed, TE and RI functions are registered before Extended ones. Thus, if a TE or RI function failed during opaque handler, the Extended registered functions are not called which cause tests failure, in particular tests which uses Segment Routing. Thus, if we would change how opaque registered functions are handled in case of error, we need to discuss pro/cons of the current system and produce another PR. Hope these explanation are clear. |
Yes - I understand and I agree the OSPF opaque infra-structure shouldn't be so fragile. |
@Mergifyio backport stable/8.4 stable/8.5 stable/9.0 stable/9.1 stable/10.0 |
✅ Backports have been created
|
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
…4_8.5 ospfd: Solved crash in RI parsing with OSPF TE (backport #15674)
Iggy Frankovic discovered another ospfd crash when perfomring fuzzing of OSPF LSA packets. The crash occurs in ospf_te_parse_ri() function when attemping to read Segment Routing subTLVs. The original code doesn't check if the size of the SR subTLVs have the correct length. In presence of erronous LSA, this will cause a buffer overflow and ospfd crash.
This patch introduces new verification of the subTLVs size for Router Information TLV.