Reduce size of skb->cb #144

cpaasch · 2016-10-23T17:19:32Z

When moving to v4.1 we had to increase the size of skb->cb to 56 bytes. This is unacceptable for upstreaming. We need to bring it back down to 48 bytes.

The text was updated successfully, but these errors were encountered:

faddat · 2016-11-26T17:30:57Z

This is interesting. Where can I find skb->cb?

Note: No idea if I can help or not, but I'd love to have a look.

I searched the codebase and found many files referencing combinations of skb and cb. One file needs changing, or many?

cpaasch · 2016-11-27T21:34:04Z

@faddat skb->cb is in struct sk_buff. We had to increase it from 48 bytes to 56 bytes, to accomodate for the MPTCP-specific fields in struct tcp_skb_cb

Reducing it back to 48 will probably need some major rework.

cpaasch · 2016-11-27T21:38:36Z

@faddat - looking more into it, it can probably easily be done by reordering some fields, as we introduced a hole in the struct tcp_skb_cb. You can check with pahole to see if that works.

cpaasch · 2016-11-27T22:06:32Z

@faddat And it's me again :)

Actually, seems like we can just reduce the size down to 48 bytes again in struct sk_buff. If you want, you can submit a patch. Please compile-test with IPv6 and Mobile-IPv6 enabled.

doru91 · 2017-03-03T21:41:38Z

@cpaasch - it's not clear for me why you increased the size of the cb from 44 to 48 bytes (before 4.1) then you increased it to 56 bytes (due to 744d5a3). Could you explain a bit?

Before 4.1: The dss option need 24 bytes (u32 dss[6]) and it's inside a double union that's already accommodating 16 bytes (struct inet6_skb_parm h6). The cb size should increase with 8 bytes, shouldn't it?

Thanks!

cpaasch · 2017-03-03T23:03:05Z

At least, when I test-compiled it with mobile-IPv6-enabled I don't think I got an error. You can maybe just try out whith allyesmod. If it compiles, then 48 is fine.

BigNerd95 · 2018-02-24T08:22:35Z

May I know why it "is unacceptable for upstreaming"?

cpaasch · 2018-02-24T16:47:34Z

Increasing the size of the skb slows down the networking stack because accessing fields of the skb will incur additional cacheline misses. The upstream community is trying to minimize the cache misses as much as possible.

[ Upstream commit a447da7 ] syzkaller managed to trigger a use-after-free in tls like the following: BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls] Write of size 1 at addr ffff88037aa08000 by task a.out/2317 CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ multipath-tcp#144 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 kasan_report+0x258/0x380 ? tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_sw_push_pending_record+0x2e/0x40 [tls] tls_sk_proto_close+0x3fe/0x710 [tls] ? tcp_check_oom+0x4c0/0x4c0 ? tls_write_space+0x260/0x260 [tls] ? kmem_cache_free+0x88/0x1f0 inet_release+0xd6/0x1b0 __sock_release+0xc0/0x240 sock_close+0x11/0x20 __fput+0x22d/0x660 task_work_run+0x114/0x1a0 do_exit+0x71a/0x2780 ? mm_update_next_owner+0x650/0x650 ? handle_mm_fault+0x2f5/0x5f0 ? __do_page_fault+0x44f/0xa50 ? mm_fault_error+0x2d0/0x2d0 do_group_exit+0xde/0x300 __x64_sys_exit_group+0x3a/0x50 do_syscall_64+0x9a/0x300 ? page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happened through fault injection where aead_req allocation in tls_do_encryption() eventually failed and we returned -ENOMEM from the function. Turns out that the use-after-free is triggered from tls_sw_sendmsg() in the second tls_push_record(). The error then triggers a jump to waiting for memory in sk_stream_wait_memory() resp. returning immediately in case of MSG_DONTWAIT. What follows is the trim_both_sgl(sk, orig_size), which drops elements from the sg list added via tls_sw_sendmsg(). Now the use-after-free gets triggered when the socket is being closed, where tls_sk_proto_close() callback is invoked. The tls_complete_pending_work() will figure that there's a pending closed tls record to be flushed and thus calls into the tls_push_pending_closed_record() from there. ctx->push_pending_record() is called from the latter, which is the tls_sw_push_pending_record() from sw path. This again calls into tls_push_record(). And here the tls_fill_prepend() will panic since the buffer address has been freed earlier via trim_both_sgl(). One way to fix it is to move the aead request allocation out of tls_do_encryption() early into tls_push_record(). This means we don't prep the tls header and advance state to the TLS_PENDING_CLOSED_RECORD before allocation which could potentially fail happened. That fixes the issue on my side. Fixes: 3c4d755 ("tls: kernel TLS support") Reported-by: [email protected] Reported-by: [email protected] Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Dave Watson <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 69216a5 ] The SHA256 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit c643165 ] The SHA512 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 69216a5 ] The SHA256 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit c643165 ] The SHA512 code we adopted from the OpenSSL project uses a rather peculiar way to take the address of the round constant table: it takes the address of the sha256_block_data_order() routine, and substracts a constant known quantity to arrive at the base of the table, which is emitted by the same assembler code right before the routine's entry point. However, recent versions of binutils have helpfully changed the behavior of references emitted via an ADR instruction when running in Thumb2 mode: it now takes the Thumb execution mode bit into account, which is bit 0 af the address. This means the produced table address also has bit 0 set, and so we end up with an address value pointing 1 byte past the start of the table, which results in crashes such as Unable to handle kernel paging request at virtual address bf825000 pgd = 42f44b11 [bf825000] *pgd=80000040206003, *pmd=5f1bd003, *pte=00000000 Internal error: Oops: 207 [#1] PREEMPT SMP THUMB2 Modules linked in: sha256_arm(+) sha1_arm_ce sha1_arm ... CPU: 7 PID: 396 Comm: cryptomgr_test Not tainted 5.0.0-rc6+ #144 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 PC is at sha256_block_data_order+0xaaa/0xb30 [sha256_arm] LR is at __this_module+0x17fd/0xffffe800 [sha256_arm] pc : [<bf820bca>] lr : [<bf824ffd>] psr: 800b0033 sp : ebc8bbe8 ip : faaabe1c fp : 2fdd3433 r10: 4c5f1692 r9 : e43037df r8 : b04b0a5a r7 : c369d722 r6 : 39c3693e r5 : 7a013189 r4 : 1580d26b r3 : 8762a9b0 r2 : eea9c2cd r1 : 3e9ab536 r0 : 1dea4ae7 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user Control: 70c5383d Table: 6b8467c0 DAC: dbadc0de Process cryptomgr_test (pid: 396, stack limit = 0x69e1fe23) Stack: (0xebc8bbe8 to 0xebc8c000) ... unwind: Unknown symbol address bf820bca unwind: Index not found bf820bca Code: 441a ea80 40f9 440a (f85e) 3b04 ---[ end trace e560cce92700ef8a ]--- Given that this affects older kernels as well, in case they are built with a recent toolchain, apply a minimal backportable fix, which is to emit another non-code label at the start of the routine, and reference that instead. (This is similar to the current upstream state of this file in OpenSSL) Signed-off-by: Ard Biesheuvel <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

…ower_limit() [ Upstream commit 117dbeda22ec5ea0918254d03b540ef8b8a64d53 ] There is a global-out-of-bounds reported by KASAN: BUG: KASAN: global-out-of-bounds in _rtl8812ae_eq_n_byte.part.0+0x3d/0x84 [rtl8821ae] Read of size 1 at addr ffffffffa0773c43 by task NetworkManager/411 CPU: 6 PID: 411 Comm: NetworkManager Tainted: G D 6.1.0-rc8+ #144 e15588508517267d37 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), Call Trace: <TASK> ... kasan_report+0xbb/0x1c0 _rtl8812ae_eq_n_byte.part.0+0x3d/0x84 [rtl8821ae] rtl8821ae_phy_bb_config.cold+0x346/0x641 [rtl8821ae] rtl8821ae_hw_init+0x1f5e/0x79b0 [rtl8821ae] ... </TASK> The root cause of the problem is that the comparison order of "prate_section" in _rtl8812ae_phy_set_txpower_limit() is wrong. The _rtl8812ae_eq_n_byte() is used to compare the first n bytes of the two strings from tail to head, which causes the problem. In the _rtl8812ae_phy_set_txpower_limit(), it was originally intended to meet this requirement by carefully designing the comparison order. For example, "pregulation" and "pbandwidth" are compared in order of length from small to large, first is 3 and last is 4. However, the comparison order of "prate_section" dose not obey such order requirement, therefore when "prate_section" is "HT", when comparing from tail to head, it will lead to access out of bounds in _rtl8812ae_eq_n_byte(). As mentioned above, the _rtl8812ae_eq_n_byte() has the same function as strcmp(), so just strcmp() is enough. Fix it by removing _rtl8812ae_eq_n_byte() and use strcmp() barely. Although it can be fixed by adjusting the comparison order of "prate_section", this may cause the value of "rate_section" to not be from 0 to 5. In addition, commit "21e4b0726dc6" not only moved driver from staging to regular tree, but also added setting txpower limit function during the driver config phase, so the problem was introduced by this commit. Fixes: 21e4b07 ("rtlwifi: rtl8821ae: Move driver from staging to regular tree") Signed-off-by: Li Zetao <[email protected]> Acked-by: Ping-Ke Shih <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>

cpaasch added the For Upstreaming label Oct 23, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce size of skb->cb #144

Reduce size of skb->cb #144

cpaasch commented Oct 23, 2016

faddat commented Nov 26, 2016 •

edited

Loading

cpaasch commented Nov 27, 2016

cpaasch commented Nov 27, 2016

cpaasch commented Nov 27, 2016

doru91 commented Mar 3, 2017

cpaasch commented Mar 3, 2017

BigNerd95 commented Feb 24, 2018

cpaasch commented Feb 24, 2018

Reduce size of skb->cb #144

Reduce size of skb->cb #144

Comments

cpaasch commented Oct 23, 2016

faddat commented Nov 26, 2016 • edited Loading

cpaasch commented Nov 27, 2016

cpaasch commented Nov 27, 2016

cpaasch commented Nov 27, 2016

doru91 commented Mar 3, 2017

cpaasch commented Mar 3, 2017

BigNerd95 commented Feb 24, 2018

cpaasch commented Feb 24, 2018

faddat commented Nov 26, 2016 •

edited

Loading