Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could you please let me know how to use "ip mptcp end points backup"? #191

Closed
Kenoko opened this issue May 10, 2021 · 19 comments
Closed

Could you please let me know how to use "ip mptcp end points backup"? #191

Kenoko opened this issue May 10, 2021 · 19 comments
Assignees

Comments

@Kenoko
Copy link

Kenoko commented May 10, 2021

I try to make backup-interface, but I cannot do it,
image1
Server and Client OS: Ubuntu21.04(kernel:5.11.0-16-generic)
I confirmed I could send mptcp packets on both paths from client to server. so I want to make an active/standby structure in a next step.
I executed a following command on client side, however the interface didn't become backup mode.The mptcp traffic kept running.
ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup
and I created signal endpoint at server as below.
ip mptcp add endpoint 192.168.40.2 dev enp0s9 signal
The limit setting isip mptcp limits set subflow 1 add_addr_accepted 1 on both client and server.

So Could you please let me know how to make backup interface?
I could make it by using mptcp v0 while I referred a following site, so I want to make it with mptcpv1 also.
http://multipath-tcp.org/pmwiki.php/Users/Tools

@matttbe
Copy link
Member

matttbe commented May 15, 2021

Hello,

The configuration you did seems good. Did you also configure IP rules and routes?
http://multipath-tcp.org/pmwiki.php/Users/ConfigureRouting

Could you share some captures, one for each interface of the client (or server) to see if the behaviour seems correct? I suspect other subflows to be used, not the one you expect.

# tcpdump -i <IFACE> -n "tcp[tcpflags] & tcp-syn != 0"

@matttbe matttbe self-assigned this May 15, 2021
@Kenoko
Copy link
Author

Kenoko commented May 15, 2021

Thanks for your support!
I referred the configuration of the website.
I configured routing on both servers as below. and There is no cross connection between each links on router.

route on client

192.168.10.0/24 dev enp0s8 proto kernel scope link src 192.168.10.2
192.168.20.0/24 dev enp0s9 proto kernel scope link src 192.168.20.2
192.168.30.0/24 via 192.168.10.1 dev enp0s8 proto static
192.168.40.0/24 via 192.168.20.1 dev enp0s9 proto static

routing policy

table 1
192.168.10.0/24 dev enp0s8 scope link
192.168.30.0/24 via 192.168.10.1 dev enp0s8
table 2
192.168.20.0/24 dev enp0s9 scope link
192.168.40.0/24 via 192.168.20.1 dev enp0s9

route on server

192.168.10.0/24 via 192.168.30.1 dev enp0s8 proto static
192.168.20.0/24 via 192.168.40.1 dev enp0s9 proto static
192.168.30.0/24 dev enp0s8 proto kernel scope link src 192.168.30.2
192.168.40.0/24 dev enp0s9 proto kernel scope link src 192.168.40.2

I got a following tcpdump at only enp0s9 on client ,but I couldn't get capture at enp0s8.
The result of tcpdump

12:15:00.291864 IP 192.168.20.2.33673 > 192.168.40.2.12345: Flags [S], seq 2340819369, win 64240, options [mss 1460,sackOK,TS val 861298340 ecr 0,nop,wscale 7,mptcp join id 0 token 0xc39ab12 nonce 0x6e02b9d7], length 0
12:15:00.293999 IP 192.168.40.2.12345 > 192.168.20.2.33673: Flags [S.], seq 3600655604, ack 2340819370, win 65160, options [mss 1460,sackOK,TS val 1744158086 ecr 861298340,nop,wscale 7,mptcp join id 1 hmac 0xd7249e0f982095b8 nonce 0xc1f3a999], length 0

I send a file(200M) from client to server and I set 192.168.30.2 as target IP and I use only this socket(res->ai_family, res->ai_socktype, IPPROTO_MPTCP) in my source code.

Do I need other configuration? or Is there some incorrect configuration?

@Kenoko
Copy link
Author

Kenoko commented May 23, 2021

@matttbe
Sorry for asking you many time.
Could you please let me know how to make backup path?

@VenkateswaranJ
Copy link

@Kenoko

Have you tried executing the command ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup before starting the client & server? I mean based on your question I understand that you are executing the command on the client-side when the application is running?

@Kenoko
Copy link
Author

Kenoko commented May 23, 2021

@VenkateswaranJ
Thank you for your support!
I executed this command before starting the application, however I couldn't make backup path.
I would like to know how to make active/standby structure.

@matttbe
Sorry, I shared incorrect tcpdump results. following is correct. so please check these tcpdump result.
The result of tcpdump

enp0s9 on client

listening on enp0s9, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:35:37.516946 IP 192.168.20.2.57007 > 192.168.40.2.12345: Flags [S], seq 3392821094, win 64240, options [mss 1460,sackOK,TS val 1777751700 ecr 0,nop,wscale 7,mptcp join id 0 token 0x23e6561c nonce 0x2f51d687], length 0
13:35:37.519138 IP 192.168.40.2.12345 > 192.168.20.2.57007: Flags [S.], seq 4154909268, ack 3392821095, win 65160, options [mss 1460,sackOK,TS val 1895143772 ecr 1777751700,nop,wscale 7,mptcp join id 2 hmac 0x1ae0cbb5d05b5681 nonce 0x9bf08b96], length 0

enp0s8 on clinet

listening on enp0s8, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:35:37.510478 IP 192.168.10.2.54874 > 192.168.30.2.12345: Flags [S], seq 4073350759, win 64240, options [mss 1460,sackOK,TS val 632161022 ecr 0,nop,wscale 7,mptcp capable v1], length 0
13:35:37.512400 IP 192.168.30.2.12345 > 192.168.10.2.54874: Flags [S.], seq 3568740971, ack 4073350760, win 65160, options [mss 1460,sackOK,TS val 796316579 ecr 632161022,nop,wscale 7,mptcp capable v1 {0x7928cf9fef53264}], length 0

Do you have any idea to make a backup path?

@matttbe
Copy link
Member

matttbe commented May 26, 2021

Hi @Kenoko

Sorry, I was not available these last days (and I still have a lot of stuff to catch up).

Routes seem good.
From the traces you sent, there is no backup flag with the MP_JOIN on enp0s9. So indeed, there is an issue somewhere, either in the configuration or in the interpretation of the configuration.

I noticed that the command you sent was wrong: ip mptcp add endpoint 192.168.20.2 dev enp0s9 backup
It should be endpoint add and not add endpoint. Is it just a wrong copy-paste or is it really the command you typed?

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 backup

Just to be sure, can you give me the output of this command: ip mptcp endpoint show

If you see the backup flag, can you also try to set the backup flag on the server side as well and take traces again?

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 backup signal

Just to check if there is an issue only with the client or also with server.

@Kenoko
Copy link
Author

Kenoko commented May 27, 2021

@matttbe Thank you for your reply despite your busy schedule.
Sorry I had a wrong copy-paste, and I paste the result ip mptcp endpoint show&ip mptcp limits show on client/server
Clinet

$ip mptcp endpoint show
192.168.20.2 id 1 backup dev enp0s9
$ip mptcp limits show
add_addr_accepted 1 subflows 1

Server

$ip mptcp endpoint show
192.168.40.2 id 1 signal dev enp0s9
$ip mptcp limits show
add_addr_accepted 1 subflows 1

However I couldn't see backup flag.
After this, I set on server side by using your commandsudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 backup signal
Clinet

$ip mptcp endpoint show
192.168.20.2 id 1 backup dev enp0s9
$ip mptcp limits show
add_addr_accepted 1 subflows 1

Server

$ip mptcp endpoint show
192.168.40.2 id 2 signal backup dev enp0s9
$ip mptcp limits show
add_addr_accepted 1 subflows 1

tcp dump result is as below. There is no backup flag also.

enp0s8 on clinet
listening on enp0s8, link-type EN10MB (Ethernet), snapshot length 262144 bytes
04:01:09.748405 IP 192.168.10.2.43522 > 192.168.30.2.12345: Flags [S], seq 1176472289, win 64240, options [mss 1460,sackOK,TS val 279891026 ecr 0,nop,wscale 7,mptcp capable v1], length 0
04:01:09.749948 IP 192.168.30.2.12345 > 192.168.10.2.43522: Flags [S.], seq 2868751690, ack 1176472290, win 65160, options [mss 1460,sackOK,TS val 3243649719 ecr 279891026,nop,wscale 7,mptcp capable v1 {0x4bc1db0df00581b7}], length 0
enp0s9 on server
listening on enp0s9, link-type EN10MB (Ethernet), snapshot length 262144 bytes
04:01:09.753538 IP 192.168.20.2.38653 > 192.168.40.2.12345: Flags [S], seq 1739756007, win 64240, options [mss 1460,sackOK,TS val 642625727 ecr 0,nop,wscale 7,mptcp join id 0 token 0xa03fbfbe nonce 0x3830cef8], length 0
04:01:09.755973 IP 192.168.40.2.12345 > 192.168.20.2.38653: Flags [S.], seq 2759061971, ack 1739756008, win 65160, options [mss 1460,sackOK,TS val 1826250580 ecr 642625727,nop,wscale 7,mptcp join id 2 hmac 0xdbafbbdb84aa8530 nonce 0x4d21616a], length 0

Do I need to change something on configure of my servers?

@matttbe
Copy link
Member

matttbe commented May 27, 2021

Strange, I will try to investigate more.

In theory, we should configure the client with:

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup

(subflow is required)

And the server:

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup

I guess the order of the flags there should not change anything because all flags are sent to the kernel and read when needed but I will investigate.

@matttbe matttbe added the bug label May 27, 2021
@Kenoko
Copy link
Author

Kenoko commented May 28, 2021

@matttbe
Thank you for your checking!
I tried to make backup path on another diagram as below,but I couldn't do it. Could you please check following situation also?
image

I executed sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client as you mentioned.
I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces.
I attached my client/server settings and tcpdump result of each interface.

client_setting.txt
server_setting.txt
enp0s8.log
enp0s9.log

According to "RFC8684",It is written as follows. so I think there is no traffic on enp0s9 while enp0s8 is available.
Could you please check why the traffic go through enp0s9?

Hosts can indicate at initial subflow setup whether they wish the subflow to be used as a regular or backup path -- a backup path only being used if there are no regular paths available.

@matttbe
Copy link
Member

matttbe commented May 28, 2021

@Kenoko when discussing about this bug at the last meeting, we were wondering if this issue has not been fixed in a more recent kernel. We remembered having fixed something around there.

By chance, could you try with a more recent kernel, e.g. 5.12?
e.g. built from https://kernel.ubuntu.com/~kernel-ppa/mainline/

@Kenoko
Copy link
Author

Kenoko commented May 29, 2021

@matttbe

Thank you for your advice and support.I installed following kernels on client/server.

  • 5.12.0-051200-generic
  • 5.13.0-051300rc2-generic
  • 5.12.0 (mptcp_net-next-export-20210430T165402)

I matched the kernel version between client and server and tried to make backup path. However the result was same as before.

the result is as below.

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client
I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces.

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup on server
There is no backup flag

Could you please check this?

@matttbe
Copy link
Member

matttbe commented May 31, 2021

Thank you for having checked that!

sudo ip mptcp endpoint add 192.168.20.2 dev enp0s9 subflow backup on client
I could see 'backup flag' on enp0s9, however the traffic was still running on the interfaces.

That's already better!
If now you see backup in the SYN+MP_JOIN in the packet traces, it means there was an issue in the Path Manager.

But if you see traffic over this interface while non backup path are still active, there is an issue on the packet scheduler side.

sudo ip mptcp endpoint add 192.168.40.2 dev enp0s9 signal backup on server
There is no backup flag

Mmh, maybe a similar issue in the PM is there for the server side if you don't see the backup flag in the SYN+ACK+MP_JOIN.

Could you please check this?

Yes but to be honest with you, it might take some time for me to catch-up my backlog :)

@Kenoko
Copy link
Author

Kenoko commented Jun 1, 2021

@matttbe
Thank you for support!
I will wait for your checking

@matttbe
Copy link
Member

matttbe commented Jun 7, 2021

We quickly discussed about this issue at the last weekly meeting we have and something that should be checked is, when the backup flag is correctly set in the SYN, maybe the traffic we can see on this interface is only "reinjected" traffic. If the non backup link is lossy, we can have a lot of reinjections, see #177.

Is the non backup link lossy in your case? Is the backup link "fully used": is the whole the capacity of the link used?

@Kenoko
Copy link
Author

Kenoko commented Jun 10, 2021

Thank you for your checking.
I checked non backup status, I think the link is normal.
I checked packet capture data on wire shark and there is no re-transmission.

@matttbe
Copy link
Member

matttbe commented Jun 11, 2021

I will need to check this in more details but with a simple setup, I can see a lot of traffic on the backup link:

Setup:

ip netns add ns-a
ip netns add ns-b
ip link add ns-a-eth1 netns ns-a type veth peer name ns-b-eth1 netns ns-b
ip link add ns-a-eth2 netns ns-a type veth peer name ns-b-eth2 netns ns-b
ip -net ns-a link set lo up
ip -net ns-b link set lo up
ip -net ns-a addr add 10.0.1.1/24 dev ns-a-eth1
ip -net ns-b addr add 10.0.1.2/24 dev ns-b-eth1
ip -net ns-a addr add 10.0.2.1/24 dev ns-a-eth2
ip -net ns-b addr add 10.0.2.2/24 dev ns-b-eth2
ip -net ns-a link set ns-a-eth1 up
ip -net ns-a link set ns-a-eth2 up
ip -net ns-b link set ns-b-eth1 up
ip -net ns-b link set ns-b-eth2 up
ip -net ns-a mptcp limits set subflow 1 add_addr_accepted 1
ip -net ns-b mptcp limits set subflow 2 add_addr_accepted 1

tc -net ns-a qdisc add dev ns-a-eth1 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth1 root netem delay 250ms
tc -net ns-a qdisc add dev ns-a-eth2 root netem delay 250ms
tc -net ns-b qdisc add dev ns-b-eth2 root netem delay 250ms

ip -net ns-a mptcp endpoint add 10.0.2.1 dev ns-a-eth2 subflow backup
ip -net ns-b mptcp endpoint add 10.0.2.2 dev ns-b-eth2 signal backup

Utilisation:

# ip netns exec ns-a ifstat -i ns-a-eth1,ns-a-eth2 -btT
  Time        ns-a-eth1           ns-a-eth2             Total       
HH:MM:SS   Kbps in  Kbps out   Kbps in  Kbps out   Kbps in  Kbps out
09:16:48      0.00      0.61      0.00      0.00      0.00      0.61
09:16:49    116.15    337.01      0.00      1.34    116.15    338.35
09:16:50    683.99    967.09      1.73    107.44    685.72   1074.52
09:16:51   2714.18   1975.25     10.31    576.54   2724.48   2551.79
09:16:52  10978.68   4814.72     20.62    895.00  10999.29   5709.72
09:16:53  36498.11  19653.30     59.47   4224.46  36557.59  23877.75
09:16:54  40616.52  22022.35    121.43   5551.44  40737.94  27573.79
09:16:55  44300.90  35689.21    186.82   5980.72  44487.72  41669.92
09:16:56  44584.53  30997.82    151.15   5922.61  44735.68  36920.43
09:16:57  38932.46  31158.00    173.56  11743.47  39106.01  42901.47
09:16:58  35989.38  25915.49    259.24  10975.08  36248.62  36890.57
09:16:59  33078.95  33028.99    112.96  11497.24  33191.91  44526.24
09:17:00  33867.00  18244.52    181.52  21597.89  34048.52  39842.41
09:17:01  35889.53  20784.96    207.31  23876.98  36096.84  44661.94
09:17:02  36768.68  20588.43    239.28  24136.13  37007.96  44724.56
09:17:03  37675.66  21021.42    239.75  23808.62  37915.41  44830.04
09:17:04  37830.23  20818.03    237.37  23169.54  38067.60  43987.56
09:17:05  38037.17  20426.63    248.11  23715.60  38285.28  44142.23
09:17:06  37943.80  19045.66    249.33  25455.02  38193.13  44500.68
09:17:07  38244.62  14372.56    259.54  25212.54  38504.16  39585.10
09:17:08  38329.65  18481.79    260.33  24892.52  38589.99  43374.31
09:17:09  38264.39  18791.64    257.08  25590.48  38521.46  44382.12
09:17:10  37954.59  18695.06    259.22  25610.29  38213.80  44305.35
09:17:11  40835.93  18996.24    258.41  25593.15  41094.35  44589.39
09:17:12  38571.42  18830.26    246.54  25674.13  38817.96  44504.39
09:17:13  38653.77   9271.67    207.09   7858.68  38860.86  17130.35
09:17:14  24539.33    395.14      1.34      1.22  24540.67    396.36
09:17:15      0.54      0.00      0.61      0.61      1.15      0.61
09:17:16      0.00      0.00      0.33      0.00      0.33      0.00

Still, it looks like the backup link is only used to upload data of my bi-directional flow.

I checked packet capture data on wire shark and there is no re-transmission.

Be careful that here, we don't talk about TCP re-transmission but MPTCP re-injection: packets that were sent in one subflow but reinjected to another one.

@pabeni
Copy link

pabeni commented Jun 16, 2021

I will need to check this in more details but with a simple setup, I can see a lot of traffic on the backup link:
[...]
Utilisation:

# ip netns exec ns-a ifstat -i ns-a-eth1,ns-a-eth2 -btT
  Time        ns-a-eth1           ns-a-eth2             Total       
HH:MM:SS   Kbps in  Kbps out   Kbps in  Kbps out   Kbps in  Kbps out
09:16:48      0.00      0.61      0.00      0.00      0.00      0.61
09:16:49    116.15    337.01      0.00      1.34    116.15    338.35
09:16:50    683.99    967.09      1.73    107.44    685.72   1074.52
09:16:51   2714.18   1975.25     10.31    576.54   2724.48   2551.79
09:16:52  10978.68   4814.72     20.62    895.00  10999.29   5709.72
09:16:53  36498.11  19653.30     59.47   4224.46  36557.59  23877.75

[...]


Still, it looks like the backup link is only used to upload data of my bi-directional flow.

The problem is that when the 'backup' flag is set, the flag is announced to the peer (that is: enforced in the opposite direction), but not set in the forward/egress direction. That is actually compliant with the RFC - the backup flag is a per direction feature - but very confusing.

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

A possibly complete solution would be to additionally introduce PM logic inside the kernel to automatically close a subflow making no forward progresses and triggerring too much re-injection when other subflows are available, as per paragraph 3.3.6. in the RFC.
Such behavior should be tunable and the user-space PM should be able to turn it off.

@matttbe
Copy link
Member

matttbe commented Jun 16, 2021

The problem is that when the 'backup' flag is set, the flag is announced to the peer (that is: enforced in the opposite direction), but not set in the forward/egress direction. That is actually compliant with the RFC - the backup flag is a per direction feature - but very confusing.

If we tell the PM an endpoint has to be considered as backup, should it eventually ensure that the packet scheduler is also taking this into consideration even if the other peer didn't set the backup flag?

It is true that the backup flag is there to tell the other peer a path should be used as a backup one and it should use it only if there is no other non backup path available (still, not a MUST). But typically, the client knows a path is more costly and might mark it as backup for the other peer but also for itself, not to use it if other paths are available, no? :)

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

I would need to check again but I think in my case, the backup flag was set on both the SYN and SYN+ACK.

@pabeni
Copy link

pabeni commented Jun 16, 2021

There is a simple fix for this - setting the backup flag in both direction - but it will break active backup (when no user-space PM is in action).

I would need to check again but I think in my case, the backup flag was set on both the SYN and SYN+ACK.

Right, the server is currently mirroring back the backup flag from ingress syn MPJ packet into the SYN-ACK reply.

But the client code is ignoring such bit - which is a bug, to be fixed.

The discussion about active backup still applies - when fixing such bug we additionally need a smarter in kernel PM.

fengguang pushed a commit to 0day-ci/linux that referenced this issue Jul 13, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: multipath-tcp/mptcp_net-next#191
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Jul 16, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Jul 17, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Jul 18, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Jul 28, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Jul 28, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Jul 28, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Jul 29, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Jul 30, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Jul 31, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 2, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 2, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 3, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 3, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 4, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 4, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 4, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 5, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 5, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 6, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 7, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 8, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 9, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 10, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 11, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 12, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
matttbe pushed a commit that referenced this issue Aug 12, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 13, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Aug 13, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: multipath-tcp/mptcp_net-next#191
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
jenkins-tessares pushed a commit that referenced this issue Aug 14, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: #191
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
fengguang pushed a commit to 0day-ci/linux that referenced this issue Aug 14, 2021
the parsed incoming backup flag is not propagated
to the subflow itself, the client may end-up using it
to send data.

Closes: multipath-tcp/mptcp_net-next#191
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
matttbe pushed a commit that referenced this issue Nov 24, 2023
Add test to validate BPF verifier's register range bounds tracking logic.

The main bulk is a lot of auto-generated tests based on a small set of
seed values for lower and upper 32 bits of full 64-bit values.
Currently we validate only range vs const comparisons, but the idea is
to start validating range over range comparisons in subsequent patch set.

When setting up initial register ranges we treat registers as one of
u64/s64/u32/s32 numeric types, and then independently perform conditional
comparisons based on a potentially different u64/s64/u32/s32 types. This
tests lots of tricky cases of deriving bounds information across
different numeric domains.

Given there are lots of auto-generated cases, we guard them behind
SLOW_TESTS=1 envvar requirement, and skip them altogether otherwise.
With current full set of upper/lower seed value, all supported
comparison operators and all the combinations of u64/s64/u32/s32 number
domains, we get about 7.7 million tests, which run in about 35 minutes
on my local qemu instance without parallelization. But we also split
those tests by init/cond numeric types, which allows to rely on
test_progs's parallelization of tests with `-j` option, getting run time
down to about 5 minutes on 8 cores. It's still something that shouldn't
be run during normal test_progs run.  But we can run it a reasonable
time, and so perhaps a nightly CI test run (once we have it) would be
a good option for this.

We also add a small set of tricky conditions that came up during
development and triggered various bugs or corner cases in either
selftest's reimplementation of range bounds logic or in verifier's logic
itself. These are fast enough to be run as part of normal test_progs
test run and are great for a quick sanity checking.

Let's take a look at test output to understand what's going on:

  $ sudo ./test_progs -t reg_bounds_crafted
  #191/1   reg_bounds_crafted/(u64)[0; 0xffffffff] (u64)< 0:OK
  ...
  #191/115 reg_bounds_crafted/(u64)[0; 0x17fffffff] (s32)< 0:OK
  ...
  #191/137 reg_bounds_crafted/(u64)[0xffffffff; 0x100000000] (u64)== 0:OK

Each test case is uniquely and fully described by this generated string.
E.g.: "(u64)[0; 0x17fffffff] (s32)< 0". This means that we
initialize a register (R6) in such a way that verifier knows that it can
have a value in [(u64)0; (u64)0x17fffffff] range. Another
register (R7) is also set up as u64, but this time a constant (zero in
this case). They then are compared using 32-bit signed < operation.
Resulting TRUE/FALSE branches are evaluated (including cases where it's
known that one of the branches will never be taken, in which case we
validate that verifier also determines this as a dead code). Test
validates that verifier's final register state matches expected state
based on selftest's own reg_state logic, implemented from scratch for
cross-checking purposes.

These test names can be conveniently used for further debugging, and if -vv
verboseness is requested we can get a corresponding verifier log (with
mark_precise logs filtered out as irrelevant and distracting). Example below is
slightly redacted for brevity, omitting irrelevant register output in
some places, marked with [...].

  $ sudo ./test_progs -a 'reg_bounds_crafted/(u32)[0; U32_MAX] (s32)< -1' -vv
  ...
  VERIFIER LOG:
  ========================
  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (05) goto pc+2
  3: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
  4: (bc) w6 = w0                       ; R0_w=scalar() R6_w=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
  5: (85) call bpf_get_current_pid_tgid#14      ; R0_w=scalar()
  6: (bc) w7 = w0                       ; R0_w=scalar() R7_w=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
  7: (b4) w1 = 0                        ; R1_w=0
  8: (b4) w2 = -1                       ; R2=4294967295
  9: (ae) if w6 < w1 goto pc-9
  9: R1=0 R6=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
  10: (2e) if w6 > w2 goto pc-10
  10: R2=4294967295 R6=scalar(smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
  11: (b4) w1 = -1                      ; R1_w=4294967295
  12: (b4) w2 = -1                      ; R2_w=4294967295
  13: (ae) if w7 < w1 goto pc-13        ; R1_w=4294967295 R7=4294967295
  14: (2e) if w7 > w2 goto pc-14
  14: R2_w=4294967295 R7=4294967295
  15: (bc) w0 = w6                      ; [...] R6=scalar(id=1,smin=0,smax=umax=4294967295,var_off=(0x0; 0xffffffff))
  16: (bc) w0 = w7                      ; [...] R7=4294967295
  17: (ce) if w6 s< w7 goto pc+3        ; R6=scalar(id=1,smin=0,smax=umax=4294967295,smin32=-1,var_off=(0x0; 0xffffffff)) R7=4294967295
  18: (bc) w0 = w6                      ; [...] R6=scalar(id=1,smin=0,smax=umax=4294967295,smin32=-1,var_off=(0x0; 0xffffffff))
  19: (bc) w0 = w7                      ; [...] R7=4294967295
  20: (95) exit

  from 17 to 21: [...]
  21: (bc) w0 = w6                      ; [...] R6=scalar(id=1,smin=umin=umin32=2147483648,smax=umax=umax32=4294967294,smax32=-2,var_off=(0x80000000; 0x7fffffff))
  22: (bc) w0 = w7                      ; [...] R7=4294967295
  23: (95) exit

  from 13 to 1: [...]
  1: [...]
  1: (b7) r0 = 0                        ; R0_w=0
  2: (95) exit
  processed 24 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1
  =====================

Verifier log above is for `(u32)[0; U32_MAX] (s32)< -1` use cases, where u32
range is used for initialization, followed by signed < operator. Note
how we use w6/w7 in this case for register initialization (it would be
R6/R7 for 64-bit types) and then `if w6 s< w7` for comparison at
instruction #17. It will be `if R6 < R7` for 64-bit unsigned comparison.
Above example gives a good impression of the overall structure of a BPF
programs generated for reg_bounds tests.

In the future, this "framework" can be extended to test not just
conditional jumps, but also arithmetic operations. Adding randomized
testing is another possibility.

Some implementation notes. We basically have our own generics-like
operations on numbers, where all the numbers are stored in u64, but how
they are interpreted is passed as runtime argument enum num_t. Further,
`struct range` represents a bounds range, and those are collected
together into a minimal `struct reg_state`, which collects range bounds
across all four numberical domains: u64, s64, u32, s64.

Based on these primitives and `enum op` representing possible
conditional operation (<, <=, >, >=, ==, !=), there is a set of generic
helpers to perform "range arithmetics", which is used to maintain struct
reg_state. We simulate what verifier will do for reg bounds of R6 and R7
registers using these range and reg_state primitives. Simulated
information is used to determine branch taken conclusion and expected
exact register state across all four number domains.

Implementation of "range arithmetics" is more generic than what verifier
is currently performing: it allows range over range comparisons and
adjustments. This is the intended end goal of this patch set overall and verifier
logic is enhanced in subsequent patches in this series to handle range
vs range operations, at which point selftests are extended to validate
these conditions as well. For now it's range vs const cases only.

Note that tests are split into multiple groups by their numeric types
for initialization of ranges and for comparison operation. This allows
to use test_progs's -j parallelization to speed up tests, as we now have
16 groups of parallel running tests. Overall reduction of running time
that allows is pretty good, we go down from more than 30 minutes to
slightly less than 5 minutes running time.

Acked-by: Eduard Zingerman <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants