bpf:examples: update mptcp_set_mark_kern.c
This example is extended to illustrate more advanced subflow handling:

- use bpf_mptcp_sock to identify the parent MPTCP connection.
- set the TCP congestion control (CC) algorithm to vegas only on the first
  subflow of an MPTCP connection.

The mark is still updated to allow further filtering (e.g. at firewall
level).

The file is also renamed to describe its new content more precisely.

This patch also adds a bash script launching all the appropriate commands
to run the example.

Signed-off-by: Nicolas Rybowski <[email protected]>
nrybowski committed Aug 20, 2020
1 parent 1046743 commit 4d12018
Showing 5 changed files with 215 additions and 97 deletions.
8 changes: 4 additions & 4 deletions bpf/examples/Makefile
@@ -10,14 +10,14 @@ LOADER_FLAGS += -DHAVE_ATTR_TEST=0

BPF_FLAGS := -O2 -target bpf -g

all: loader mptcp_set_mark_kern.o
all: loader mptcp_set_sf_sockopt_kern.o

loader:
	@clang $(CFLAGS) $(LOADER_FLAGS) -o loader loader.c

mptcp_set_mark_kern.o:
	@clang $(BPF_FLAGS) -c mptcp_set_mark_kern.c \
		$(CFLAGS) -o mptcp_set_mark_kern.o
mptcp_set_sf_sockopt_kern.o:
	@clang $(BPF_FLAGS) -c mptcp_set_sf_sockopt_kern.c \
		$(CFLAGS) -o mptcp_set_sf_sockopt_kern.o

clean:
	@rm *_kern.o loader
63 changes: 47 additions & 16 deletions bpf/examples/README.md
@@ -5,24 +5,44 @@ This directory contains some examples of possible applications of eBPF programs

## Prerequisite
- The [`use_mptcp`](https://github.com/pabeni/mptcp-tools) tool must be present in the current folder to run the examples.
- The [`mptcp_net-next`](https://github.com/multipath-tcp/mptcp_net-next) kernel is supposed to be accessible at `${PWD}/../../mptcp_net-next`.
- The [`mptcp_net-next`](https://github.com/multipath-tcp/mptcp_net-next) kernel is expected to be accessible at `${PWD}/../../../mptcp_net-next`.
- nftables >= v0.9.2 and python3 are required.

## How to use

### `mptcp_set_mark_*`
The examples must be compiled first; see the [Makefile](Makefile).

Multiple shells are required to run the experiments and to observe the results.
### `mptcp_set_sf_sockopt_*`

Once the C programs are compiled :
This example shows how to:
- Identify the parent msk of an MPTCP subflow.
- Set different socket options (`sockopt`) on each subflow of the same MPTCP connection.

- Shell 1 : `./env.sh -c -m -l mptcp_set_mark_user [-D]` to setup the testing environment. Optional `-D` argument is used to show debug messages.
Specifically, two behaviours are implemented:
- A socket mark (`SOL_SOCKET`, `SO_MARK`) is set on each subflow of the same MPTCP connection. The mark is the subflow's creation order: 1 for the first subflow, 2 for the second, and so on.
- The TCP CC algorithm of the very first subflow of an MPTCP connection is set to `vegas`.
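
The mark assignment described above can be modelled in plain userspace C. This is only a sketch, not the BPF program from this directory: `sf_counter`, `next_subflow_mark` and `wants_vegas` are hypothetical names, and a small linear table stands in for the BPF hash map keyed by the MPTCP token.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the BPF hash map: one slot per
 * MPTCP token. Illustrative only, not part of the example sources.
 */
#define MAX_CONNS 100

struct sf_counter {
	uint32_t token;  /* MPTCP connection token (map key) */
	uint32_t count;  /* subflows seen so far (map value) */
	int in_use;      /* non-zero once the slot holds a token */
};

/* Return the mark for a new subflow of `token`: 1 for the first subflow,
 * 2 for the second, and so on. Returns 0 when the table is full.
 */
uint32_t next_subflow_mark(struct sf_counter *table, size_t len,
			   uint32_t token)
{
	struct sf_counter *free_slot = NULL;

	assert(table != NULL);

	for (size_t i = 0; i < len; i++) {
		/* Known connection: count one more subflow. */
		if (table[i].in_use && table[i].token == token)
			return ++table[i].count;
		/* Remember the first free slot for a new connection. */
		if (!table[i].in_use && !free_slot)
			free_slot = &table[i];
	}

	if (!free_slot)
		return 0;

	/* New connection: this is its primary subflow. */
	free_slot->token = token;
	free_slot->count = 1;
	free_slot->in_use = 1;
	return 1;
}

/* Only the first subflow (mark 1) gets the vegas congestion control. */
int wants_vegas(uint32_t mark)
{
	return mark == 1;
}
```

Calling `next_subflow_mark` three times with the same token yields 1, 2 and 3, and only the call returning 1 would lead to the `vegas` setting.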

- Shell 1 : `ip netns exec ns_client nft -f client.rules` to install the output filters on the client side.
A bash script is provided to run this example: `./mptcp_set_sf_sockopt_test.sh | less`. Its output is quite verbose, so piping it into `less` is recommended.

- Shell 1 : `ip netns exec ns_client tcpdump -ni any`. Optional to observe the correct subflows creation. Here is an expected output :
The script prints the following information:
- During the exchange between the client and the server, `ss` processes are launched at regular intervals on both sides to observe how the `sockopt` values evolve.
  > Currently this part is not fully functional.
  >
  > Some delay (>100ms) must be applied on the server-side interfaces to slow down the connection enough for `ss` to be usable.
  > But when delay (>50ms) is applied, the client side sends resets on subflow creation.
- Once the data exchange between the client and the server has ended, the script outputs:
  - The client-side `tcpdump` trace of the connection
  - The client-side output filter
  - The client-side BPF debug trace

Here is a sample of the expected output:

```
[...]
[INFO] Client-side tcpdump log :
14:02:25.150756 IP 10.0.1.1.54116 > 10.0.1.2.8000: Flags [S], seq 655299911, win 64240, options [mss 1460,sackOK,TS val 4288420401 ecr 0,nop,wscale 7,mptcp capable[bad opt]>
14:02:25.150838 IP 10.0.1.2.8000 > 10.0.1.1.54116: Flags [S.], seq 384090692, ack 655299912, win 65160, options [mss 1460,sackOK,TS val 4049039081 ecr 4288420401,nop,wscale 7,mptcp capable Unknown Version (1)], length 0
14:02:25.150893 IP 10.0.1.1.54116 > 10.0.1.2.8000: Flags [.], ack 1, win 502, options [nop,nop,TS val 4288420401 ecr 4049039081,mptcp capable Unknown Version (1)], length 0
@@ -36,15 +56,10 @@ id 0 hmac 0x994d937f81fb1a76 nonce 0xcaa06001], length 0
14:02:25.165128 IP 10.0.3.1.60041 > 10.0.1.2.8000: Flags [S], seq 2982449334, win 64240, options [mss 1460,sackOK,TS val 4266969079 ecr 0,nop,wscale 7,mptcp join backup id 2 token 0x153035fd nonce 0x1f382df], length 0
14:02:25.165150 IP 10.0.1.2.8000 > 10.0.3.1.60041: Flags [S.], seq 3431902562, ack 2982449335, win 65160, options [mss 1460,sackOK,TS val 3815681013 ecr 4266969079,nop,wscale 7,mptcp join backup id 0 hmac 0x6fb7c4ebd45363f nonce 0x4f412e69], length 0
14:02:25.165162 IP 10.0.3.1.60041 > 10.0.1.2.8000: Flags [.], ack 1, win 502, options [nop,nop,TS val 4266969080 ecr 3815681013,mptcp join hmac 0x63dadd4a5446f6d6e0633cc43549ce5c1c5ad545], length 0
```
- Shell 2 : `echo $$ >> /tmp/cgroup2/client/cgroup.procs` to register the current shell and all its child processes in the client cgroup.
[...]
- Shell 2 : `ip netns exec ns_client mptcp-tools/use_mptcp/use_mptcp.sh curl 10.0.1.2:8000 -o /dev/null`

- Shell 1 : `ip netns exec ns_client nft list ruleset` to observe the result. Here is an expected output :

```
[INFO] Client-side output filters :
table inet filter {
chain output {
type filter hook output priority filter; policy accept;
@@ -56,6 +71,22 @@ table inet filter {
tcp dport 8000 socket mark 0x00000000 counter packets 0 bytes 0
}
}
```
- Shell 1 : `./env.sh --clean` to cleanup. It will kill Shell 2 due to its membership to the client cgroup.
[INFO] Client-side bpf trace :
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3 #P:1
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
curl-11297 [000] d... 18011.270069: bpf_trace_printk: Mark <1> : return code <0>
kworker/0:2-26775 [000] d... 18011.271071: bpf_trace_printk: Mark <2> : return code <0>
kworker/0:2-26775 [000] d... 18011.271142: bpf_trace_printk: Mark <3> : return code <0>
```
77 changes: 0 additions & 77 deletions bpf/examples/mptcp_set_mark_kern.c

This file was deleted.

91 changes: 91 additions & 0 deletions bpf/examples/mptcp_set_sf_sockopt_kern.c
@@ -0,0 +1,91 @@
#include <asm/socket.h> // SOL_SOCKET, SO_MARK, ...
#include <linux/tcp.h> // TCP_CONGESTION
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

#ifndef SOL_TCP
#define SOL_TCP 6
#endif

#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16
#endif

char cc[TCP_CA_NAME_MAX] = "vegas";

/* Associate a subflow counter to each token */
struct bpf_map_def SEC("maps") mptcp_sf = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(__u32),
	.value_size = sizeof(__u32),
	.max_entries = 100
};

#define DEBUG 1

#ifdef DEBUG
char fmt1[] = "Mark <%u> : return code <%i>\n";
char fmt2[] = "Failed to get bpf_sock\n";
char fmt3[] = "Failed to get bpf_mptcp_sock\n";
char fmt4[] = "Failed to update sockopt\n";

#define pr_debug(msg, ...) bpf_trace_printk(msg, sizeof(msg), ##__VA_ARGS__)

#else

#define pr_debug(msg, ...)

#endif

SEC("sockops")
int mark_mptcp_sf(struct bpf_sock_ops *skops)
{
	__u32 init = 1, key, mark, *cnt;
	int err;

	if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB)
		goto out;

	struct bpf_sock *sk = skops->sk;
	if (!sk) {
		pr_debug(fmt2);
		goto out;
	}

	struct bpf_mptcp_sock *msk = bpf_mptcp_sock(sk);
	if (!msk) {
		pr_debug(fmt3);
		goto out;
	}

	key = msk->token;
	cnt = bpf_map_lookup_elem(&mptcp_sf, &key);

	if (cnt) {
		/* A new subflow is added to an existing MPTCP connection */
		__sync_fetch_and_add(cnt, 1);
		mark = *cnt;
	} else {
		/* A new MPTCP connection has just been initiated and this is its
		 * primary subflow
		 */
		bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY);
		mark = init;
	}

	/* Set the mark of the subflow's socket to its creation order */
	err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
	pr_debug(fmt1, mark, err);

	if (mark == 1)
		err = err ?: bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc,
					    TCP_CA_NAME_MAX);

	if (err < 0)
		pr_debug(fmt4);

out:
	return 0;
}
73 changes: 73 additions & 0 deletions bpf/examples/mptcp_set_sf_sockopt_test.sh
@@ -0,0 +1,73 @@
#! /bin/bash

TRACEFS="/sys/kernel/debug/tracing"
BPF_OBJECT="mptcp_set_sf_sockopt_kern.o"
USE_MPTCP="mptcp-tools/use_mptcp/use_mptcp.sh"
NS_EXEC="ip netns exec"
NS_CLIENT_EXEC="${NS_EXEC} ns_client"
CLIENT_PROCS="/tmp/cgroup2/client/cgroup.procs"
TCPDUMP_DUMP="/tmp/tcpdump.log"

info () {
	echo -e "\n[INFO] ${*}"
}

show () {
	while true
	do
		${NS_EXEC} "$1" ss -bit --cgroup
		sleep 0.25
	done
}

# clean previous trace
echo > "${TRACEFS}/trace"

# setup testing env and load BPF program on client side
./env.sh --clean -c -m -B "${BPF_OBJECT}"

# wait for end of setup
sleep 5

# load output filtering rules on client side
${NS_CLIENT_EXEC} nft -f client.rules

# show server socket status
show ns_server &
SPID="${!}"

# register current process to the client cgroup
echo $$ >> "${CLIENT_PROCS}"

# show client socket status
show ns_client &
CPID="${!}"

# launch tcpdump on client side
${NS_CLIENT_EXEC} tcpdump -Uni any -w "${TCPDUMP_DUMP}" tcp &
TPID="${!}"

# wait for tcpdump launch
sleep 5

# querying server
${NS_CLIENT_EXEC} "${USE_MPTCP}" curl 10.0.1.2:8000 -o /dev/null &> /dev/null

# unregister current process from the client cgroup
echo 0 >> "${CLIENT_PROCS}"

# kill ss wrappers and tcpdump
kill "${CPID}" "${SPID}" &> /dev/null
sleep 5
kill "${TPID}" &> /dev/null

info "Client-side tcpdump log :"
tcpdump -r "${TCPDUMP_DUMP}"

# show output filtering result
info "Client-side output filters :"
${NS_CLIENT_EXEC} nft list ruleset

# show current trace
info "Client-side bpf trace :"
cat "${TRACEFS}/trace"
