Redesign network events and capture features #2152

Closed
rafaeldtinoco opened this issue Sep 6, 2022 · 0 comments · Fixed by #2200 or #2569


rafaeldtinoco commented Sep 6, 2022

NOTE: this issue may span a couple of milestones (until everything is implemented)

Redesign network events and capture features

This issue is about redesigning the old network model, which is based on tc hooks and requires specifying the interface to be monitored, into cgroup eBPF program(s) capable of generating network packet events (for different layers and protocols). This issue also tracks related follow-up work.

The new network code requirements are:

  • be capable of relating egress/ingress packets to specific host tasks
  • not rely on specific network interfaces for attachment
  • be close to processes (after mangling and translations)
  • allow near-future flow enforcement through eBPF
  • work in older kernels (not only in recent ones)

Technicalities

Covered Issues:

  1. Remove kprobes used to keep context among networking flows
    network: remove old network code based on tc hooks #2569
  2. Improve performance and use better skb programs in newer kernels
    Network events and capturing performance studies and optimizations #2578
  3. Have a single pcap file by default for --capture network
    To have a single pcap file by default for --capture network #2587

Related Bugs:

  1. Add MPTCP support
    Add MPTCP support #2068

Finished Work (old Issues model, one issue per item)

  1. Recalculate the socket's task context
    Recalculate scope if needed in cgroup network programs #2221
  2. Entire clsact qdisc is purged when tc hook is destroyed
    Entire clsact qdisc is purged when tc hook is destroyed #1828
  3. Network events should also be enriched
    Network events should also be enriched #1922
  4. Container folder is not getting created under /tmp/tracee
    Container folder is not getting created under /tmp/tracee #2126
  5. pcap capturing options
    Don't require specifying a device for network pcap #2096
  6. Split network eBPF logic into multiple files
    ebpf: split code #2545

Done by others

  1. HTTP request/response events
    Add HTTP request/response events #1385
  2. Add e2e test for HTTP events
    add e2e test for http events #2574
rafaeldtinoco added this to the v0.9.0-rc1 milestone Sep 6, 2022
rafaeldtinoco self-assigned this Sep 6, 2022
rafaeldtinoco mentioned this issue Sep 13, 2022
rafaeldtinoco linked a pull request Sep 13, 2022 that will close this issue
yanivagman removed a link to a pull request Sep 15, 2022
rafaeldtinoco linked a pull request Sep 19, 2022 that will close this issue
rafaeldtinoco linked a pull request Sep 28, 2022 that will close this issue
yanivagman modified the milestones: v0.9.0-rc1, v.0.10.0 Oct 26, 2022
rafaeldtinoco added a commit that referenced this issue Dec 1, 2022
rafaeldtinoco added a commit that referenced this issue Dec 1, 2022
- preparation for the new networking code.

Related: #2152
rafaeldtinoco added a commit that referenced this issue Dec 1, 2022
There are multiple ways to follow ingress/egress traffic for a task. One
way is to try to track all flows within network interfaces and keep a map
of address tuples and translations. Alternatively, sk_storage and socket
cookies might help in understanding which sock/sk_buff context the bpf
program is dealing with, but in the end the need is always to tie a flow
to a task (especially when hooking ingress skb bpf programs, where the
current task is a kernel thread most of the time).

Unfortunately, that gets even more complicated in older kernels: the
cgroup skb programs have almost no bpf helpers available, and most
common code causes the verifier to reject the program. With that in
mind, this approach kprobes the function responsible for calling the
cgroup/skb programs.

All the work that the cgroup/skb programs would do in the common case
is, in this case, done by the kprobe/kretprobe hook logic (right before
and right after the cgroup/skb program runs). By doing that, all the
data the cgroup/skb programs need is already placed in a map.

This has some cons: the kprobe -> cgroup/skb -> kretprobe execution
flow does not have preemption disabled, so the map shared by the 3
hooks must be indexed by something that is available to all 3 of
them.

In the end, the logic is simple: every time a socket is created, an
inode is also created. The task owning the socket is indexed by the
socket inode, so every time this socket is used we know which task it
belongs to (especially in the ingress hook).
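
The inode-indexed lookup described above can be sketched as a userland Go simulation. This is only an illustration under assumptions: TaskContext, inodeMap, onSocketCreated and onIngress are hypothetical names, and in tracee the equivalent state lives in a BPF map written and read by kernel-side probes, not in a Go map.

```go
package main

import "fmt"

// TaskContext is a hypothetical, trimmed-down task context; the real
// context kept by tracee carries many more fields.
type TaskContext struct {
	HostTID     uint32
	Comm        string
	ContainerID string
}

// inodeMap is a userland stand-in for the kernel-side map that indexes the
// task owning a socket by the socket's inode number.
type inodeMap map[uint64]TaskContext

// onSocketCreated records the creating task under the new socket's inode.
func (m inodeMap) onSocketCreated(inode uint64, task TaskContext) {
	m[inode] = task
}

// onIngress resolves the owning task from the socket inode. On ingress the
// current task is often a kernel thread, so "current" cannot be trusted,
// but the inode lookup still points at the socket owner.
func (m inodeMap) onIngress(inode uint64) (TaskContext, bool) {
	task, ok := m[inode]
	return task, ok
}

func main() {
	m := inodeMap{}
	m.onSocketCreated(4026, TaskContext{HostTID: 1234, Comm: "curl"})

	if task, ok := m.onIngress(4026); ok {
		fmt.Printf("ingress packet belongs to %s (tid %d)\n", task.Comm, task.HostTID)
	}
}
```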

Related: #2152
rafaeldtinoco added a commit that referenced this issue Dec 1, 2022
1. Add support (event) for the following network protocols:

- IPv4
- IPv6
- TCP
- UDP
- ICMP
- ICMPv6
- DNS

2. Create a fake net_packet_base event so that all probes and
   capabilities needed by network events are set in a single place.

3. Network cgroup probes should only be attached when events are
   selected.

With the following added events:

- NetPacketIPBase
- NetPacketIPv4
- NetPacketIPv6
- NetPacketTCPBase
- NetPacketTCP
- NetPacketUDPBase
- NetPacketUDP
- NetPacketICMPBase
- NetPacketICMP
- NetPacketICMPv6Base
- NetPacketICMPv6
- NetPacketDNSBase
- NetPacketDNS

NOTE:
    All the "Base" events are raw network events (with a single
    argument being the packet payload) and are internal only. They
    are used for deriving the events that depend on each base
    event. Example: a "NetPacketUDPBase" event has a "payload"
    argument, which is the "IP+UDP" header as "bytes". It will be
    derived into a "NetPacketUDP" event with the appropriate type
    (which can be used by signatures and so on).
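
The Base-to-typed derivation in the note above can be illustrated with a small Go sketch that turns an "IP+UDP" header byte slice into a typed struct. It is not tracee's derivation code: the NetPacketUDP fields and the deriveUDP helper are hypothetical, only IPv4 is handled, and tracee's own code may well rely on a packet-decoding library instead of manual parsing.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// NetPacketUDP is a hypothetical typed event derived from a raw
// "NetPacketUDPBase" payload (IPv4 + UDP headers as bytes).
type NetPacketUDP struct {
	Src, Dst         net.IP
	SrcPort, DstPort uint16
	Length           uint16
}

// deriveUDP parses an IPv4 header followed by a UDP header. Only IPv4 is
// handled here; IP options are skipped using the IHL field.
func deriveUDP(payload []byte) (NetPacketUDP, error) {
	if len(payload) < 20 {
		return NetPacketUDP{}, errors.New("short IPv4 header")
	}
	if payload[0]>>4 != 4 {
		return NetPacketUDP{}, errors.New("not IPv4")
	}
	ihl := int(payload[0]&0x0f) * 4 // IPv4 header length in bytes
	if ihl < 20 || len(payload) < ihl+8 {
		return NetPacketUDP{}, errors.New("truncated header")
	}
	if payload[9] != 17 { // IP protocol number 17 = UDP
		return NetPacketUDP{}, errors.New("not UDP")
	}
	udp := payload[ihl : ihl+8]
	return NetPacketUDP{
		Src:     net.IPv4(payload[12], payload[13], payload[14], payload[15]),
		Dst:     net.IPv4(payload[16], payload[17], payload[18], payload[19]),
		SrcPort: binary.BigEndian.Uint16(udp[0:2]),
		DstPort: binary.BigEndian.Uint16(udp[2:4]),
		Length:  binary.BigEndian.Uint16(udp[4:6]),
	}, nil
}

func main() {
	// Hand-built IPv4 (IHL=5) + UDP header: 10.0.0.1:5353 -> 10.0.0.2:53.
	pkt := []byte{
		0x45, 0x00, 0x00, 0x1c, 0x00, 0x00, 0x00, 0x00,
		0x40, 0x11, 0x00, 0x00, // TTL 64, protocol 17 (UDP), checksum unset
		10, 0, 0, 1, // source address
		10, 0, 0, 2, // destination address
		0x14, 0xe9, 0x00, 0x35, // source port 5353, destination port 53
		0x00, 0x08, 0x00, 0x00, // UDP length 8, checksum unset
	}
	ev, err := deriveUDP(pkt)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s:%d -> %s:%d\n", ev.Src, ev.SrcPort, ev.Dst, ev.DstPort)
}
```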

Related: #2152
rafaeldtinoco added a commit that referenced this issue Dec 1, 2022
Instead of relying on ingress/egress flows containing a process context
in order to obtain new sockets that are already connected, simply probe:

- security_socket_sendmsg
- security_socket_recvmsg

This also has the advantage of always updating the inodemap (the
socket <=> task context map) with the new task context if the task has
recomputed its scope during should_trace().

Other socket operation functions (see the "socket_file_ops" struct)
might be added in the future, if needed, in order to create new entries
in the inodemap.

Related: #2152
rafaeldtinoco reopened this Dec 1, 2022
rafaeldtinoco added a commit that referenced this issue Dec 6, 2022
- Derive net_packet_dns_request from net_packet_dns
- Derive net_packet_dns_response from net_packet_dns

For now, both events keep the same arguments as the old DNS events,
which should be deprecated soon. The idea is that signatures already
relying on those events can use the new ones without big changes
(just the event type).

In the near future we might change both events' arguments to something
better aligned with the "net_packet_dns" types.
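
How one DNS event could be split into request and response events is sketched below, assuming the split is decided by the QR bit of the DNS header flags (RFC 1035, section 4.1.1: 0 = query, 1 = response). That criterion is an assumption, since the commit does not say how tracee tells the two apart, and dnsEventName is a hypothetical helper; only the two event names come from the commit.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// dnsEventName picks the derived event for a DNS message using the QR bit,
// the most significant bit of the 16-bit flags field at offset 2.
func dnsEventName(msg []byte) (string, error) {
	if len(msg) < 12 { // fixed DNS header size in bytes
		return "", errors.New("short DNS header")
	}
	flags := binary.BigEndian.Uint16(msg[2:4])
	if flags&0x8000 == 0 {
		return "net_packet_dns_request", nil
	}
	return "net_packet_dns_response", nil
}

func main() {
	query := []byte{0x12, 0x34, 0x01, 0x00, 0, 1, 0, 0, 0, 0, 0, 0} // QR=0, RD=1
	reply := []byte{0x12, 0x34, 0x81, 0x80, 0, 1, 0, 1, 0, 0, 0, 0} // QR=1, RD=1, RA=1
	for _, m := range [][]byte{query, reply} {
		name, err := dnsEventName(m)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```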

Related: #2152
rafaeldtinoco modified the milestones: v0.10.0, v0.11.0 Dec 8, 2022
yanivagman changed the title from "[FEAT] redesign network events and capture features" to "Redesign network events and capture features" Jan 2, 2023
rafaeldtinoco added a commit that referenced this issue Jan 12, 2023
The new cgroup-based pcap capture code is able to capture ANY
TCP, UDP or ICMP packet, for both IPv4 and IPv6, regardless of
the physical (L1) and link (L2) layers in use.

Unlike the current pcap capture code, this new approach allows
tracee to capture packets from any process on the host or in a
container, regardless of the interfaces they use to communicate.
There is no longer a need to bind a tc hook program to a
specific interface.

$ tracee-ebpf --capture network

There is also a middle layer responsible for managing open
pcap files using an LRU cache (so tracee limits the number of
files it keeps open).

Another feature is the ability to select one or more pcap
file types to capture:

- per process   (processes/{host,container_id}/process_TID_TS.pcap)
- per container (containers/container_id.pcap)
- per command   (commands/{host,container_id}/command.pcap)

$ tracee-ebpf --capture network \
              --capture netpcap:process,container,command
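
The file layout selected by --capture netpcap:process,container,command can be made concrete with a small Go helper that maps a packet's context to relative paths. It is a sketch under assumptions: pktCtx and pcapPaths are hypothetical, "process" and "command" in the listed names are taken to be the task's comm, TS is taken to be the task start time, and host traffic is assumed to use the literal directory name "host" (including a "containers/host.pcap" file).

```go
package main

import (
	"fmt"
	"path"
)

// pktCtx is a hypothetical subset of the context attached to a packet.
type pktCtx struct {
	ContainerID string // empty means the packet came from a host process
	HostTID     uint32
	StartTime   uint64 // used as "TS" in the file name (assumption)
	Comm        string
}

// pcapPaths returns one relative pcap path per selected type, following the
// processes/, containers/ and commands/ layout listed above.
func pcapPaths(c pktCtx, types []string) []string {
	scope := "host"
	if c.ContainerID != "" {
		scope = c.ContainerID
	}
	var paths []string
	for _, t := range types {
		switch t {
		case "process":
			name := fmt.Sprintf("%s_%d_%d.pcap", c.Comm, c.HostTID, c.StartTime)
			paths = append(paths, path.Join("processes", scope, name))
		case "container":
			paths = append(paths, path.Join("containers", scope+".pcap"))
		case "command":
			paths = append(paths, path.Join("commands", scope, c.Comm+".pcap"))
		}
	}
	return paths
}

func main() {
	c := pktCtx{ContainerID: "3f2a9c", HostTID: 4242, StartTime: 1673500000, Comm: "nginx"}
	for _, p := range pcapPaths(c, []string{"process", "container", "command"}) {
		fmt.Println(p)
	}
}
```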

NOTE: This allows the previous tc hook based network code to
      be retired and, with it, the need for specifying an
      interface to be hooked to.

Fixes: #2126
Fixes: #2096
Related: #2152
geyslan pushed a commit to geyslan/tracee that referenced this issue Jan 13, 2023
Userland code removal:

1. NetPacket event (now net_packet_xxx events)
2. DnsRequest, DnsResponse events (now net_packet_dns_xxx)
3. capture_pcap event (now Pcaps pkg)
4. NetPacket derivation (now net_packet_xxx derivations)
5. convertArgMonotonicToEpochTime (no need now)

> Replacing old network logic and events with the new one.
> New logic does not use tc hooks so no interface needs to be specified.

Userland files removal:

1. net_decoder
2. net_proto
3. net_proto_handlers

> net_capture is a parallel pipeline receiving network packet events.
> net_capture now uses the same decodeEvents as the main pipeline.
> net_capture processes received network packets using gopacket.

Userland code actions:

1. move net_packet events to network event id ranges
2. split net_packet events in userland and kernel ranges

> Needed to split network events from the rest.

BPF maps removal:

1. sock_ctx_map
2. network_map

> These maps were responsible for saving network context between
> kprobes, tracepoints and the tc hooks.

BPF probes/tracepoints removal:

 1. udp_sendmsg (kprobe)
 2. __udp_disconnect (kprobe)
 3. udp_destroy_sock (kprobe)
 4. udpv6_destroy_sock (kprobe)
 5. inet_sock_set_state (tracepoint)
 6. tcp_connect (kprobe)
 7. icmp_send (kprobe)
 8. icmp6_send (kprobe)
 9. icmp_recv (kprobe)
10. icmpv6_recv (kprobe)
11. ping_v4_sendmsg (kprobe)
12. ping_v6_sendmsg (kprobe)

> These probes were responsible for updating network maps to work
> in conjunction with tc hooks.

BPF functions removal:

1. net_map_update_or_delete_sock
2. icmp_delete_network_map

> Helper BPF functions

Fixes: aquasecurity#1828
Fixes: aquasecurity#2152
Fixes: aquasecurity#1922
Fixes: aquasecurity#2096
Fixes: aquasecurity#2126
geyslan pushed a commit to geyslan/tracee that referenced this issue Jan 15, 2023
geyslan pushed a commit to geyslan/tracee that referenced this issue Jan 16, 2023
Userland code removal:

1. NetPacket event (now net_packet_xxx events)
2. DnsRequest, DnsResponse events (now net_packet_dns_xxx)
3. capture_pcap event (now Pcaps pkg)
4. NetPacket derivation (now net_packet_xxx derivations)
5. convertArgMonotonicToEpochTime (no need now)
6. procinfo package (to be refactored and re-added by issue: aquasecurity#2586)
7. events reader bytesT type allows buffer > 4096 for net packets

> Replacing old network logic and events with the new one.
> New logic does not use tc hooks so no interface needs to be specified.

Userland files removal:

1. net_decoder
2. net_proto
3. net_proto_handlers

> net_capture is a parallel pipeline receiving network packet events.
> net_capture now uses the same decodeEvents as the main pipeline.
> net_capture processes received network packets using gopacket.

Userland code actions:

1. move net_packet events to network event id ranges
2. split net_packet events in userland and kernel ranges

> Needed to split network events from the rest.

BPF maps removal:

1. sock_ctx_map
2. network_map

> These maps were responsible for saving network context between
> kprobes, tracepoints and the tc hooks.

BPF probes/tracepoints removal:

 1. udp_sendmsg (kprobe)
 2. __udp_disconnect (kprobe)
 3. udp_destroy_sock (kprobe)
 4. udpv6_destroy_sock (kprobe)
 5. inet_sock_set_state (tracepoint)
 6. tcp_connect (kprobe)
 7. icmp_send (kprobe)
 8. icmp6_send (kprobe)
 9. icmp_recv (kprobe)
10. icmpv6_recv (kprobe)
11. ping_v4_sendmsg (kprobe)
12. ping_v6_sendmsg (kprobe)

> These probes were responsible for updating network maps to work
> in conjunction with tc hooks.

BPF functions removal:

1. net_map_update_or_delete_sock
2. icmp_delete_network_map

> Helper BPF functions

Fixes: aquasecurity#1828
Fixes: aquasecurity#2152
Fixes: aquasecurity#1922
Fixes: aquasecurity#2096
Fixes: aquasecurity#2126
geyslan pushed a commit to geyslan/tracee that referenced this issue Jan 16, 2023
rafaeldtinoco added a commit that referenced this issue Jan 16, 2023