Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Netlink event API: add SUBFLOW_CREATED event #222

Open
matttbe opened this issue Aug 4, 2021 · 4 comments
Open

Netlink event API: add SUBFLOW_CREATED event #222

matttbe opened this issue Aug 4, 2021 · 4 comments

Comments

@matttbe
Copy link
Member

matttbe commented Aug 4, 2021

For the moment, there is no event when a subflow has been created, only when it has been established.

If we have this situation:

  • the userspace PM creates a subflow -- when the command will be available, see Add netlink command support #186 -- without specifying the src port/address.
  • the subflow is never established, e.g. SYN are dropped
  • the userspace is not able to destroy the subflow -- again, when the command will be available -- because the userspace doesn't have the 5-tuples.
  • the userspace will see SUBFLOW_CLOSED events for unknown subflow

Solution: have a SUBFLOW_CREATED event.

@fw-strlen
Copy link

We could suppress SF_CLOSED for subflows that were never announced.
I think that for purpose of 'canceling' a subflow connect request it would make more sense to me to support
the NLM_F_ECHO flag in the 'create' api so that mptcpd learns some identifier that it can use to
terminate the connection request when it sends the appropriate create command.

@matttbe
Copy link
Member Author

matttbe commented Aug 11, 2021

We could suppress SF_CLOSED for subflows that were never announced.

Indeed. For me, receiving an event for an unknown subflow is not a big deal: anyway the userspace app has to manage this case in case events are lost, the app is restarted, etc.

I think that for purpose of 'canceling' a subflow connect request it would make more sense to me to support
the NLM_F_ECHO flag in the 'create' api so that mptcpd learns some identifier that it can use to
terminate the connection request when it sends the appropriate create command.

Good point, I didn't think about that!
Will the advantage be that only the creator of the subflow will get the notification?
From what I see; the creator will not have to "block" so it seems good.

I was thinking about a new dedicated event because on the kernel side, we send the same content: the implementation is easy. And this could benefit to others, e.g. if we only have a listener (ip mptcp monitor) or if the subflow has been created by one part the events are processed by another entity. We can have two channels of communications: one for the events and one for the commands (and their ACKs): one part is in charge or managing new events and the other side is in charge of sending commands and wait for ACK. Not sure it is the proper way to do such thing.

@VenkateswaranJ
Copy link

@matttbe what do you mean by when the command will be available? how does the user's space know whether the command is available or not?

@matttbe
Copy link
Member Author

matttbe commented Aug 29, 2022

@VenkateswaranJ it was just to say that this task depends on #186 (now done).

The userspace can know if a command is supported by trying it and check the return code.

matttbe pushed a commit that referenced this issue Oct 27, 2023
The following warning was reported when running "./test_progs -t
test_bpf_ma/percpu_free_through_map_free":

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 68 at kernel/bpf/memalloc.c:342
  CPU: 1 PID: 68 Comm: kworker/u16:2 Not tainted 6.6.0-rc2+ #222
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: events_unbound bpf_map_free_deferred
  RIP: 0010:bpf_mem_refill+0x21c/0x2a0
  ......
  Call Trace:
   <IRQ>
   ? bpf_mem_refill+0x21c/0x2a0
   irq_work_single+0x27/0x70
   irq_work_run_list+0x2a/0x40
   irq_work_run+0x18/0x40
   __sysvec_irq_work+0x1c/0xc0
   sysvec_irq_work+0x73/0x90
   </IRQ>
   <TASK>
   asm_sysvec_irq_work+0x1b/0x20
  RIP: 0010:unit_free+0x50/0x80
   ......
   bpf_mem_free+0x46/0x60
   __bpf_obj_drop_impl+0x40/0x90
   bpf_obj_free_fields+0x17d/0x1a0
   array_map_free+0x6b/0x170
   bpf_map_free_deferred+0x54/0xa0
   process_scheduled_works+0xba/0x370
   worker_thread+0x16d/0x2e0
   kthread+0x105/0x140
   ret_from_fork+0x39/0x60
   ret_from_fork_asm+0x1b/0x30
   </TASK>
  ---[ end trace 0000000000000000 ]---

The reason is simple: __bpf_obj_drop_impl() does not know the freeing
field is a per-cpu pointer and it uses bpf_global_ma to free the
pointer. Because bpf_global_ma is not a per-cpu allocator, so ksize() is
used to select the corresponding cache. The bpf_mem_cache with 16-bytes
unit_size will always be selected to do the unmatched free and it will
trigger the warning in free_bulk() eventually.

Because per-cpu kptr doesn't support list or rb-tree now, so fix the
problem by only checking whether or not the type of kptr is per-cpu in
bpf_obj_free_fields(), and using bpf_global_percpu_ma to these kptrs.

Signed-off-by: Hou Tao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs triage
Development

No branches or pull requests

3 participants