Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: A layered approach to netdev #7736

Open
miri64 opened this issue Oct 13, 2017 · 43 comments
Open

RFC: A layered approach to netdev #7736

miri64 opened this issue Oct 13, 2017 · 43 comments
Assignees
Labels
Area: drivers Area: Device drivers Area: network Area: Networking Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: don't stale State: Tell state-bot to ignore this issue Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Comments

@miri64
Copy link
Member

miri64 commented Oct 13, 2017

@haukepetersen and I today talked a little bit about the problems with netdev_t and how to dynamically add features independent from a specific device. Examples for this are

This is what we came up with:

layered_netdev

In addition to actual device drivers there are additional netdev_driver_ts available who also have another netdev_driver_t associated to themselves (I called this construction netdev_layered_t here but I'm not stuck on the name). This allows a network stack to use those features as though they were a normal network device and the whole thing would be very transparent to the network stack (so I wouldn't count this as a major API change ;-)). The benefit of this should be clear:

  • no code duplication for link-layer specific operations (like packet counting, link-layer address filtering, option setting/getting)
  • simplification of the netdev_ieee802154_set()/netdev_ieee802154_get() situation that seem to confuse people

The questions that are still open are:

  • Where to store netdev_layered_t instances?
  • How initialize such a "netdev_layered_t" stack while also having it configurable?
@miri64 miri64 added Area: drivers Area: Device drivers Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: network Area: Networking Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR labels Oct 13, 2017
@miri64
Copy link
Member Author

miri64 commented Oct 13, 2017

Another thing to consider might be that this could waste a few bytes of memory, since any of these layers could just call the next for certain operations without adding any new functionality (e.g. a netstats counter wouldn't be involved in set() or isr() wasting basically 8 byte of RAM and a bit of ROM for the calldown).

@miri64
Copy link
Member Author

miri64 commented Oct 13, 2017

But that might be premature optimization ;-)

@jnohlgard
Copy link
Member

jnohlgard commented Oct 14, 2017

Since many of the layers will only hook into one or two of the functions, another approach could be to change the functions themselves to linked lists, where the next function in the list is called if the argument is not handled by the current function. That would reduce memory usage when the layer implements 3 or less of the 6 API functions, and it would reduce the latency for the functions which have few hooks (netdev_t::isr for example).
I don't have any solution to where and how to allocate these list items though.

Another important consideration: How do we decide which order the layers should be linked? Do we need a kind of sorting key or do we just add them in the order they are initialized during application start?

Edit: The memory reduction assumes that a function pointer and a pointer to a struct uses the same number of bytes in memory.

@jnohlgard
Copy link
Member

I do like the ideas here, it would certainly be possible to simplify MAC layer implementations if the netstats stuff could be broken out into its own layer, just counting the number of packets passing back and forth. And though it would be working the same way as today, I think it would make netdev_ieee802154_send/recv more visible. The monkey patching of send and recv being done by netdev_ieee802154_init in the current implementation was not at all obvious to me until I saw the call chain in a backtrace in the debugger from inside the send of my netdev driver, but maybe I was just being blind.

Finding a solution for the allocation issue would mean that we also could potentially move the extra members in gnrc_netdev_t added by preprocessor conditions on MODULE_GNRC_MAC (https://github.com/RIOT-OS/RIOT/blob/master/sys/include/net/gnrc/netdev.h#L119-L163). It breaks ABI compatibility between builds to have public API structs change members depending on the included modules.

@kaspar030
Copy link
Contributor

kaspar030 commented Oct 14, 2017

I don't see news here. Just bad documentation and its effects.

Netdev2 has been designed to be stackable from the beginning. Just add a "parent" pointer to any netdev_t and add some logic to pass send/recv/get/set calls up to that parents.

Was anyone listening when I said "implement mac layers stacked on top of the device drivers using netdev"? edit sorry, that came out a lot more rude than intended.

@jnohlgard
Copy link
Member

@kaspar030 thank you, I think using netdev is a good solution for a MAC layer, but there's still the issue of allocating it somewhere for each network device. Do you have any suggestions on how to tackle that?

@kaspar030
Copy link
Contributor

there's still the issue of allocating it somewhere for each network device. Do you have any suggestions on how to tackle that?

Not really. If it is not possible to do that statically (in a not completely ugly way), we could consider allowing some kind of oneway-malloc that can be used only in auto-init.

@jnohlgard
Copy link
Member

I think 1-way malloc is fine on init during boot. Maybe it would be an idea to add a 1-way malloc which works during boot but after calling a certain function (malloc_finalize or sth) then any calls to that malloc will cause a panic, to prevent inadvertent malloc uses and running out of memory. For systems where you don't need dynamic allocation, and want to be able to ensure stability.

@miri64
Copy link
Member Author

miri64 commented Nov 15, 2017

Added the idea of #4795 to the list above to not loose track of it.

@bergzand
Copy link
Member

I've been thinking a bit about this the last few days since I have a few network features in mind which could greatly benefit from "dynamic" allocation per network device. Most notably at the moment #6873 where ETX tracking doesn't make a lot of sense on wired interfaces.

Not really. If it is not possible to do that statically (in a not completely ugly way), we could consider allowing some kind of oneway-malloc that can be used only in auto-init.

Would it be possible to reuse the memory space of a thread for this? Stack starting on one end, "one way heap" on the other end. Maybe even use a thread flag to indicate that this one way malloc is allowed for the thread, as to have a way to enforce restrictions on this. As said before, the malloc could be really simple when assumed that a free() is not necessary.

@miri64
Copy link
Member Author

miri64 commented Nov 27, 2017

Would it be possible to reuse the memory space of a thread for this? Stack starting on one end, "one way heap" on the other end. Maybe even use a thread flag to indicate that this one way malloc is allowed for the thread, as to have a way to enforce restrictions on this. As said before, the malloc could be really simple when assumed that a free() is not necessary.

There are no threads in netdev and not supposed to be, so no.

@bergzand
Copy link
Member

For now, I'm trying to solve get some discussion going on the issue of where to store the netdev_layered_t data if they are not statically allocated at compile time.

There are no threads in netdev and not supposed to be, so no.

Just so that I understand what you're saying here, the netdev radio drivers are not thread aware, so no calls to threading functions right? Somewhere up in the stack there has to be some kind of thread running an event loop controlling the drivers somehow right (gnrc_netif or gnrc_netdev)?

@miri64
Copy link
Member Author

miri64 commented Nov 28, 2017

Just so that I understand what you're saying here, the netdev radio drivers are not thread aware, so no calls to threading functions right? Somewhere up in the stack there has to be some kind of thread running an event loop controlling the drivers somehow right (gnrc_netif or gnrc_netdev)?

Yes, the threading, if required, is provided by the network stack, but netdev's event management (the isr()+event_callback() bootstrap) allows for totally thread-less environments as well.

@bergzand
Copy link
Member

Yes, the threading, if required, is provided by the network stack, but netdev's event management (the isr()+event_callback() bootstrap) allows for totally thread-less environments as well.

Makes sense.

This whole idea of (one time dynamic) allocation of the netdev_layered_t structs doesn't necessarily have to happen inside netdev right? What I'm thinking of is a structure where the process/functions controlling a netdev_t instance (gnrc_netif or a different threadless design) also configures it with the required layered structs. How this allocation then happens is the problem of the functions above netdev and netdev doesn't have to care whether the allocation happens either dynamic or static, as long as it receives a valid pointer to a netdev_layered_t struct.

@miri64
Copy link
Member Author

miri64 commented Nov 28, 2017

Sound's like a nice idea. Are you willing maybe to provide a proof-of-concept. Would do it myself, but currently don't have enough time at hand for that. :-/

@bergzand
Copy link
Member

Sure, not going to promise anything as my time is limited too, but it sounds like a fun challenge :)

@bergzand
Copy link
Member

bergzand commented Dec 5, 2017

While implementing this for netdev, I was thinking if this layered approach is possible for netif too. Maybe MAC layer protocols such as LWMAC and GoMacH could benefit from this approach.

@miri64
Copy link
Member Author

miri64 commented Dec 5, 2017

Could be.

@miri64
Copy link
Member Author

miri64 commented Jul 16, 2018

In general this work aims to reduce code complexity. The layers this issue talks about currently already exist in part and existed in the referenced paper as well (see modules like netdev_ieee802154 and netdev_ethernet). However the calling hierarchy is quite messed up at the moment + private member fields are touched in places where they are not supposed to be touched (and I admit that this is to 98% my fault ;-)). One example: when a radio specific option is retrieved for a network interface via gnrc_netapi, the interface's thread then first asks the radio driver. If the radio driver doesn't have it, it asks the IEEE 802.15.4 netdev layer. This is supposed to be the otherway around. However, to isolate IEEE 802.15.4 specific options (such as addresses, PAN ID, header flags) and to reduce code duplication, the comparably complicated header construction (they have variable length) of the IEEE 802.15.4 header is also moved to that module in #9417. That's all that is happening and shouldn't have much influence on performance (which should be proven of course).

@bergzand
Copy link
Member

Okay, I'd like to have some opinions again.

I'm currently looking at the way link layer addresses are initialized. My main issue with the current architecture is that the link layer address generated by the device driver is passed up and down the stack.

Problem

The device driver generates an address based on the luid module. This is written to the netdev_t struct by the device driver. gnrc_netif requests the addres from the device driver where it is then also stored.

My problem here is that 1. there is data directly written between "layers", by the device driver, to the netdev(_ieee802154)_t struct — and 2. In a setup where multiple layers require knowledge of the link layer address, there is no way to guarantee this.

Solution

My current solution would be to have the higher layer (gnrc_netif or the netdev glue layers) generate the link layer address and set it in the device driver. This way it has to traverse all netdev layers, giving all layers explicitly knowledge of the new link layer address.

In this implementation checks would be required to check whether the netdev stack uses a link layer address and to check if the device driver provides a link layer address and then use that address.

@miri64
Copy link
Member Author

miri64 commented Nov 12, 2018

Didn't we "solve" the luid problem already by using luid_custom() with the netdev's pointer instead of luid_get() (see #9656 (comment)).

@bergzand
Copy link
Member

Didn't we "solve" the luid problem already by using luid_custom() with the netdev's pointer instead of luid_get() (see #9656 (comment)).

That only solves changes to the link layer address when the device driver is reset, but that doesn't solve my problem number 1 and number 2 here (right?) :)

@miri64
Copy link
Member Author

miri64 commented Nov 12, 2018

Well, problem 1 and 2 won't arise, when the reset is idempotent ;-).

@miri64
Copy link
Member Author

miri64 commented Nov 12, 2018

Or am I misunderstanding them? :-/

@bergzand
Copy link
Member

As a practical (or less hypothetical) example, the nrf52840 doesn't have hardware filtering. The software filter could be implemented as a netdev layer. This extra filter layer would then require knowledge of both the generated link layer address and the PAN ID. IMHO the easiest way for this layer to get the link layer address would be if it could grab the information from a netdev::set call.

Or am I misunderstanding them? :-/

That just means I didn't explain them good enough :)

I'm trying to remove this write by the device driver to the netdev_ieee802154_t struct. At the same time I'm trying to get to a solution that is usable when in the future multiple layers require knowledge of the link layer address.

A different solution for the second issue might be to do a netdev::get call in the init function of the layer for the link layer address and also listen for any netdev::set call that changes the link layer address. Not sure yet if this also works always though.

@bergzand
Copy link
Member

bergzand commented Nov 13, 2018

These "blockers" are all cases where data is directly read from a netdev struct at a position where in a layered module the content of this struct can't be guaranteed. List might be expanded in the future

Blockers:

@bergzand
Copy link
Member

Link layer address generation:

At the moment I think that the easiest way is to remove the link layer address from the netdev_ieee802154_t struct. This way the behaviour becomes identical to the behaviour of the ethernet drivers.

The main issue is that now the address generated by the device driver has to be synced somehow with the netdev_ieee802154_t struct member. It is not easily possible to have the netdev_ieee802154_t layer generate the address, some radios (socket_zep) have a built in address which should be used instead.

The only place where the link layer address is requested is with the ifconfig shell command, when netif is requesting the l2 address to initialize it's own copy and with a SLAAC failure (after setting the address).

@miri64
Copy link
Member Author

miri64 commented Nov 15, 2018

with a SLAAC failure (after setting the address).

DAD failure ;-).

@bergzand
Copy link
Member

DAD failure ;-).

That's what I was thinking, but not what I was writing :)

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@miri64 miri64 added State: don't stale State: Tell state-bot to ignore this issue and removed State: stale State: The issue / PR has no activity for >185 days labels Aug 10, 2019
@MrKevinWeiss MrKevinWeiss added this to the Release 2021.07 milestone Jun 21, 2021
@MrKevinWeiss MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: drivers Area: Device drivers Area: network Area: Networking Discussion: RFC The issue/PR is used as a discussion starting point about the item of the issue/PR State: don't stale State: Tell state-bot to ignore this issue Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

No branches or pull requests

8 participants