
sys/net/gnrc/tx_sync: new module #15694

Merged
merged 3 commits into from
Feb 8, 2021
Conversation

maribu
Member

@maribu maribu commented Dec 23, 2020

Contribution description

The new gnrc_tx_sync module allows users of the GNRC network stack to synchronize with the actual transmission of outgoing packets. This is directly integrated into gnrc_sock. Hence, if gnrc_tx_sync is used, calls to e.g. sock_udp_send() will block until the network stack has processed the message.

Use cases:

  1. Prevent packet drop when sending at high rate
    • If the application is sending faster than the stack can handle, the message queues will overflow and outgoing packets are lost
  2. Passing auxiliary data about the transmission back from the stack
    • When e.g. the number of required retransmissions or the transmission timestamp should be made available to the user of a UDP sock, a synchronization mechanism is needed
  3. Simpler error reporting without footguns
    • The current approach of using core/msg for passing up error messages is difficult to use if other messages come in. Currently, gnrc_sock busy-waits and fetches messages from the message queue until the expected number of status reports has been received. It re-enqueues all non-status-report messages at the end of the queue. This has multiple issues:
      • Busy waiting is especially harmful in low-power scenarios with time-slotted MAC protocols, as the CPU remains active and consumes power even though it could sleep until the TX slot is reached
      • The status reports from the network stack are sent to the user thread in a blocking fashion. If the message queue of the user thread is full, the network stack blocks until the user thread can fetch the messages. If another, higher-priority thread then starts sending a message, it busy-waits for its status reports to come in completely. Hence, the first thread never gets CPU time to fetch messages and unblock the network stack. As a result, the system locks up completely.
    • Adding the error/status code directly to the gnrc_tx_sync_t preallocates and reserves the memory needed for error reporting. That way, gnrc_sock does not need to search the message queue for status reports, and the network stack does not need to block until the user thread fetches them.

Testing procedure

  1. Run the test provided as tests/gnrc_tx_sync as is. It should pass.
  2. Run the test again, but comment out `USEMODULE += gnrc_tx_sync` in the test's Makefile. The test should now fail.

State

  • Integrate into link layer fragmentation. (I think only 6LoWPAN has this, right?) But that shouldn't be too hard to add: just split off the TX sync snip before creating the fragments, send out all fragments as before, and finally release the TX sync snip.
  • Get drivers/net: Add netdev_driver_t::confirm_send #14660 merged, so that this can not only sync with GNRC but also with the actual transmission of the hardware.

The first should be handled before merging; the second could be a follow-up.

Issues/PRs references

None

@maribu maribu added Area: network Area: Networking State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Type: new feature The issue requests / The PR implemements a new feature for RIOT labels Dec 23, 2020
@maribu maribu requested review from miri64 and benpicco December 23, 2020 16:39
Member

@miri64 miri64 left a comment
Some initial thoughts.

@maribu
Member Author

maribu commented Jan 27, 2021

Rebased on master, as #15839 is now merged. I guess it is time to also add synchronization for the link-layer fragmentation use case, so that the WIP label can be dropped.

@maribu maribu force-pushed the gnrc_tx_sync branch 2 times, most recently from 872b599 to 7886b0e on January 28, 2021 08:23
@maribu
Member Author

maribu commented Jan 28, 2021

Let's reorder the commits, so that each commit can be individually compiled to ease git bisect.

@maribu maribu added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet labels Jan 28, 2021
@maribu maribu force-pushed the gnrc_tx_sync branch 2 times, most recently from 6c2525e to 3442a87 on January 28, 2021 08:49
@maribu
Member Author

maribu commented Jan 28, 2021

So, vera++ should now also be happy :-)

@maribu
Member Author

maribu commented Feb 2, 2021

@jia200x suggested that it is a bad idea to bundle gnrc_tx_sync with throttling 6LoWPAN fragmentation, as multiple 6LoWPAN interfaces (e.g. when using the `at86rf215`) could be present.

I can split this out and instead detach the tx_sync snip from the IPv6 packet and attach it to the last fragment. This way, gnrc_tx_sync would still sync with the transmission of the whole packet (including all fragments), but it would never block the 6LoWPAN thread.

Generally, I do believe that throttling 6LoWPAN fragmentation to match the speed of the transceiver is beneficial. (I also agree that just blocking in the 6LoWPAN thread is a quick and dirty hack, rather than a clean solution.) But I fully agree that bundling throttling of fragmentation with gnrc_tx_sync is a bad idea.

I will look into this asap.

@benpicco
Contributor

benpicco commented Feb 2, 2021

@jia200x suggested that it is a bad idea to bundle gnrc_tx_sync with throttling 6LoWPAN fragmentation, as multiple 6LoWPAN interfaces (e.g. when using the `at86rf215`) could be present.

I still think this would be a small price to pay for getting actually working 6LoWPAN fragmentation with all drivers. (And the possibility to remove the hacks that were added to the drivers to make 6LoWPAN fragmentation work)

@maribu
Member Author

maribu commented Feb 2, 2021

@jia200x suggested that it is a bad idea to bundle gnrc_tx_sync with throttling 6LoWPAN fragmentation, as multiple 6LoWPAN interfaces (e.g. when using the `at86rf215`) could be present.

I still think this would be a small price to pay for getting actually working 6LoWPAN fragmentation with all drivers. (And the possibility to remove the hacks that were added to the drivers to make 6LoWPAN fragmentation work)

Agreed. But there is no harm in allowing the user to decide on this. E.g. one could instead use gnrc_netif_pktq to pump out fragments on multiple transceivers at a high rate, if faced with a use case for this.

(And we could still add (say) gnrc_sixlowpan_throttle to Makefile.default for boards with a single 6LoWPAN transceiver, if we want to provide users with a reasonable default.)

@jia200x
Member

jia200x commented Feb 2, 2021

I still think this would be a small price to pay for getting actually working 6LoWPAN fragmentation with all drivers. (And the possibility to remove the hacks that were added to the drivers to make 6LoWPAN fragmentation work)

Well, wouldn't 6LoWPAN work with all ieee802154_radio_hal based drivers? And which hack are you referring to?

@benpicco
Contributor

benpicco commented Feb 2, 2021

And which hack are you referring to?

I had to add _block_while_busy() in at86rf215, also nrf24l01p_ng can't handle large packets with many fragments.

@jia200x
Member

jia200x commented Feb 2, 2021

I had to add _block_while_busy() in at86rf215, also nrf24l01p_ng can't handle large packets with many fragments.

Ah yes, but you don't need to add that if you use the Radio HAL (the send function never blocks). That's how it's implemented in the CC2538 and NRF802154

@maribu
Member Author

maribu commented Feb 2, 2021

Both nrf24l01p_ng and cc110x are used with 6LoWPAN, but are not IEEE 802.15.4 devices. So porting them to the 802.15.4 radio HAL is not really an option.

The hack in the cc110x is particularly nasty: the cc110x can send frames of up to 255 bytes (including some header not supplied by the hardware), but only has a 64-byte FIFO. So the driver provides its own mechanism to call netdev_driver_t::isr() during TX, while blocking until TX completes. It would be super nice to get rid of that.

@jia200x
Member

jia200x commented Feb 2, 2021

Both nrf24l01p_ng and cc110x are used with 6LoWPAN, but are not IEEE 802.15.4 devices. So porting them to the 802.15.4 radio HAL is not really an option.

The hack in the cc110x is particularly nasty: the cc110x can send frames of up to 255 bytes (including some header not supplied by the hardware), but only has a 64-byte FIFO. So the driver provides its own mechanism to call netdev_driver_t::isr() during TX, while blocking until TX completes. It would be super nice to get rid of that.

I remember you proposed to create a "generic network device" in the RDM. Although there were some discussions there, maybe this use case can be a concrete example (let's say a generic radio + some technology specific extensions).

@maribu
Member Author

maribu commented Feb 2, 2021

I remember you proposed to create a "generic network device" in the RDM. Although there were some discussions there, maybe this use case can be a concrete example (let's say a generic radio + some technology specific extensions).

I'm still all in for this :-) In the meantime, allowing users to throttle the 6LoWPAN thread should be a quick fix. But anyway, I dropped this from this PR, so that it can become a separate PR and a separate (pseudo)module. We could document right away that we plan to remove this module once a better solution is merged.

@maribu maribu removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Feb 2, 2021
@maribu
Member Author

maribu commented Feb 3, 2021

@miri64: I adapted the test to now use a simulated IEEE 802.15.4 or a simulated Ethernet device and dropped the additional netdev type.

@miri64 miri64 added CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 8, 2021
@miri64
Member

miri64 commented Feb 8, 2021

From my side this is ready to get merged, so please squash. Let's see if Murdock finds any issues.

It is currently not possible to use both gnrc_tx_sync and gnrc_sixlowpan_frag_sfr at the same time - this will be added in a follow-up PR.
@maribu
Member Author

maribu commented Feb 8, 2021

All green :-)

Member

@miri64 miri64 left a comment
Then take this green as well :-)! ACK!

@miri64 miri64 merged commit 209b48e into RIOT-OS:master Feb 8, 2021
@maribu
Member Author

maribu commented Feb 8, 2021

Thanks! :-)

@miri64
Member

miri64 commented Feb 16, 2021

From #15694 (comment)

[…]

And a good invitation to provide generalized mock devices and/or interfaces for certain L2 protocols in the future so one does not have to re-implement them all the time. ^^

See #16020

@kaspar030 kaspar030 added this to the Release 2021.04 milestone Apr 23, 2021