
dts: bindings: usb: Specify fifo sizes in devicetree #99510

Open
tmon-nordic wants to merge 2 commits into zephyrproject-rtos:main from tmon-nordic:dwc2-fifos

@tmon-nordic
Contributor

Add g-rx-fifo-size, g-np-tx-fifo-size and g-tx-fifo-size required properties to the snps,dwc2 compatible. Property names are the same as those used in Linux because Zephyr aims for devicetree source compatibility with other operating systems.

Specifying fifo sizes in devicetree greatly reduces the necessary driver complexity and allows the application developer to adjust the sizes if necessary to best utilize the underlying hardware.

Implements #96206


usb0: usb@ffb30000 {
compatible = "snps,dwc2";
reg = <0xffb30000 0xffff>;
Contributor Author


@nashif I found fifo information in the Cyclone V HPS Register Address Map and Definitions, but there usb0 is at 0xFFB00000 and not 0xFFB30000. Is it a bug in the Zephyr devicetree or am I looking at the wrong place?

Contributor

walidbadar commented Mar 22, 2026


@nashif I found fifo information in the Cyclone V HPS Register Address Map and Definitions, but there usb0 is at 0xFFB00000 and not 0xFFB30000. Is it a bug in the Zephyr devicetree or am I looking at the wrong place?

I think this might be a bug in the Zephyr devicetree. The Linux kernel also uses 0xFFB00000 as the base address for Cyclone V usb0.

https://github.com/torvalds/linux/blob/113ae7b4decc6c2d95bdbbe52e615a0137ef7f9f/arch/arm/boot/dts/intel/socfpga/socfpga.dtsi#L944-L955

@tmon-nordic
Contributor Author

@raffarost Can you provide the devicetree values from esp32-s3?

@tmon-nordic
Contributor Author

@jean12332 can you verify whether this also fixes #95500?

@raffarost

raffarost commented Nov 17, 2025

@raffarost Can you provide the devicetree values from esp32-s3?

hi @tmon-nordic,
does this look reasonable? Values for GRXFSIZ and GNPTXFSIZ are the ones currently in use.

Section         Words
----------------------
RX FIFO         64
EP0 TX (NP)     32
EP1 TX          40
EP2 TX          40
EP3 TX          40
EP4 TX          40
----------------------
Total           256  (GDFIFOCFG)

@sylvioalves
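
As a rough illustration, the table above can be checked mechanically. The sketch below is not driver code: `struct fifo_layout` and the `fifo_layout_*()` helpers are hypothetical names, and it assumes the RX FIFO, the non-periodic TX FIFO and the dedicated TX FIFOs are packed back to back in that order. The 256-word budget is simply the total quoted in the table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a fixed FIFO layout; all sizes in 32-bit words. */
struct fifo_layout {
	uint16_t rx_size;        /* RX FIFO depth */
	uint16_t np_tx_size;     /* non-periodic (EP0) TX FIFO depth */
	const uint16_t *tx_size; /* depths of the dedicated TX FIFOs */
	size_t tx_count;         /* number of dedicated TX FIFOs */
};

/* Start offset of dedicated TX FIFO idx; idx == tx_count yields the total. */
static uint32_t fifo_layout_tx_start(const struct fifo_layout *l, size_t idx)
{
	uint32_t start = (uint32_t)l->rx_size + l->np_tx_size;

	for (size_t i = 0; i < idx && i < l->tx_count; i++) {
		start += l->tx_size[i];
	}
	return start;
}

/* Sum of all FIFO depths in words. */
static uint32_t fifo_layout_total(const struct fifo_layout *l)
{
	return fifo_layout_tx_start(l, l->tx_count);
}

/* True when the whole layout fits in the available FIFO locations. */
static bool fifo_layout_fits(const struct fifo_layout *l, uint32_t available_words)
{
	return fifo_layout_total(l) <= available_words;
}

/* Values copied from the table above. */
static const uint16_t esp32s3_tx[] = { 40, 40, 40, 40 };
static const struct fifo_layout esp32s3_layout = {
	.rx_size = 64,
	.np_tx_size = 32,
	.tx_size = esp32s3_tx,
	.tx_count = sizeof(esp32s3_tx) / sizeof(esp32s3_tx[0]),
};
```

With these values the layout uses exactly 256 words, so it fits only if all 256 locations are actually usable for FIFOs.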

@tmon-nordic
Contributor Author

@raffarost Can you provide the devicetree values from esp32-s3?

hi @tmon-nordic, does this look reasonable? Values for GRXFSIZ and GNPTXFSIZ are the ones currently in use.

Section         Words
----------------------
RX FIFO         64
EP0 TX (NP)     32
EP1 TX          40
EP2 TX          40
EP3 TX          40
EP4 TX          40
----------------------
Total           256  (GDFIFOCFG)

@sylvioalves

What is the actual GDFIFOCFG register value? The upper 16 bits show how many locations are available for fifos; the lower 16 bits show the total SPRAM size. When DMA is used, some locations are used for storing DMA pointers (this is where the difference between the upper 16 bits and the lower 16 bits comes from).

The upper 16 bits of GHWCFG3 report the available DFIFO depth (after reset this should be the same as the upper 16 bits of the GDFIFOCFG register).
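
To make the field split concrete, here is a minimal decoding sketch that follows the description above. The helper names are made up for illustration and are not taken from the driver, and the register value used in the note below is invented, not read from any hardware.

```c
#include <stdint.h>

/* Upper 16 bits of GDFIFOCFG: locations available for FIFOs. */
static inline uint16_t gdfifocfg_fifo_words(uint32_t gdfifocfg)
{
	return (uint16_t)(gdfifocfg >> 16);
}

/* Lower 16 bits of GDFIFOCFG: total SPRAM size in locations. */
static inline uint16_t gdfifocfg_spram_words(uint32_t gdfifocfg)
{
	return (uint16_t)(gdfifocfg & 0xFFFFU);
}

/*
 * Locations not usable for FIFOs (e.g. DMA pointers).
 * Assumes the SPRAM size is not smaller than the FIFO-usable part.
 */
static inline uint16_t gdfifocfg_reserved_words(uint32_t gdfifocfg)
{
	return (uint16_t)(gdfifocfg_spram_words(gdfifocfg) -
			  gdfifocfg_fifo_words(gdfifocfg));
}

/* Upper 16 bits of GHWCFG3: available DFIFO depth. */
static inline uint16_t ghwcfg3_dfifo_depth(uint32_t ghwcfg3)
{
	return (uint16_t)(ghwcfg3 >> 16);
}
```

For an invented value of 0x00F00100, this yields 240 locations available for FIFOs out of 256 SPRAM locations, leaving 16 reserved.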

@tmon-nordic
Contributor Author

Rebased to fix conflict in dts/vendor/nordic/nrf54lm20a.dtsi. No other changes.

zephyrbot added the "area: Devicetree Binding" label Feb 17, 2026

sylvioalves previously approved these changes Feb 18, 2026
josuah previously approved these changes Feb 18, 2026
Comment thread dts/bindings/usb/snps,dwc2.yaml
Contributor

jfischer-no left a comment


Dynamically allocating fifo sizes leads to very complex code that yields
subpar results.

For sure not, #94766 is not complex, and provides the necessary FIFO sizes based on actual USB device configuration. Using devicetree would not scale, and we have already far too many overlays in the tree.

@tmon-nordic
Contributor Author

tmon-nordic commented Mar 14, 2026

Dynamically allocating fifo sizes leads to very complex code that yields
subpar results.

For sure not, #94766 is not complex, and provides the necessary FIFO sizes based on actual USB device configuration.

It is not complex, but it does not allow users to take control over the allocations. One problem, for example, is deciding whether to reserve room for at least two wMaxPacketSize packets per bulk endpoint (this has a significant impact on MSC performance).

There is a reason why the reference Synopsys DWC2 driver (the one in Linux mainline) uses these very devicetree entries. I feel it is not worth trying to come up with smarter code. However, if you want to go down that path, do propose something that gives actual control to the application developer. There are already multiple problems with the endpoint assignment logic in Zephyr (see #102041 (comment)), and trying to come up with magic code for allocating fifos is not making the task any simpler.

You are also completely ignoring the fact that Synopsys DWC2 can have different limitations for individual fifos. Such a limitation in actual silicon is the original reason why #96206 was opened. The actual hardware limitations together with application-specific requirements make me strongly believe that leaving fifo assignments up to the user is not only the simplest but also the best solution.

Using devicetree would not scale, and we have already far too many overlays in the tree.

With reasonable default generic fifo configurations like those proposed here, I don't think there would be a need for any per-sample overlays. For platforms with large SPRAM and flexible fifos, I doubt anyone would ever have to change it. For small SPRAM, users most likely won't adjust anything (unless they have a very specific use case, but that's up to the application). Allowing the user to specify the fifo sizes is IMHO most beneficial for devices with mid-sized SPRAM, because it allows developers to tailor the allocation to their actual application.

Another issue that would be trivial to solve with devicetree, and IMHO not really worth attempting to solve with "clever code", is determining when to enable thresholding. It should be up to the user, because there are way too many variables to consider (memory access times being one of them).

@carlescufi
Member

Thinking about this, why not have both? If the user does not specify the fifo sizes in Devicetree, the stack can heuristically determine the sizes; otherwise the ones provided in the overlay are used instead.

@tmon-nordic
Contributor Author

Thinking about this, why not have both? If the user does not specify the fifo sizes in Devicetree, the stack can heuristically determine the sizes; otherwise the ones provided in the overlay are used instead.

Having both may sound like a good idea, but then we wouldn't get the benefit of being able to remove a significant chunk of code that gives subpar results. The stack is currently not really suited for sensible heuristics (I believe the only way for the heuristics to be sensible is for the stack to provide a complete overview of all endpoints used in the configuration to the UDC driver in a single call).

The driver currently attempts to dynamically allocate memory (fifo space) based on .ep_enable (alloc) and .ep_disable (free). This PR effectively replaces that completely with a fixed allocation. I expect the default fixed allocation to need adjustments very rarely, just like flash partitions: it is the user who ultimately decides it in the application, and pretty much all in-tree samples can work with the sensible defaults.

@carlescufi
Member

carlescufi commented Mar 16, 2026

Having both may sound like a good idea, but then we wouldn't get the benefit of being able to remove a significant chunk of code that gives subpar results.

Well, we would if we had a new Kconfig option (that could be enabled by default), something like (pseudocode):

#if DT_NODE_EXISTS(fifo_sizes)
    /* Use FIFO sizes from Devicetree */
#elif IS_ENABLED(CONFIG_USB_FIFO_SIZE_HEURISTICS)
    /* Use heuristics */
#else
    BUILD_ASSERT(0, "Unable to determine FIFO sizes");
#endif

@tmon-nordic
Contributor Author

tmon-nordic commented Mar 16, 2026

Well, we would if we had a new Kconfig option (that could be enabled by default), something like (pseudocode):

This would actually complicate the code even more instead of making it cleaner.

The reasons to have this fixed at build time are:

  • RX fifo size, and hence the maximum supported OUT endpoint wMaxPacketSize, must be known before the device receives the first SETUP token. This effectively means that it must be known before the USB stack is enabled. Currently there is
    /* Minimum RX FIFO size in 32-bit words considering the largest used OUT packet
    * of 512 bytes. The value must be adjusted according to the number of OUT
    * endpoints.
    */
    #define UDC_DWC2_GRXFSIZ_FS_DEFAULT (15U + 512U/4U)
    /* Default Rx FIFO size in 32-bit words calculated to support High-Speed with:
    * * 1 control endpoint in Completer/Buffer DMA mode: 13 locations
    * * Global OUT NAK: 1 location
    * * Space for 3 * 1024 packets: ((1024/4) + 1) * 3 = 774 locations
    * Driver adds 2 locations for each OUT endpoint to this value.
    */
    #define UDC_DWC2_GRXFSIZ_HS_DEFAULT (13 + 1 + 774)
    /* TX FIFO0 depth in 32-bit words (used by control IN endpoint)
    * Try 2 * bMaxPacketSize0 to allow simultaneous operation with a fallback to
    * whatever is available when 2 * bMaxPacketSize0 is not possible.
    */
    #define UDC_DWC2_FIFO0_DEPTH (2 * 16U)
    /* Get Data FIFO access register */
    #define UDC_DWC2_EP_FIFO(base, idx) ((mem_addr_t)base + 0x1000 * (idx + 1))
    /* Percentage limit of how much SPRAM can be allocated for RxFIFO */
    #define MAX_RXFIFO_GDFIFO_PERCENTAGE 25

    which does not allow the user to adjust the value in any way.
  • Not all configurations (chosen at synthesis time, i.e. fixed before silicon tapeout) allow full flexibility in fifo allocation. On nRF54H20, for example, software has very limited ability to change it.
    • Both RxFIFO and each TxFIFO are subject to starting address constraints. Similarly, the maximum assigned length is also constrained.
    • Due to these constraints, decreasing the memory used by one TxFIFO does not necessarily make that memory available to another TxFIFO.
  • SPRAM may not be able to hold all the required RxFIFO + TxFIFO locations. Thresholding can be used in such a case, but enabling it definitely must not be an automatic driver decision. There are simply way too many factors to consider before enabling thresholding.
  • There is no benefit in leaving some SPRAM locations unused (which the driver currently does in pretty much every single case). There is, however, a very significant performance hit for a bulk endpoint if it is not allocated enough fifo locations to hold 2 * wMaxPacketSize.
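
The point about per-fifo constraints can be sketched as a small validity check. Everything here is hypothetical: `struct fifo_limit`, `fifo_layout_valid()` and the example numbers are invented for illustration and do not describe nRF54H20 or any other real silicon. The sketch assumes FIFOs are placed back to back, each with its own upper bound on start address and depth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-FIFO silicon limits, in 32-bit words. */
struct fifo_limit {
	uint16_t max_start; /* highest allowed start address */
	uint16_t max_depth; /* largest allowed depth */
};

/*
 * Check a back-to-back layout: depth[0] starts at first_start and every
 * following FIFO starts where the previous one ends.
 */
static bool fifo_layout_valid(const uint16_t *depth,
			      const struct fifo_limit *limit, size_t count,
			      uint32_t first_start, uint32_t spram_words)
{
	uint32_t start = first_start;

	for (size_t i = 0; i < count; i++) {
		if (start > limit[i].max_start ||
		    depth[i] > limit[i].max_depth) {
			return false;
		}
		start += depth[i];
	}
	return start <= spram_words;
}

/* Invented limits for two TxFIFOs. */
static const struct fifo_limit example_limits[] = {
	{ .max_start = 512, .max_depth = 128 },
	{ .max_start = 1024, .max_depth = 256 },
};

/* Fits: both FIFOs are within their own limits. */
static const uint16_t example_ok[] = { 128, 256 };

/* Same total size, but the words freed from FIFO 0 cannot go to FIFO 1. */
static const uint16_t example_shifted[] = { 64, 320 };
```

Both example layouts need the same 384 words, yet only the first one passes, because the second exceeds the depth limit of the second fifo.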

The workaround for the significant bulk endpoint hit could be as simple as multiplying

reqdep = DIV_ROUND_UP(udc_mps_ep_size(cfg), 4U);

by 2 if the endpoint is of bulk type. This, however, would just be applying a bandaid instead of solving the actual root cause (having to dynamically allocate SPRAM locations without having a full system overview).
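
For illustration only, the workaround described above could look roughly like this. `tx_fifo_words()` and `SKETCH_DIV_ROUND_UP` are hypothetical names; the macro is a local stand-in for Zephyr's `DIV_ROUND_UP` so that the sketch builds on its own, and the packet size is passed in directly instead of calling `udc_mps_ep_size()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round-up division; local stand-in for Zephyr's DIV_ROUND_UP. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1U) / (d))

/*
 * Required TxFIFO depth in 32-bit words for one endpoint.
 * Bulk endpoints get room for two packets so that the next packet can be
 * loaded while the previous one is still being transmitted.
 */
static inline uint32_t tx_fifo_words(uint32_t mps_bytes, bool is_bulk)
{
	uint32_t reqdep = SKETCH_DIV_ROUND_UP(mps_bytes, 4U);

	return is_bulk ? 2U * reqdep : reqdep;
}
```

For a 512-byte bulk endpoint this gives 256 locations instead of 128.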

@tmon-nordic
Contributor Author

we would if we had a new Kconfig option (that could be enabled by default)

An even bigger issue is that we don't have a working allocator. The current fifo allocation fails in multiple ways, as summed up in the commit message:

Dynamically allocating fifo sizes leads to very complex code that yields
subpar results. DWC2 controller has very specific requirements related
to configured FIFO sizes that significantly increase the complexity:
  * each fifo has specific upper bound on both size and starting
    location (fixed for each silicon)
  * assigned locations must be contiguous
  * assigned locations cannot be changed on-the-fly

Zephyr tried to dynamically assign fifos using simple but essentially
broken algorithm that:
  * assumed 1-to-1 endpoint number to TxFIFO mapping
    DWC2 controller can be configured with non-contiguous IN endpoint
    numbers, but TxFIFOs are always contiguous.
  * allocated minimum required space for each endpoint
    There is no benefit in using less SPRAM than hardware has available,
    but using only 128 locations per bulk endpoint significantly limits
    maximum transfer rate when using DMA. At least 256 locations allows
    DMA to load next packet data before previous packet is transmitted
    on the bus. Actual performance degradation depends on clock rates
    and latencies. MSC on nRF54H20 using bulk endpoints with only 128
    locations can achieve no more than 15.6 MB/s throughput, while with
    256 locations speeds up to 36.9 MB/s are possible (running otherwise
    identical software).
  * used one-size-fits-all defaults for RxFIFO
    Remedied with some upper limits calculated at runtime, but the limit
    was not taking into account complete configuration.
  * was unable to effectively handle multiple alternate settings
    This did lead to set interface failures where e.g. HID interface
    with wMaxPacketSize > 64 was not using the highest numbered IN
    endpoint within active configuration.

Instead of trying to come up with bandaid for each issue that would
significantly complicate the code, just shift the responsibility towards
firmware developer. While for many typical applications board defaults
are perfectly fine, the application developer may want to come up with
specific devicetree overrides best suited for their use case.
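
To illustrate the 1-to-1 mapping problem from the commit message, the sketch below shows one possible dense assignment of TxFIFO indices to IN endpoints. This is purely an assumption for illustration: `in_ep_to_txfifo()` is a hypothetical helper, and this is not the scheme used by the driver or by this PR.

```c
#include <stdint.h>

#define NO_TXFIFO (-1)

/*
 * Return a TxFIFO index for IN endpoint ep_num, given a bitmask of the
 * IN-capable endpoints (bit n set means endpoint n can be IN).
 * TxFIFO indices are handed out densely in endpoint-number order, so
 * gaps in the endpoint numbering do not leave gaps in the TxFIFOs.
 */
static int in_ep_to_txfifo(uint16_t in_ep_mask, unsigned int ep_num)
{
	int idx = 0;

	if (ep_num >= 16U || (in_ep_mask & (1U << ep_num)) == 0U) {
		return NO_TXFIFO;
	}

	/* Count IN-capable endpoints with a lower number. */
	for (unsigned int n = 0; n < ep_num; n++) {
		if ((in_ep_mask & (1U << n)) != 0U) {
			idx++;
		}
	}
	return idx;
}
```

With IN endpoints 0, 1, 3 and 5 (mask 0x2B), endpoint 5 maps to TxFIFO 3, whereas a 1-to-1 assumption would look for a TxFIFO 5 that may not exist.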

The last attempt to fix the 1-to-1 endpoint number assumption (#95654) went out of budget during review.

There is also a hidden cost in delaying this PR: every new snps,dwc2 entry added in the meantime will come without fifo sizes, which will make landing this change later much more difficult.

Add g-rx-fifo-size, g-np-tx-fifo-size and g-tx-fifo-size required
properties to snps,dwc2 compatible. Property names are the same as used
in Linux because Zephyr aims for devicetree source compatibility with
other operating systems.

Specifying fifo sizes in devicetree greatly reduces necessary driver
complexity and allows application developer to adjust the sizes if
necessary to best utilize underlying hardware.

Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>

Labels

area: Boards/SoCs, area: Devicetree Binding, area: Devicetree Bindings, area: USB, area: Xtensa, platform: ESP32, platform: Intel SoC FPGA Agilex, platform: nRF

8 participants