Skip to content

usb: device_next: Allocate data IN stage in handler#102491

Open
tmon-nordic wants to merge 2 commits intozephyrproject-rtos:mainfrom
tmon-nordic:wlength
Open

usb: device_next: Allocate data IN stage in handler#102491
tmon-nordic wants to merge 2 commits intozephyrproject-rtos:mainfrom
tmon-nordic:wlength

Conversation

@tmon-nordic
Copy link
Copy Markdown
Contributor

For control transfers with data stage from device to host, only the handler knows how large buffer is necessary (by specification host is allowed to request up to 65535 bytes for every control read transfer). While generally hosts do keep wLength relatively low, there are cases where the value is big enough that the buffer cannot be allocated with
default UDC_BUF_POOL_SIZE value. Right now the only workaround is to increase UDC_BUF_POOL_SIZE to large enough value but this is not a visable solution on memory constrained targets. For example USB3CV Mass Storage tests read string descriptor 0 with wLength set to 4096, while the Zephyr response to this request is 4 bytes (and therefore 4 byte long buffer is enough).

Move the allocation to handler to allow the code to allocate as small buffer as possible to handle the transfer. This allows MSC sample to pass USB3CV with default UDC_BUF_POOL_SIZE value.

return udc_ep_buf_alloc(dev, ep, size);
}

struct net_buf *udc_ctrl_data_alloc(const struct device *dev,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating buffer in reponse to control transfer is part of processing
and therefore should not be done in UDC driver but rather in the stack.

This commit only changes data IN and status stage buffer allocation,
because doing so is possible without modifying UDC drivers. While the
buffer for Data OUT should also ideally be allocated by the stack, doing
so would require rather significant changes to drivers (it is expected
that such change would overall simplify the UDC driver design).

If it would be that easy to allocate the buffer in the stack, and if it would always work with all the controllers without buffer bouncing, then it would already be the case. Allocating IN packets is there mostly for consistency, and first it is needed to find the solution for the OUT packet allocation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to handle Data OUT packet allocation separately as it will require changes in all UDC drivers, which significantly increase the scope.

Comment thread include/zephyr/usb/usbd.h Outdated
* static int foo_to_host_cb(const struct usbd_context *const ctx,
* const struct usb_setup_packet *const setup,
* struct net_buf *const buf)
* struct net_buf **const pbuf)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am against this change, as it adds more error-prone complexity to the user code. I suggest we continue to use pre-allocated memory, but quote it to a value based on the available memory in the pool.

Copy link
Copy Markdown
Contributor Author

@tmon-nordic tmon-nordic Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocating based on available memory in the pool will make the handling even more convoluted in the handlers, because it will require the handler to check if buffer it received can hold response.

This is error prone, because if handler does not check the length then it can reply with partial response that is smaller than wLength and smaller than the actual amount of data. Doing so would be against USB specification - returning STALL in such case is the appropriate action (request cannot be handled).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question from me @jfischer-no. If the callback handler is the only one that actually knows the required size, won't we still be wasting a lot of memory even if we limit it to the available memory in the pool?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only that, but then also the memory waste will mean that depending on which request comes first, it will succeed or else run out of memory?

Copy link
Copy Markdown
Contributor

@josuah josuah Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmon-nordic could you confirm I understood correctly?

  • wLength gives the maximum amount of memory that the protocol allows for the response: the maximum announced by the host.

  • There could be a much lower maximum Zephyr-side, if the class knows it will never answer with large responses.

When variable size responses are allowed by a class spec, the host better set a very large wLength to make sure to have enough room, and Zephyr better not try to allocate that, but instead use its own limits?

MIN(wLength, usbd_xxxxx_max_response_size)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UDC driver is not synchronized against USB stack

It does seem possible for an UDC driver to keep it in its internal FIFO as long as it needs...
For control transfers in particular, it seems like past this fact, they still need to implement some kind of state machine: https://github.com/josuah/zephyr/blob/main/drivers/usb/udc/udc_dwc2.c#L873-L1062

Do you think this would give enough guarantee that buffers are released before the next one is allocated?
Due to the fact that control buffers always follow a SETUP -> DATA -> STATUS without allowing other setup from other class to happen between these 3 (or 2 if nodata) steps.

However, even if a single control buffer remains allocated at all time, this might still require some change: to always allocate the entire pool size: allocating would always succeed, but filling the buffer would fail if the allocated size is too small.

I think this might be what @jfischer-no said here?

actual length of the DATA IN/OUT stage is significantly smaller than wLength does not mean that exactly that much buffer is allocated. So "wasting" is always there

Copy link
Copy Markdown
Contributor

@josuah josuah Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong on what works and what does not, but I will try to name the alternatives:

  • A this PR: let the class allocate just enough data
  • B always allocate all the available memory assuming only one buffer is allocated at a time and separate pool for CTRL
  • C slight modification of A to avoid carrying a &pbuf pointer around

For C it is #102491 (review)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this would give enough guarantee that buffers are released before the next one is allocated?

No. When UDC submits buffer to stack, UDC no longer is able to free it before USB stack enqueues it (which is good because otherwise there would be use-after-free races).

In my rework (which I truly prefer to be out-of-scope of this PR), I effectively make USB stack to let UDC know when it is ready to process next SETUP - if new SETUP comes before USB stack is ready, then UDC is responsible for cancelling (completing with error) any queued data or status stage buffer.

Due to the fact that control buffers always follow a SETUP -> DATA -> STATUS without allowing other setup from other class to happen between these 3 (or 2 if nodata) steps.

On the bus yes, but the problem is with multiple threads here (which are not synchronized in current UDC model).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When UDC submits buffer to stack, UDC no longer is able to free it before USB stack enqueues it

Either of these 3 functions are called from the USB buffer once a control message is completely processed:

int udc_ctrl_submit_s_out_status(const struct device *dev, struct net_buf *const dout);
int udc_ctrl_submit_s_in_status(const struct device *dev);
int udc_ctrl_submit_s_status(const struct device *dev);

Which always ends-up with a call to udc_submit_ep_event(dev, data->setup, ret); which submits the data->setup buffer, along with the other buffers (data/status) chained in buf->frag.

So now we are on a single thread... however, the USB drivers are still responsible for allocating and freeing some buffers (grepping alloc in drivers/usb/udc), so if I understand correctly, for B to work, this would first require more modifications of the UDC API to ensure there is no use-after-free possible.

Sorry for such slow step-by-step questioning, I see how your proposition solve the problem and am trying to see how the alternatives would not solve it, or if it is a compromise to take.

Thank you once again!

Copy link
Copy Markdown
Contributor Author

@tmon-nordic tmon-nordic Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have multiple buffers allocated in following example scenario:

  1. SETUP is received with wLength=1024, with data going from host to device. UDC allocates buffer for SETUP and Data OUT, starts receiving Data OUT.
  2. Data OUT is completely received. UDC submits SETUP + Data OUT to USB stack.
  3. Host timeouts, sends new SETUP with wLength=1024. UDC allocates buffer for SETUP and Data OUT, starts receiving Data OUT.
  4. USB stack thread finishes processing SETUP + Data OUT from point 2.

After point 3, there would be two buffers allocated for SETUP data, and two for Data OUT. This is of course pretty extreme scenario that would require very precise timing. In order to solve this, a rework is necessary (which is what I am working on) as there simply isn't any simple solution possible within the current UDC model.

Comment thread include/zephyr/usb/usbd.h
* const struct usb_setup_packet *const setup,
* struct net_buf **const pbuf)
* static net_buf *foo_to_host_cb(const struct usbd_context *const ctx,
* const struct usb_setup_packet *const setup)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this commit makes it simpler, but rather inconsistent. You actually add more code, +511 −319.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The starting point is consistent but IMHO overly complicated and not really well suited for memory constrained devices.

Comment thread subsys/usb/device_next/usbd_ch9.c Outdated
Comment thread subsys/usb/device_next/usbd_ch9.c
@sonarqubecloud
Copy link
Copy Markdown

Comment thread subsys/usb/device_next/usbd_ch9.c Outdated
Copy link
Copy Markdown
Contributor

@MarkWangChinese MarkWangChinese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I prefer to alloc the buf where the buf is used. This way, the user, who possesses all the necessary information, can handle the buffer allocation.
All the observations are based on the Zephyr main stream. Here is a breakdown of my thoughts for different control transfers.

  • For (s - data in - status out) control transfer:
    • The “data in” buf is currently allocated in controller driver and used in upper layers (ch9 or classes) now, then it is enqueued in the ch9. Since the buf is filled in the upper layer, I think the upper layers should alloc the buf, the ch9 (common place) enqueue it. The controller driver release the buf after it is transferred.
    • The “status out” buf is currently allocated and enqueued directly in controller driver. The upper layer doesn’t need to use the buf, so I think it is ok to keep it in controller driver. But putting it in a common upper place can reducing the complexity of controller driver.
  • For (s - data out - status in) control transfer:
    • The “data out” buf is currently allocated and enqueued directly in controller driver. Howerver, I think the upper layer should check the setup before receiving the out data, as the upper layer may stall control endpoints based on setup. And the out data is actually used in the upper layer, so I think it would be more logical for the upper layer to handle buf allocation, the enqueuing buf can be put in common place (ch9).
    • The “status in” buf is currently allocated in controller driver and enqueued in upper layer. I suggest to allocate and enqueue it in upper layer common place (ch9) because upper layer may stall control endpoints based on the out data.
  • For (s-status in) control transfer:
    • The “status in” buf is currently allocated in controller driver and enqueued in the upper layer. I suggest to allocate and enqueue it in upper layer common place (ch9) because upper layer may stall control endpoints based on setup.

If the most buf allocations are moved to upper layer, I prefer to move all allocations there. By doing so, the control transfer state machine, which is currently partly maintained by the controller driver, can also be moved to the upper layer (ch9). This way, only one implementation is enough in the upper layer. Currently, every controller driver needs to call udc_ctrl_update_stage to maintain the control transfer states. If we move this responsibility to the upper layer, the controller driver can focus solely on data transactions.

@tmon-nordic
Copy link
Copy Markdown
Contributor Author

IMO, I prefer to alloc the buf where the buf is used. This way, the user, who possesses all the necessary information, can handle the buffer allocation.

I consider this to be agreement that this PR is correct.

If the most buf allocations are moved to upper layer, I prefer to move all allocations there. By doing so, the control transfer state machine, which is currently partly maintained by the controller driver, can also be moved to the upper layer (ch9). This way, only one implementation is enough in the upper layer. Currently, every controller driver needs to call udc_ctrl_update_stage to maintain the control transfer states. If we move this responsibility to the upper layer, the controller driver can focus solely on data transactions.

I am working on it, but I would prefer to be in separate PR because it will touch all the UDC drivers. IMHO the UDC drivers will be a whole lot simpler after the rework, but it is quite some work (I am doing it initially for DWC2 and nRF USBD drivers).

@tmon-nordic
Copy link
Copy Markdown
Contributor Author

tmon-nordic commented Feb 4, 2026

If the most buf allocations are moved to upper layer, I prefer to move all allocations there. By doing so, the control transfer state machine, which is currently partly maintained by the controller driver, can also be moved to the upper layer (ch9). This way, only one implementation is enough in the upper layer. Currently, every controller driver needs to call udc_ctrl_update_stage to maintain the control transfer states. If we move this responsibility to the upper layer, the controller driver can focus solely on data transactions.

Please see #103493 for implementation

For control transfers with data stage from device to host, only the
handler knows how large buffer is necessary (by specification host is
allowed to request up to 65535 bytes for every control read transfer).
While generally hosts do keep wLength relatively low, there are cases
where the value is big enough that the buffer cannot be allocated with
default UDC_BUF_POOL_SIZE value. Right now the only workaround is to
increase UDC_BUF_POOL_SIZE to large enough value but this is not a
visable solution on memory constrained targets. For example USB3CV
Mass Storage tests read string descriptor 0 with wLength set to 4096,
while the Zephyr response to this request is 4 bytes (and therefore 4
byte long buffer is enough).

Move the allocation to handler to allow the code to allocate as small
buffer as possible to handle the transfer. This allows MSC sample to
pass USB3CV with default UDC_BUF_POOL_SIZE value.

Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When USB control request cannot be handled the endpoint should be
stalled regardless of the reason. Simplify control request to host
handlers by changing the return type to the allocated buffer. Returning
NULL is enough to indicate to device stack that the request should be
stalled.

Keep the errno for the time being, but in my opinion it is not necessary
at all.

Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 7 days. Note, that you can always re-open a closed pull request at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: dap area: Samples Samples area: USB Universal Serial Bus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants