usb: device_next: Allocate data IN stage in handler #102491
tmon-nordic wants to merge 2 commits into zephyrproject-rtos:main
Conversation
Force-pushed from 5bff9e0 to 466e714
```c
	return udc_ep_buf_alloc(dev, ep, size);
}

struct net_buf *udc_ctrl_data_alloc(const struct device *dev,
```
Allocating the buffer in response to a control transfer is part of processing
and therefore should not be done in the UDC driver but rather in the stack.
This commit only changes data IN and status stage buffer allocation,
because doing so is possible without modifying UDC drivers. While the
buffer for Data OUT should also ideally be allocated by the stack, doing
so would require rather significant changes to drivers (it is expected
that such a change would overall simplify the UDC driver design).
If it were that easy to allocate the buffer in the stack, and if it always worked with all the controllers without buffer bouncing, then it would already be done. Allocating IN packets here is mostly for consistency; first, a solution is needed for OUT packet allocation.
I would prefer to handle Data OUT packet allocation separately, as it will require changes in all UDC drivers, which significantly increases the scope.
```diff
 * static int foo_to_host_cb(const struct usbd_context *const ctx,
 *                           const struct usb_setup_packet *const setup,
-*                           struct net_buf *const buf)
+*                           struct net_buf **const pbuf)
```
I am against this change, as it adds more error-prone complexity to the user code. I suggest we continue to use pre-allocated memory, but cap it at a value based on the available memory in the pool.
Allocating based on available memory in the pool will make the handling even more convoluted in the handlers, because it will require the handler to check whether the buffer it received can hold the response.
This is error prone: if the handler does not check the length, it can reply with a partial response that is smaller than wLength and smaller than the actual amount of data. That would violate the USB specification; returning STALL in such a case is the appropriate action (the request cannot be handled).
A question from me, @jfischer-no: if the callback handler is the only one that actually knows the required size, won't we still waste a lot of memory even if we limit it to the available memory in the pool?
Not only that, but the memory waste would also mean that, depending on which request comes first, a request will either succeed or run out of memory?
@tmon-nordic could you confirm I understood correctly?
- `wLength` gives the maximum amount of memory that the protocol allows for the response: the maximum announced by the host.
- There could be a much lower maximum on the Zephyr side, if the class knows it will never answer with large responses.

When variable-size responses are allowed by a class spec, the host had better set a very large wLength to make sure there is enough room, and Zephyr had better not try to allocate that much, but instead use its own limit:
`MIN(wLength, usbd_xxxxx_max_response_size)`
UDC driver is not synchronized against USB stack
It does seem possible for a UDC driver to keep it in its internal FIFO for as long as it needs...
For control transfers in particular, it seems like past this fact, they still need to implement some kind of state machine: https://github.com/josuah/zephyr/blob/main/drivers/usb/udc/udc_dwc2.c#L873-L1062
Do you think this would give enough guarantee that buffers are released before the next one is allocated?
This is due to the fact that control transfers always follow SETUP -> DATA -> STATUS, without allowing a SETUP from another class to happen between these 3 (or 2 if no-data) steps.
However, even if a single control buffer remains allocated at all times, this might still require a change: always allocate the entire pool size, so that allocation always succeeds, but filling the buffer fails if the allocated size is too small.
I think this might be what @jfischer-no said here?
the actual length of the DATA IN/OUT stage being significantly smaller than wLength does not mean that exactly that much buffer is allocated, so some "wasting" is always there
I might be wrong on what works and what does not, but I will try to name the alternatives:
- A: this PR, i.e. let the class allocate just enough data
- B: always allocate all the available memory, assuming only one buffer is allocated at a time and a separate pool for CTRL
- C: a slight modification of A to avoid carrying a `&pbuf` pointer around

For C, see #102491 (review).
Do you think this would give enough guarantee that buffers are released before the next one is allocated?
No. When the UDC submits a buffer to the stack, the UDC is no longer able to free it before the USB stack enqueues it (which is good, because otherwise there would be use-after-free races).
In my rework (which I would truly prefer to keep out of scope of this PR), I effectively make the USB stack let the UDC know when it is ready to process the next SETUP. If a new SETUP comes before the USB stack is ready, then the UDC is responsible for cancelling (completing with error) any queued data or status stage buffer.
Due to the fact that control buffers always follow a SETUP -> DATA -> STATUS without allowing other setup from other class to happen between these 3 (or 2 if nodata) steps.
On the bus, yes, but the problem here is with multiple threads (which are not synchronized in the current UDC model).
When UDC submits buffer to stack, UDC no longer is able to free it before USB stack enqueues it
One of these three functions is called once a control message is completely processed:

```c
int udc_ctrl_submit_s_out_status(const struct device *dev, struct net_buf *const dout);
int udc_ctrl_submit_s_in_status(const struct device *dev);
int udc_ctrl_submit_s_status(const struct device *dev);
```

Each always ends up with a call to `udc_submit_ep_event(dev, data->setup, ret)`, which submits the `data->setup` buffer along with the other buffers (data/status) chained in `buf->frag`.
So now we are on a single thread... however, the UDC drivers are still responsible for allocating and freeing some buffers (grep for `alloc` in drivers/usb/udc), so if I understand correctly, for B to work, this would first require more modifications of the UDC API to ensure no use-after-free is possible.
Sorry for such slow step-by-step questioning; I see how your proposal solves the problem and am trying to see how the alternatives would not solve it, or whether it is a compromise to take.
Thank you once again!
You can have multiple buffers allocated in the following example scenario:
1. SETUP is received with wLength=1024, with data going from host to device. UDC allocates buffers for SETUP and Data OUT, starts receiving Data OUT.
2. Data OUT is completely received. UDC submits SETUP + Data OUT to the USB stack.
3. The host times out and sends a new SETUP with wLength=1024. UDC allocates buffers for SETUP and Data OUT, starts receiving Data OUT.
4. The USB stack thread finishes processing SETUP + Data OUT from point 2.

After point 3, there would be two buffers allocated for SETUP data, and two for Data OUT. This is of course a pretty extreme scenario that would require very precise timing. In order to solve this, a rework is necessary (which is what I am working on), as there simply isn't any simple solution possible within the current UDC model.
```diff
-*                           const struct usb_setup_packet *const setup,
-*                           struct net_buf **const pbuf)
+* static net_buf *foo_to_host_cb(const struct usbd_context *const ctx,
+*                                const struct usb_setup_packet *const setup)
```
I do not think this commit makes it simpler, but rather inconsistent. You actually add more code: +511 −319.
The starting point is consistent, but IMHO overly complicated and not really well suited for memory-constrained devices.
MarkWangChinese left a comment
IMO, I prefer to allocate the buf where the buf is used. This way the user, who possesses all the necessary information, can handle the buffer allocation.
All observations are based on Zephyr mainstream. Here is a breakdown of my thoughts for the different control transfers.
- For (s - data in - status out) control transfer:
- The “data in” buf is currently allocated in the controller driver and used in the upper layers (ch9 or classes), then enqueued in ch9. Since the buf is filled in the upper layer, I think the upper layers should allocate the buf and ch9 (the common place) enqueue it. The controller driver releases the buf after it is transferred.
- The “status out” buf is currently allocated and enqueued directly in the controller driver. The upper layer doesn't need to use the buf, so I think it is OK to keep it in the controller driver. But putting it in a common upper place can reduce the complexity of the controller driver.
- For (s - data out - status in) control transfer:
- The “data out” buf is currently allocated and enqueued directly in the controller driver. However, I think the upper layer should check the setup before receiving the out data, as the upper layer may stall control endpoints based on the setup. And the out data is actually used in the upper layer, so I think it would be more logical for the upper layer to handle buf allocation; the enqueuing can be put in a common place (ch9).
- The “status in” buf is currently allocated in the controller driver and enqueued in the upper layer. I suggest allocating and enqueuing it in the upper layer common place (ch9), because the upper layer may stall control endpoints based on the out data.
- For (s-status in) control transfer:
- The “status in” buf is currently allocated in the controller driver and enqueued in the upper layer. I suggest allocating and enqueuing it in the upper layer common place (ch9), because the upper layer may stall control endpoints based on the setup.
If most buf allocations are moved to the upper layer, I prefer to move all allocations there. By doing so, the control transfer state machine, which is currently partly maintained by the controller driver, can also be moved to the upper layer (ch9). This way, only one implementation is needed in the upper layer. Currently, every controller driver needs to call udc_ctrl_update_stage to maintain the control transfer states. If we move this responsibility to the upper layer, the controller driver can focus solely on data transactions.
I consider this to be agreement that this PR is correct.
I am working on it, but I would prefer it to be in a separate PR because it will touch all the UDC drivers. IMHO the UDC drivers will be a whole lot simpler after the rework, but it is quite some work (I am doing it initially for the DWC2 and nRF USBD drivers).
Please see #103493 for the implementation
For control transfers with a data stage from device to host, only the handler knows how large a buffer is necessary (by specification, the host is allowed to request up to 65535 bytes for every control read transfer). While hosts generally keep wLength relatively low, there are cases where the value is big enough that the buffer cannot be allocated with the default UDC_BUF_POOL_SIZE value. Right now the only workaround is to increase UDC_BUF_POOL_SIZE to a large enough value, but this is not a viable solution on memory-constrained targets. For example, USB3CV Mass Storage tests read string descriptor 0 with wLength set to 4096, while the Zephyr response to this request is 4 bytes (and therefore a 4-byte buffer is enough).

Move the allocation to the handler to allow the code to allocate as small a buffer as possible to handle the transfer. This allows the MSC sample to pass USB3CV with the default UDC_BUF_POOL_SIZE value.

Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
When a USB control request cannot be handled, the endpoint should be stalled regardless of the reason. Simplify the control request-to-host handlers by changing the return type to the allocated buffer. Returning NULL is enough to indicate to the device stack that the request should be stalled. Keep the errno for the time being, but in my opinion it is not necessary at all.

Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
This pull request has been marked as stale because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed, otherwise this pull request will automatically be closed in 7 days. Note that you can always re-open a closed pull request at any time.


