Skip to content

subsys: usb: cdc_ncm/ecm: Clear data iface in suspend#103836

Open
nandojve wants to merge 2 commits intozephyrproject-rtos:mainfrom
nandojve:usb/fix_ecm_ncm_suspend
Open

subsys: usb: cdc_ncm/ecm: Clear data iface in suspend#103836
nandojve wants to merge 2 commits intozephyrproject-rtos:mainfrom
nandojve:usb/fix_ecm_ncm_suspend

Conversation

@nandojve
Copy link
Copy Markdown
Member

Using the CDC-NCM/ECM with VBUS detection leads to problems when device is suspended. Clearing the data iface state solved the issue.

Using the CDC-NCM with VBUS detection leads to problems when device is
suspended. Clearing the data iface state solved the issue.

Signed-off-by: BUDKE Gerson Fernando <gerson.budke@leica-geosystems.com>
Using the CDC-ECM with VBUS detection leads to problems when device is
suspended. Clearing the data iface state solved the issue.

Signed-off-by: BUDKE Gerson Fernando <gerson.budke@leica-geosystems.com>
@nandojve nandojve added this to the v4.4.0 milestone Feb 10, 2026
@zephyrbot zephyrbot added the area: USB Universal Serial Bus label Feb 10, 2026
@sonarqubecloud
Copy link
Copy Markdown

const struct device *dev = usbd_class_get_private(c_data);
struct cdc_ncm_eth_data *data = dev->data;

atomic_clear_bit(&data->state, CDC_NCM_DATA_IFACE_ENABLED);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is absolutely no explanation why it could cause the problem or why it is the right fix. Finally, the one has nothing to do with the other, they are just different ends of the subsystem.

Copy link
Copy Markdown
Member Author

@nandojve nandojve Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jfischer-no ,

Root Cause

The bug is a stale state problem in the usbd_cdc_ncm_suspended() handler when VBUS detection is used.

Before the fix (original code), the suspend handler only did:
atomic_set_bit(&data->state, CDC_NCM_CLASS_SUSPENDED);

It did not clear CDC_NCM_DATA_IFACE_ENABLED.

Why this matters with VBUS detection:

When a USB cable is physically unplugged (VBUS lost), the USB stack fires a suspend event — not a disable event. The disable handler properly clears both flags:

// disable handler (line 868-869) — correct
atomic_clear_bit(&data->state, CDC_NCM_DATA_IFACE_ENABLED);
atomic_clear_bit(&data->state, CDC_NCM_CLASS_SUSPENDED);

But suspend only set CDC_NCM_CLASS_SUSPENDED, leaving CDC_NCM_DATA_IFACE_ENABLED still set. This caused problems because:

  1. cdc_ncm_out_start() (line 612): After processing an OUT transfer, if CDC_NCM_DATA_IFACE_ENABLED is still set, it tries to restart the OUT transfer on a suspended/disconnected bus.
  2. cdc_ncm_send_notification() (line 679): The check for CDC_NCM_DATA_IFACE_ENABLED passes (it's still set), then the check for CDC_NCM_CLASS_SUSPENDED blocks it — but only for notifications. Other paths that only check CDC_NCM_DATA_IFACE_ENABLED would incorrectly think the interface is still active.
  3. Resume path (line 888): usbd_cdc_ncm_resumed() only clears CDC_NCM_CLASS_SUSPENDED — it does not re-set CDC_NCM_DATA_IFACE_ENABLED. So after a suspend→resume cycle triggered by VBUS reconnect, the data interface flag would still be set from the previous connection, but the host hasn't actually re-enabled
    the alternate setting yet. This means the driver could try to use endpoints that haven't been re-configured by the host.

The fix adds one line to the suspend handler:

atomic_clear_bit(&data->state, CDC_NCM_DATA_IFACE_ENABLED);  // NEW
atomic_set_bit(&data->state, CDC_NCM_CLASS_SUSPENDED);

This ensures that on suspend (VBUS disconnect), the driver correctly marks the data interface as disabled — matching what disable already does. The host must re-select alternate setting 1 after resume to re-enable data transfer, which will properly set the flag again via usbd_cdc_ncm_update() (line 848).

Do you want this details in the commit message or in the PR description?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that on suspend (VBUS disconnect)

VBUS disconnect is not synonym to suspend. There is very significant difference between suspended and disconnected bus. In suspend, all the internal state must be preserved.

When suspended the current consumption must be below USB Suspend current limit (2.5 mA). I see no reason why suspend would have to limit enqueuing transfers by the stack or why it would have to result in interface being disabled (I believe enqueuing IN transfer should trigger Remote Wakeup if host allowed it, but this is out of scope here).

Copy link
Copy Markdown
Member Author

@nandojve nandojve Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tmon-nordic ,

You are correct that VBUS disconnect and suspend are different per the USB spec, and that during a standard host-initiated suspend, internal state should be preserved. However, let me explain the specific hardware scenario where this breaks down.

Hardware setup: The STM32 is connected downstream of a USB HUB, with GPIO-based VBUS detection on the upstream port.

What happens on cable unplug:

  1. The upstream cable is removed from the host
  2. The HUB loses upstream signaling and stops driving the downstream port
  3. The STM32 sees bus idle → hardware fires suspend interrupt → class .suspended() is called
  4. VBUS on the downstream port persists (hub capacitors, power rail decay) — GPIO VBUS detection has not fired yet
  5. Eventually VBUS drops → UDC_EVT_VBUS_REMOVED → application calls usbd_disable() → class .disable()

With a HUB in the path, the window between step 3 and step 5 is not a microsecond glitch — it can be hundreds of milliseconds. During this window, CDC_NCM_DATA_IFACE_ENABLED is still set, and code paths like cdc_ncm_out_start() (line 612) attempt to re-enqueue transfers on a dead bus.

Additionally, on this platform we power off the USB PHY and the USB HUB when VBUS is removed to meet power budget requirements. Once the PHY is off, the bus is physically dead — no host can resume, no transfer can succeed, and preserving internal state has no practical meaning. When power is eventually restored, the host must fully re-enumerate and re-select alternate setting 1, which re-sets CDC_NCM_DATA_IFACE_ENABLED via usbd_cdc_ncm_update().

So in this scenario, the class driver cannot distinguish a "cable-unplug suspend" from a host-initiated suspend, and the PHY is powered off in both cases regardless. Clearing CDC_NCM_DATA_IFACE_ENABLED in .suspended() matches the actual hardware state — the data interface is no longer functional — and causes no harm on the resume path since the host must reconfigure anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we essentially missing resetting the USB device state on VBUS removal?

Related discussion #100809 (comment) where resetting USB device state on VBUS removal would allow simplifying the code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clearing CDC_NCM_DATA_IFACE_ENABLED in .suspended() matches the actual hardware state — the data interface is no longer functional — and causes no harm on the resume path since the host must reconfigure anyway.

What do you mean by "host must reconfigure anyway"? Host does not have to reconfigure on resume. The configuration is already applied and host does not change it on resume.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the fix (original code), the suspend handler only did: atomic_set_bit(&data->state, CDC_NCM_CLASS_SUSPENDED);

It did not clear CDC_NCM_DATA_IFACE_ENABLED.

It must not clear CDC_NCM_DATA_IFACE_ENABLED. A suspended device may have alternate interface selected and must maintain this state.

Why this matters with VBUS detection:

When a USB cable is physically unplugged (VBUS lost), the USB stack fires a suspend event — not a disable event. The disable handler properly clears both flags:

I suppose you are observing this on a device controller without VBUS detection?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmon-nordic and @jfischer-no ,

I hope the scenario is very clear at this point. The argumentation is on top of that and you need to understand this is a clear scenario and this create a deadlock.

If this is a solution that may not work on all cases, let's enumerate the scenarios.

Clearing CDC_NCM_DATA_IFACE_ENABLED in .suspended() matches the actual hardware state — the data interface is no longer functional — and causes no harm on the resume path since the host must reconfigure anyway.

What do you mean by "host must reconfigure anyway"? Host does not have to reconfigure on resume. The configuration is already applied and host does not change it on resume.

The cable was unplugged so there is no host in the upstream port on the HUB.

Before the fix (original code), the suspend handler only did: atomic_set_bit(&data->state, CDC_NCM_CLASS_SUSPENDED);
It did not clear CDC_NCM_DATA_IFACE_ENABLED.

It must not clear CDC_NCM_DATA_IFACE_ENABLED. A suspended device may have alternate interface selected and must maintain this state.

You mean, a suspended device that still have VBUS active, right? Otherwise what happen with a device receive a from the HUB suspend and then detect VBUS removal ?

I suppose you are observing this on a device controller without VBUS detection?

I strong believe that this should not matter. If there is a suspend + vbus removal we have a deadlock without doing what I'm exercising this this PR.

@tmon-nordic , @jfischer-no ,

At end, we need to cover this use case, which seems to be valid. It should not matter if the VBUS detector is in the controller or it is using a GPIO. Even the STM32 HS-PHY/Controller from STM32U5xxxx may suffer from a slow ramp in VBUS if there are capacitors in the line because VBUS can power the circuit.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At end, we need to cover this use case, which seems to be valid. It should not matter if the VBUS detector is in the controller or it is using a GPIO.

Yes, it should not matter. But if CDC_NCM_DATA_IFACE_ENABLED is cleared, then it has to be on VBUS removal not on suspend.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmon-nordic ,

Yes, it should not matter. But if CDC_NCM_DATA_IFACE_ENABLED is cleared, then it has to be on VBUS removal not on suspend.

I'll check this and provide feedback.

@jfischer-no jfischer-no removed this from the v4.4.0 milestone Feb 10, 2026
@nandojve nandojve added this to the v4.4.0 milestone Feb 11, 2026
@nandojve nandojve requested a review from jfischer-no February 11, 2026 08:32
@jfischer-no jfischer-no removed this from the v4.4.0 milestone Feb 24, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 7 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions Bot added the Stale label Mar 27, 2026
@nandojve nandojve removed the Stale label Mar 27, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 7 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions Bot added the Stale label Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: USB Universal Serial Bus Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants