Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Commissioning reads have a timeout that's too low #36803

Closed
bzbarsky-apple opened this issue Dec 11, 2024 · 0 comments · Fixed by #36812
Closed

[BUG] Commissioning reads have a timeout that's too low #36803

bzbarsky-apple opened this issue Dec 11, 2024 · 0 comments · Fixed by #36812
Assignees
Labels
bug Something isn't working needs triage

Comments

@bzbarsky-apple
Copy link
Contributor

Reproduction steps

See https://github.com/project-chip/connectedhomeip/actions/runs/12271203864/job/34238154369

We are failing commissioning because we do a read and time out after 2 seconds, while the server takes 4 seconds to produce a response.

But spec says:

Whenever the Fail-Safe timer is armed, Commissioners and Administrators SHALL NOT consider any cluster operation to have timed-out before waiting at least 30 seconds for a valid response from the cluster server.

This used to work correctly before #36603, but that PR replaced this logic in AutoCommissioner::GetCommandTimeout:

        timeout = app::kExpectedIMProcessingTime;
...
    auto sessionHandle = device->GetSecureSession();
    if (sessionHandle.HasValue())
    {
        timeout = sessionHandle.Value()->ComputeRoundTripTimeout(timeout);
    }

    // Enforce the spec minimal timeout.  Maybe this enforcement should live in
    // the DeviceCommissioner?
    if (timeout < kMinimumCommissioningStepTimeout)
    {
        timeout = kMinimumCommissioningStepTimeout;
    }

    return MakeOptional(timeout);

with:

    const auto timeout = MakeOptional(app::kExpectedIMProcessingTime); // TODO: Save timeout from PerformCommissioningStep?

which was not obviously an issue because our timeout logic is so convoluted.

Bug prevalence

Always, but only affects commissioning sometimes, since this is timing-dependent

GitHub hash of the SDK that was being used

9c8a552

Platform

core

Platform Version(s)

No response

Anything else?

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage
Projects
None yet
2 participants