Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Default secure session eviction policy is broken when a new session is established #30728

Closed
bzbarsky-apple opened this issue Nov 29, 2023 · 0 comments · Fixed by #30806
Labels
bug Something isn't working core needs triage p1 priority 1 work
Milestone

Comments

@bzbarsky-apple
Copy link
Contributor

Reproduction steps

  1. Start in a state where the secure session table is full (with one of the slots being the pre-allocated session that CASEServer will use for new incoming CASE attempts), with only one session per peer in the table. Ideally all on the same fabric.
  2. Have a CASE initiator that does not have any existing sessions in the session table try to establish CASE.
  3. The new CASE session is established.
  4. We try to allocate a new secure session to listen for more incoming CASE (see step 1).
  5. This evicts the just-established CASE session.

This happens because CASE passes the just-established ScopedNodeId as an eviction hint to session allocation. When DefaultEvictionPolicy does its sort it:

  1. Prioritizes fabrics with more sessions. If all the sessions are on the same fabric, this is a no-op.
  2. Prioritizes sessions that match the hint's fabric. Again, if there is only one fabric this is all a no-op.
  3. Prioritizes peers that have more sessions to them. Since by assumption there is one session per peer, this is also a no-op.
  4. Prioritizes sessions that match the eviction hint. This will prioritize the just-established session.

Only after that are things like defunct state and activity time checked. So you can have lost of defunct sessions to various peers and end up evicting the just-established session.

Bug prevalence

Every time, given the preconditions

GitHub hash of the SDK that was being used

b379158

Platform

core

Platform Version(s)

No response

Anything else?

In practice we have a nonempty hint in two situations:

  1. For a CASE initiator, it's the peer ID we're establishing a session to. In this case, we can assume that we do not have an existing active session, since CASESessionManager would have just used it.
  2. For a CASE responder, it's the peer ID that we just established a new session. In that case we definitely have an active session and we do not want to evict it.

So what I would propose is that we treat a session as not matching the eviction hint if it's (1) active and (2) has mNumMatchingOnPeer == 0 (which means it's the only active session to that peer).

@bzbarsky-apple bzbarsky-apple added bug Something isn't working needs triage labels Nov 29, 2023
@woody-apple woody-apple added p1 priority 1 work core labels Dec 4, 2023
@woody-apple woody-apple added this to the 1.3 Release milestone Dec 4, 2023
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Dec 4, 2023
When we establish a session as a CASE responder, we try to allocate a new
session to listen to session establishments, and use the
just-established-session's peer ID as the eviction hint.

If we had a bunch of active sessions on a fabric, all to different nodes,
this would cause the just-established session to be evicted, since it matched
the hint.

The fix is to only consider sessions for eviction based on the hint if they are
either non-active or not unique sessions to the peer (at which point, the
just-established session should be last in priority order for eviction).

Fixes project-chip#30728
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Dec 4, 2023
When we establish a session as a CASE responder, we try to allocate a new
session to listen to session establishments, and use the
just-established-session's peer ID as the eviction hint.

If we had a bunch of active sessions on a fabric, all to different nodes,
this would cause the just-established session to be evicted, since it matched
the hint.

The fix is to only consider sessions for eviction based on the hint if they are
either non-active or not unique sessions to the peer (at which point, the
just-established session should be last in priority order for eviction).

Fixes project-chip#30728
@mergify mergify bot closed this as completed in #30806 Dec 5, 2023
mergify bot pushed a commit that referenced this issue Dec 5, 2023
When we establish a session as a CASE responder, we try to allocate a new
session to listen to session establishments, and use the
just-established-session's peer ID as the eviction hint.

If we had a bunch of active sessions on a fabric, all to different nodes,
this would cause the just-established session to be evicted, since it matched
the hint.

The fix is to only consider sessions for eviction based on the hint if they are
either non-active or not unique sessions to the peer (at which point, the
just-established session should be last in priority order for eviction).

Fixes #30728
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working core needs triage p1 priority 1 work
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants