Use time-based event index for app session events by gabrielcorado · Pull Request #38495 · gravitational/teleport

gabrielcorado · 2024-02-21T14:52:01Z

As described in the issue, events were being overwritten in the DynamoDB event storage because they had the same key pair (SessionID and Index). The problem is that every time a new session chunk is initialized, it uses a new recorder whose index starts at 0. As a result, any earlier event from the same session with that index is overwritten.
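A minimal in-memory sketch of this failure mode follows. It assumes a plain put that replaces any item with the same key, which is how DynamoDB `PutItem` behaves without a condition. `recorder`, `store`, and `simulate` are hypothetical names for illustration, not Teleport code.

```go
package main

import "fmt"

// eventKey mirrors the (SessionID, Index) key pair of the events table.
type eventKey struct {
	SessionID  string
	EventIndex int64
}

// store stands in for the events table: assigning to an existing key
// silently replaces the previous item, like an unconditional PutItem.
type store map[eventKey]string

// recorder is a hypothetical per-chunk recorder whose counter starts at 0.
type recorder struct {
	sessionID string
	next      int64
}

func (r *recorder) emit(s store, event string) {
	s[eventKey{SessionID: r.sessionID, EventIndex: r.next}] = event
	r.next++
}

// simulate creates a fresh recorder for every chunk, emits eventsPerChunk
// events per chunk, and returns how many events are still stored.
func simulate(chunks, eventsPerChunk int) int {
	s := store{}
	for c := 0; c < chunks; c++ {
		r := &recorder{sessionID: "app-session-1"} // index restarts at 0
		for e := 0; e < eventsPerChunk; e++ {
			r.emit(s, fmt.Sprintf("chunk %d event %d", c, e))
		}
	}
	return len(s)
}

func main() {
	// 3 chunks x 2 events = 6 emitted, but only 2 distinct keys exist.
	fmt.Println("events stored:", simulate(3, 2))
}
```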

The solution proposed in this PR is, for app sessions, to generate the event index based on the session start (here, the time the certificate was issued), which removes the need for a distributed counter.
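One way to realize a time-based index is sketched below. It assumes each event's index is the number of nanoseconds elapsed since the session start, bumped by one when the clock has not advanced. `timeIndexer`, `newTimeIndexer`, and `indexesAt` are hypothetical; the PR's actual indexer may compute the value differently.

```go
package main

import (
	"fmt"
	"time"
)

// timeIndexer (hypothetical) derives indexes from elapsed time since the
// session start, so a recorder created for a new chunk does not restart at 0.
type timeIndexer struct {
	start time.Time
	now   func() time.Time
	last  int64
}

func newTimeIndexer(start time.Time, now func() time.Time) *timeIndexer {
	return &timeIndexer{start: start, now: now, last: -1}
}

// Next returns a strictly increasing index for this indexer.
func (t *timeIndexer) Next() int64 {
	idx := t.now().Sub(t.start).Nanoseconds()
	if idx <= t.last {
		idx = t.last + 1 // clock did not advance; keep indexes unique
	}
	t.last = idx
	return idx
}

// indexesAt builds a fresh indexer whose fake clock reads start+offsets[i]
// (in nanoseconds) on the i-th call, and returns the assigned indexes.
func indexesAt(offsets []int64) []int64 {
	start := time.Unix(1_700_000_000, 0)
	i := 0
	clock := func() time.Time {
		now := start.Add(time.Duration(offsets[i]))
		i++
		return now
	}
	ix := newTimeIndexer(start, clock)
	out := make([]int64, 0, len(offsets))
	for range offsets {
		out = append(out, ix.Next())
	}
	return out
}

func main() {
	fmt.Println(indexesAt([]int64{0, 0, 10}))                            // first chunk
	fmt.Println(indexesAt([]int64{300_000_000_000, 300_000_000_001})) // new recorder, 5 min later
}
```

In real use the clock would simply be `time.Now`, e.g. `newTimeIndexer(start, time.Now)`.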

In addition, the solution also needs to account for different servers handling the same app session. For app access, sessions are not ended when the server handling them changes (HA). If that happens, the new server's recorder would start over at index 0 and overwrite the events emitted by the proxy and the other app server.
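The sketch below shows why a shared origin matters in the HA case. It assumes the certificate's `NotBefore` stands in for its issuance time and that every server handling the session sees the same certificate. `sessionStart`, `indexFor`, and `haDemo` are hypothetical; the PR reads the certificate through `authz.UserCertificateFromContext`, which is not reproduced here.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// sessionStart (hypothetical) uses the certificate's NotBefore as the session
// start, falling back to the local recorder's creation time without a cert.
func sessionStart(cert *x509.Certificate, fallback time.Time) time.Time {
	if cert != nil && !cert.NotBefore.IsZero() {
		return cert.NotBefore
	}
	return fallback
}

// indexFor returns the elapsed-nanoseconds index of an event.
func indexFor(start, emittedAt time.Time) int64 {
	return emittedAt.Sub(start).Nanoseconds()
}

// haDemo: server A handles the session first, server B takes over at
// minute 5. It returns A's event index at minute 2 and B's at minute 6.
func haDemo(useCert bool) (fromA, fromB int64) {
	issued := time.Date(2024, 2, 21, 14, 0, 0, 0, time.UTC)
	var cert *x509.Certificate
	if useCert {
		cert = &x509.Certificate{NotBefore: issued}
	}
	startA := sessionStart(cert, issued)                    // A's recorder created at session start
	startB := sessionStart(cert, issued.Add(5*time.Minute)) // B's recorder created on failover
	fromA = indexFor(startA, issued.Add(2*time.Minute))
	fromB = indexFor(startB, issued.Add(6*time.Minute))
	return fromA, fromB
}

func main() {
	fmt.Println(haDemo(true))  // shared origin: B's later event gets a larger index
	fmt.Println(haDemo(false)) // local origins: B's later event gets a smaller index
}
```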

NOTE: This change only affects app session recordings; other resources will still use the incremental method.

This PR also adds a DynamoDB condition expression to guarantee that events are not overwritten at the storage layer.
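The effect of the condition expression `attribute_not_exists(SessionID) AND attribute_not_exists(EventIndex)` can be modeled in memory: a put succeeds only when no item with the same key exists yet. This is a hypothetical model, not the AWS SDK. `errConditionalCheckFailed` stands in for DynamoDB's `ConditionalCheckFailedException`.

```go
package main

import (
	"errors"
	"fmt"
)

// errConditionalCheckFailed stands in for ConditionalCheckFailedException.
var errConditionalCheckFailed = errors.New("conditional check failed")

type key struct {
	SessionID  string
	EventIndex int64
}

// conditionalStore models a PutItem guarded by attribute_not_exists on both
// key attributes: it never replaces an existing item.
type conditionalStore struct {
	items map[key]string
}

func newConditionalStore() *conditionalStore {
	return &conditionalStore{items: map[key]string{}}
}

func (s *conditionalStore) Put(k key, event string) error {
	if _, exists := s.items[k]; exists {
		return fmt.Errorf("put %s/%d: %w", k.SessionID, k.EventIndex, errConditionalCheckFailed)
	}
	s.items[k] = event
	return nil
}

// replay writes one event per index and counts stored vs. rejected writes.
func replay(indexes []int64) (stored, rejected int) {
	s := newConditionalStore()
	for i, idx := range indexes {
		err := s.Put(key{SessionID: "app-session-1", EventIndex: idx}, fmt.Sprintf("event %d", i))
		switch {
		case errors.Is(err, errConditionalCheckFailed):
			rejected++ // the original event is kept; a caller would log this
		case err != nil:
			panic(err)
		default:
			stored++
		}
	}
	return stored, rejected
}

func main() {
	stored, rejected := replay([]int64{0, 1, 0}) // e.g. a chunk reusing index 0
	fmt.Println("stored:", stored, "rejected:", rejected)
}
```

The design point is that a collision now fails loudly instead of silently replacing the earlier event.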

changelog: Fix application access events being overwritten when using DynamoDB as event storage.

github-actions · 2024-02-21T14:52:37Z

The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with changelog: followed by the changelog entries for the PR.

codingllama · 2024-02-22T14:23:38Z

 }
+
+// certFromConnState returns certificate from the connection.
+func certFromConn(tlsConn *tls.Conn) *x509.Certificate {


Suggested change

-func certFromConn(tlsConn *tls.Conn) *x509.Certificate {
+func leafCertFromConn(tlsConn *tls.Conn) *x509.Certificate {
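The diff only shows the signature. Below is a sketch of what a leaf-certificate helper might look like, not the PR's code. It relies on `tls.ConnectionState.PeerCertificates` being ordered leaf first, which is why the `leaf` prefix describes the result more precisely. `leafCertFromState` and `leafCommonName` are hypothetical helpers added for testability.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
)

// leafCertFromState returns the first (leaf) certificate the peer presented,
// or nil if the peer presented none.
func leafCertFromState(state tls.ConnectionState) *x509.Certificate {
	if len(state.PeerCertificates) == 0 {
		return nil
	}
	return state.PeerCertificates[0]
}

// leafCertFromConn is a hypothetical body for the renamed helper.
func leafCertFromConn(tlsConn *tls.Conn) *x509.Certificate {
	return leafCertFromState(tlsConn.ConnectionState())
}

// leafCommonName builds a two-certificate chain (leaf, then issuer) and
// returns the CommonName of the certificate the helper selects.
func leafCommonName() string {
	state := tls.ConnectionState{
		PeerCertificates: []*x509.Certificate{
			{Subject: pkix.Name{CommonName: "alice"}},
			{Subject: pkix.Name{CommonName: "user-ca"}},
		},
	}
	return leafCertFromState(state).Subject.CommonName
}

func main() {
	fmt.Println(leafCommonName())
	fmt.Println(leafCertFromState(tls.ConnectionState{}) == nil) // no peer certs
}
```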

codingllama · 2024-02-22T14:39:02Z

Overall looks good, added a few questions as I'm not super familiar with these parts.

greedy52

The approach looks good to me. The event index is also used in app session recordings. After this change, people will see big numbers as the index instead of 1, 2, 3, so we need to document this behavior somewhere.

greedy52 · 2024-02-22T15:55:34Z

-		TableName: aws.String(l.Tablename),
+		Item:                av,
+		TableName:           aws.String(l.Tablename),
+		ConditionExpression: aws.String("attribute_not_exists(SessionID) AND attribute_not_exists(EventIndex)"),


Would this break existing environments when auth is upgraded but app server is not yet?

Yes. As soon as they update their auth servers, events from older app servers will hit this, at least for the first app.session.chunk (which conflicts with app.session.start). As mentioned in the PR description, if the app server serving the requests changes, there will be more conflicts (generating more errors).

If we release this check with the patch, customers' auth servers will be "spammed" with these errors until their app servers are upgraded. As discussed offline, it is preferable to handle this separately, ideally releasing it in a new major version. By then, app servers should already have been upgraded with this patch, and we can likely catch other events with the same problem during release testing.

That said, since this check does not block access to the app, and we are losing events until app servers are upgraded anyway, I don't mind if we really want to release them together.

I prefer having an error log to silently overwriting audit events (current behavior).

@codingllama, any thoughts on this?

I'm afraid I don't have enough context to have a strong opinion here.

greedy52 · 2024-02-22T16:24:43Z

+// valid date.
+func (s *Server) sessionStartTime(ctx context.Context) time.Time {
+	var startTime time.Time
+	if userCert, err := authz.UserCertificateFromContext(ctx); err == nil {


Interesting approach indeed. Chunks are created every five minutes, if I recall correctly, so using nanoseconds keeps chunks relatively in order. However, in the HA case, clocks on different app servers may not be perfectly in sync.
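The clock-skew concern can be made concrete with the same hypothetical elapsed-nanoseconds index: server A's clock is accurate, while server B emits a later event but its clock runs behind. `elapsedIndex` and `skewDemo` are illustrative names, and the round millisecond values are chosen only to make the cases visible. An exact tie is improbable at nanosecond resolution, but if one occurred, the condition expression added in this PR would reject the write instead of overwriting.

```go
package main

import (
	"fmt"
	"time"
)

// elapsedIndex is the local clock reading minus the shared session start.
func elapsedIndex(start, localNow time.Time) int64 {
	return localNow.Sub(start).Nanoseconds()
}

// skewDemo: A emits at real time start+5m; B emits delayMs later, but B's
// clock runs skewMs behind real time. Returns both indexes.
func skewDemo(delayMs, skewMs int64) (indexA, indexB int64) {
	start := time.Date(2024, 2, 22, 16, 0, 0, 0, time.UTC)
	realA := start.Add(5 * time.Minute)
	realB := realA.Add(time.Duration(delayMs) * time.Millisecond)
	indexA = elapsedIndex(start, realA)
	indexB = elapsedIndex(start, realB.Add(-time.Duration(skewMs)*time.Millisecond))
	return indexA, indexB
}

func main() {
	for _, c := range []struct{ delayMs, skewMs int64 }{{10, 0}, {1, 1}, {1, 5}} {
		a, b := skewDemo(c.delayMs, c.skewMs)
		fmt.Printf("delay=%dms skew=%dms: A=%d B=%d\n", c.delayMs, c.skewMs, a, b)
	}
}
```

With no skew, B's later event sorts after A's; when the skew equals the delay the indexes tie; when the skew exceeds the delay, B's later event sorts before A's.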

codingllama · 2024-02-22T17:38:08Z

After this change, ppl will see big numbers as key index instead of 1,2,3, so we need to document this behavior somewhere.

We could also use UnixNano for "smaller" numbers, but it'll still be quite large compared to "1,2,3".


greedy52

Tested today and it worked as expected 👍. Do we expect this change to affect other event backends?

greedy52 · 2024-02-26T14:36:30Z


public-teleport-github-review-bot · 2024-02-29T03:35:16Z

@gabrielcorado See the table below for backport results.

Branch	Result
branch/v13	Failed
branch/v14	Create PR
branch/v15	Create PR

feat(events): add event indexer based on time

9dcd990

gabrielcorado requested a review from greedy52 February 21, 2024 14:52

gabrielcorado self-assigned this Feb 21, 2024

github-actions Bot requested review from codingllama and marcoandredinis February 21, 2024 14:52

github-actions Bot added application-access audit-log Issues related to Teleports Audit Log size/md labels Feb 21, 2024

zmb3 reviewed Feb 21, 2024

Comment thread lib/events/dynamoevents/dynamoevents_test.go Outdated

Comment thread lib/events/setter.go Outdated

Comment thread lib/events/setter_test.go Outdated

gabrielcorado added 2 commits February 21, 2024 14:04

chore(events): code review changes

4a4465e

chore: typos

b4bbae6

codingllama reviewed Feb 22, 2024

greedy52 reviewed Feb 22, 2024

Apply suggestions from code review

e8ba6da

Co-authored-by: Alan Parra <alan.parra@goteleport.com>

greedy52 self-requested a review February 23, 2024 18:48

greedy52 approved these changes Feb 26, 2024

chore: code review changes

3ba4322

gabrielcorado requested review from codingllama and zmb3 February 26, 2024 21:59

codingllama approved these changes Feb 27, 2024

public-teleport-github-review-bot Bot removed the request for review from marcoandredinis February 27, 2024 14:09

gabrielcorado added backport/branch/v14 labels Feb 28, 2024

gabrielcorado enabled auto-merge February 28, 2024 23:50

test(integration): ignore index field on audit events

84b0819

gabrielcorado added this pull request to the merge queue Feb 29, 2024

Merged via the queue into master with commit 89d1aa4 Feb 29, 2024

gabrielcorado deleted the gabrielcorado/avoid-index-collision-app-sessions branch February 29, 2024 03:33

This was referenced Feb 29, 2024

[v15] Use time-based event index for app session events #38815

Merged

[v14] Use time-based event index for app session events #38816

Merged

greedy52 mentioned this pull request Mar 26, 2024

Using SAML app gives ConditionalCheckFailedException on backend with DynamoDB #39833

Open

espadolini mentioned this pull request Apr 2, 2024

session.command event hits a ConditionalCheckFailedException on dynamoevents #40126

Closed

4 participants