Skip to content

Determine the session type based on the events on session events streaming#49361

Closed
gabrielcorado wants to merge 3 commits intomasterfrom
gabrielcorado/stream-session-events-peek-audit-log
Closed

Determine the session type based on the events on session events streaming#49361
gabrielcorado wants to merge 3 commits intomasterfrom
gabrielcorado/stream-session-events-peek-audit-log

Conversation

@gabrielcorado
Copy link
Copy Markdown
Contributor

@gabrielcorado gabrielcorado commented Nov 22, 2024

Related to #49164

Brief context: After the changes are made at #47309, when a request for session recording events (playback) is made, we query to find the last event to set the SessionType field on the audit log. Depending on the cluster's backend and the volume of audit events, this can cause increased latency.

This PR updates the logic of determining the session type by consuming (peeking) the first event of the stream.

changelog: Improve session playback initial delay caused by an additional events query.

@gabrielcorado gabrielcorado marked this pull request as ready for review November 22, 2024 19:21
var sessionKind types.SessionKind
select {
case <-ctx.Done():
case <-firstErrCh:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we need to peek if the error handling is ignored
The comments says

Also peek the error channel to avoid blocking

but the implementation will resent the error to errCh so I'm but consufed how this help to avoid blocking.

Copy link
Copy Markdown
Contributor Author

@gabrielcorado gabrielcorado Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the StreamSessionEvents call makes an error, we never receive an event on the regular events channel (only one error on the error channel). If we don't peek at the error channel, we will only be unblocked when the context is finished.

As for peeking, we only don't forward the error channel if the emit audit event fails. If we peek, the caller will only be sent a message on the events channel that will never be sent instead of a channel receiving the error.

To clarify, we ignored the error because the audit log does not specify whether the session recording access was successful.

Flow:

sequenceDiagram
    participant C as Caller
    participant S as Server
    participant A as Audit log
    
    C->>S: StreamSessionEvents
    S->>A: StreamSessionEvents
    A->>S: (chan AuditEvent, chan error)

    alt Receives event (on channel)
        S->>S: Determine session type
    else Receives error (on channel)
        S->>S: Does nothing
    else Context done
        S->>S: Does nothing
    end
    
    S->>S: EmitAuditEvent
    alt Emit fails
        S->>C: return new error channel<br/>and empty events channel
    else
        S->>C: peeked channels
    end
Loading

DatabaseSessionKind SessionKind = "db"
AppSessionKind SessionKind = "app"
WindowsDesktopSessionKind SessionKind = "desktop"
UnknownSessionKind SessionKind = ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining the purpose of UnknownSessionKind and describing how it might occur?

return c, e
}

eventsCh, errCh := a.alog.StreamSessionEvents(ctx, sessionID, startIndex)
Copy link
Copy Markdown
Contributor

@greedy52 greedy52 Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will miss the first event when startIndex is non-zero. How complex would it be if we had to change the StreamSessionEvents interface?

And curious how often/when the startIndex is non-zero

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The events won't be lost. The only difference is that we won't be able to set the session type (because the event type differs from the ones we're checking). In this case, the value would be unknown (empty string).

@zmb3
Copy link
Copy Markdown
Collaborator

zmb3 commented Nov 25, 2024

This peek implementation looks a little too complicated. I haven't looked in detail yet, but can we do something like this?

evts, errs := StreamSessionEvents(...)

typeC := make(chan string, 1)

// kick off a goroutine to emit a recording access event
// (wait up to 5 seconds to determine the type)
go func() {
    recordingType := "unknown"
    defer EmitAuditEvent(SessionRecordingAccess{Type: recordingType})

    select {
    case <-time.After(5 * time.Second):
    case typ := <-typeC:
        recordingType = typ
    }
}()

for evt := range evts {
   // if first event, write type to typeC
}

@zmb3 zmb3 linked an issue Nov 26, 2024 that may be closed by this pull request
@gabrielcorado
Copy link
Copy Markdown
Contributor Author

@zmb3 We need to peek at the first event and return a channel containing it so that it does not affect the callers. We have thought about some alternatives to this, but those require updating the interface (which is defined in different places and has mocks so that the changes would be more significant):

  • Update function return to include an additional channel to receive the first event. This would require updating both implementations and callers.
  • @greedy52 suggested adding a new optional argument to provide the function with a callback when the first event is available or an error occurs.

Do you think those alternatives will work better?

@smallinsky smallinsky self-requested a review November 27, 2024 18:38
@greedy52 greedy52 self-requested a review November 27, 2024 19:06
@zmb3
Copy link
Copy Markdown
Collaborator

zmb3 commented Nov 27, 2024

@gabrielcorado how about this diff? (I haven't removed the emitting from the current place yet..)

If we move where we emit the recording access event from the auth grpc server to the actual audit log implementation we should be able to do this without changing any interface anywhere.

The other nice property of this implementation is it works even if you specify a start event > 0.

diff --git a/lib/events/auditlog.go b/lib/events/auditlog.go
index 51180746cb..0c6c1dab60 100644
--- a/lib/events/auditlog.go
+++ b/lib/events/auditlog.go
@@ -39,6 +39,7 @@ import (
 	"github.com/gravitational/teleport/api/internalutils/stream"
 	apievents "github.com/gravitational/teleport/api/types/events"
 	"github.com/gravitational/teleport/lib/defaults"
+	"github.com/gravitational/teleport/lib/events"
 	"github.com/gravitational/teleport/lib/observability/metrics"
 	"github.com/gravitational/teleport/lib/session"
 	"github.com/gravitational/teleport/lib/utils"
@@ -545,6 +546,22 @@ func (l *AuditLog) StreamSessionEvents(ctx context.Context, sessionID session.ID
 		"session_id", string(sessionID),
 	)
 
+	recordingFormat := make(chan string, 1)
+	go func() {
+		var format string
+		for {
+			select {
+			case f := <-recordingFormat:
+				format = f
+			case <-l.Clock.After(5 * time.Second):
+				l.EmitAuditEvent(ctx, apievents.SessionRecordingAccess{
+					// TODO: other fields
+					Format: format,
+				})
+				return
+			}
+		}
+	}()
+
 	go func() {
 		defer rawSession.Close()
 		// this shouldn't be necessary as the position should be already 0 (Download
@@ -572,6 +589,14 @@ func (l *AuditLog) StreamSessionEvents(ctx context.Context, sessionID session.ID
 				return
 			}
 
+			switch event.GetType() {
+			case events.SessionStartEvent:
+				recordingFormat <- "ssh"
+			case events.WindowsDesktopSessionStartEvent:
+				recordingFormat <- "desktop"
+				// TODO: other cases
+			}
+
 			if event.GetIndex() >= startIndex {
 				select {
 				case c <- event:

@gabrielcorado
Copy link
Copy Markdown
Contributor Author

@zmb3 I've tried your proposed solution, and after discussing it with @greedy52, it seems that it raises a few concerns, given we're calling it directly:

Given this, we proposed a new solution that uses a callback (provided to the AuditLog through context so we don't change the function signature) called when the first event is available. Please take a look and see what you think. I've placed it in this draft PR.

@gabrielcorado
Copy link
Copy Markdown
Contributor Author

Closing this in favor of #50395

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Playing back a session has a 5s delay at the beginning

4 participants