Session streams structured events. #4045
Conversation
Builds are failing because
retest this please
retest this please
I know we plan to roll this out as experimental, but I wanted to add more content for the file config for alpha/beta testers of this. Would this be correct?
@gravitational-jenkins retest this please
Skimmed through some code in lib/events (not subdirectories).
I think we're adding some performance overhead by not using the proto types directly (see ToOneOf and FromOneOf in lib/events/convert.go, or any code that uses JSON encode/decode as conversion). Not sure how much, though.
@klizhentas did you happen to look at any CPU or memory profiles for a server under load?
lib/events/auditwriter.go
Outdated
// This first drain is necessary to give status updates a priority
// in the event processing loop.
why do status updates need priority?
If you don't give updates priority, you may never receive the update in time: the sending part of the loop can block before it gets a chance to receive a status update.
To clarify: the select below may pop an item off of a.eventsCh and block on EmitAuditEvents for too long without processing status updates?
How does this problem manifest at a higher level: is a.buffer growing too large and spiking memory usage, or something else?
In practice, when there are too many events, the select on the status channel happens much later, yes, which leads to an increased buffer.
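The pattern being discussed can be sketched as a two-stage select: a non-blocking drain of the status channel first, then a blocking select over both channels. This is a minimal, self-contained sketch of the idea; `processOnce` and the channel contents are hypothetical, not the actual AuditWriter code:

```go
package main

import "fmt"

// processOnce drains any pending status update before handling the
// next event, so status handling is never starved by a busy event
// stream.
func processOnce(statusCh <-chan string, eventsCh <-chan string) string {
	// First drain: give status updates priority. The default case
	// keeps this receive non-blocking.
	select {
	case s := <-statusCh:
		return "status:" + s
	default:
	}
	// Then block on either channel.
	select {
	case s := <-statusCh:
		return "status:" + s
	case e := <-eventsCh:
		return "event:" + e
	}
}

func main() {
	statusCh := make(chan string, 1)
	eventsCh := make(chan string, 1)
	statusCh <- "uploaded"
	eventsCh <- "resize"
	// Even though both channels are ready, the status update wins.
	fmt.Println(processOnce(statusCh, eventsCh))
}
```

Without the first drain, the second select would pick between the two ready channels pseudo-randomly, so a constantly full events channel could delay status updates indefinitely.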
lib/events/auditwriter.go
Outdated
// retry is created on the first failure to resume
if retry == nil {
	var rerr error
	retry, rerr = utils.NewLinear(utils.LinearConfig{
		Step: defaults.NetworkRetryDuration,
		Max:  defaults.NetworkBackoffDuration,
	})
	if rerr != nil {
		return nil, trace.Wrap(rerr)
	}
}
why not create this before the loop?
✔️
@awly
@klizhentas gotcha. I'm still curious to see some CPU/memory profiles if you have a chance.
Partially reviewed (mostly the auth package). Will resume tomorrow.
Mostly spelling/grammar fixes.
lib/events/api.go
Outdated
// GetID returns unique event ID
GetID() string
// SetID sets unique event ID
SetID(id string)

// GetCode returns event short diagnostic code
GetCode() string
// SetCode sets unique event diagnostic code
SetCode(string)

// GetType returns event type
GetType() string
// SetType sets event type
SetType(string)

// GetTime returns event time
GetTime() time.Time
// SetTime sets event time
SetTime(time.Time)

// GetIndex gets event index - a non-unique
// monotonically incremented number
// in the event sequence
GetIndex() int64
// SetIndex sets event index
SetIndex(idx int64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface can be significantly simplified by just exposing *Metadata. Ex:
type AuditEvent interface {
    ProtoMarshaler
    Meta() *Metadata
}
// Set ID
event.Meta().ID = "some-id"
// Get Index
index := event.Meta().Index
// etc...
Not sure if that is desirable, but it's worth noting.
edit: The same kind of simplification could be applied to the other getter/setter style interfaces in this file.
I think I will keep it for now, but noted for future refactoring.
lib/events/auditlog.go
Outdated
if err != nil {
	return trace.ConvertSystemError(err)
}
if format.Proto == true {
Suggested change:
- if format.Proto == true {
+ if format.Proto {
I've noticed that @awly prefers == true and == false; I wonder what you folks think about this.
I think @russjones likes == true/false, I don't ;)
@awly check out the node profiles: this is a 10-second workload that produces a ~4MB compressed recording in a loop on a node.
@awly @fspmarshall if you don't have any outstanding requests, I'm ready to merge and cut beta.1 today. I will continue testing for the next 2 weeks until the Oct 7th release.
Thanks for the profiles, all numbers look reasonable to me
This commit introduces a gRPC API for streaming sessions. It adds structured events and sync streaming that avoids storing events on disk. You can find the design in the rfd/0002-streaming.md RFD.
retest this please
This commit refactors the events subsystem in Teleport in a backwards-compatible way and fixes several issues.
Fixes #3549 (Session streaming design), fixes #3800 (Events overwritten in DynamoDB), fixes #3182 (Teleport consuming all disk space with multipart uploads).
Structured Events
Events have been refactored from unstructured to structured definitions generated from a protobuf spec.
Each event embeds common required metadata. This metadata is accompanied by common event fields that allow every event to comply with a common interface.
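The actual metadata fields and interface definitions were elided from the text above. As an illustration, a structured event with embedded common metadata might look like the following sketch; the field, type, and method names here are hypothetical stand-ins, not the exact generated protobuf types:

```go
package main

import (
	"fmt"
	"time"
)

// Metadata is a sketch of the common fields every event embeds.
type Metadata struct {
	ID    string
	Type  string
	Code  string
	Index int64
	Time  time.Time
}

// AuditEvent is a sketch of the common interface all events satisfy.
type AuditEvent interface {
	GetID() string
	GetType() string
	GetCode() string
	GetIndex() int64
	GetTime() time.Time
}

// Getters defined on *Metadata are promoted to every event that embeds it,
// so each event type satisfies AuditEvent for free.
func (m *Metadata) GetID() string      { return m.ID }
func (m *Metadata) GetType() string    { return m.Type }
func (m *Metadata) GetCode() string    { return m.Code }
func (m *Metadata) GetIndex() int64    { return m.Index }
func (m *Metadata) GetTime() time.Time { return m.Time }

// SessionStart embeds the common metadata and adds its own fields.
type SessionStart struct {
	Metadata
	ServerID string
}

func main() {
	var ev AuditEvent = &SessionStart{
		Metadata: Metadata{ID: "uuid-1", Type: "session.start", Index: 0},
		ServerID: "node-1",
	}
	fmt.Println(ev.GetType())
}
```

Embedding means each concrete event only declares its own fields, while the shared getters and setters come from the common metadata struct.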
Session events
Session events embed session metadata and implement extended interfaces.
This approach allows the common event interface to be converted to other event classes without casting to a specific type.
Other event types
Other event types, such as events dealing with connections, embed other common types of metadata, introducing different event classes, for example connection metadata events.
Streams
Streamer is the new interface for nodes and proxies to send session events to the auth server as a continuous sequence of events.
Nodes and proxies can resume streams that were interrupted using an upload ID.
Streams represent a continuous sequence of events associated with a session.
Clients can use stream status to create back-pressure (stop sending until the stream reports events as uploaded) or to resume the upload without re-sending all events.
Uploaders
Uploaders provide an abstraction over multipart upload APIs, specifically S3 for AWS and GCS for Google.
The stream on-disk format is optimized to support parallel uploads of events to S3 and resuming of interrupted uploads.
Session events storage format
The storage format for session recordings is optimized for fast marshal/unmarshal using protobuf, compression using gzip, and support for parallel uploads to S3 or GCS storage.
Unlike v0 (a JSON-based, multiple-file recording format), an individual session is represented by a continuous, globally ordered sequence of events serialized to the binary protobuf format.
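One plausible shape for such a format is a gzip-compressed sequence of length-prefixed records. The sketch below demonstrates only that length-prefix + gzip idea, with raw byte payloads standing in for marshaled protobuf events; Teleport's actual slice/part layout is more involved, so treat this as an assumption:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// writeRecords gzip-compresses a sequence of records, each prefixed
// with a big-endian uint32 length. Payloads stand in for marshaled
// protobuf events.
func writeRecords(records [][]byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, rec := range records {
		if err := binary.Write(zw, binary.BigEndian, uint32(len(rec))); err != nil {
			return nil, err
		}
		if _, err := zw.Write(rec); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readRecords decompresses and splits the stream back into records.
func readRecords(data []byte) ([][]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	var out [][]byte
	for {
		var n uint32
		if err := binary.Read(zr, binary.BigEndian, &n); err == io.EOF {
			return out, nil // clean end of stream
		} else if err != nil {
			return nil, err
		}
		rec := make([]byte, n)
		if _, err := io.ReadFull(zr, rec); err != nil {
			return nil, err
		}
		out = append(out, rec)
	}
}

func main() {
	blob, _ := writeRecords([][]byte{[]byte("session.start"), []byte("resize")})
	records, _ := readRecords(blob)
	fmt.Println(len(records), string(records[0]))
}
```

Length-prefixed binary records avoid the per-event JSON parsing cost of the v0 format, and independently compressed chunks of such records can be uploaded as separate multipart-upload parts in parallel.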
Multipart Uploader storage backends
Backwards compatibility
GRPC
GRPC streaming
Nodes and proxies use the gRPC interface implementation to submit individual global events and to create and resume streams.
GRPC/HTTPs protocol switching
The ServeHTTP compatibility handler that allowed serving gRPC over an HTTPS connection had to be replaced with native gRPC because of the back-pressure problems described here.
This in turn required protocol switching to be done at the TLS level using NextProto. This approach takes advantage of the fact that all (Golang) clients use HTTP/1.1, so old clients talking to the new auth server will not be affected: gRPC will continue to work as expected over HTTP/2, and the legacy API is still served over HTTP/1.1 until it is deprecated. We are not aware of any clients accessing the server's legacy HTTPS API over HTTP/2, but any that exist would be broken.
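The switching described above can be sketched as ALPN-based dispatch: the server advertises both protocols in tls.Config.NextProtos and routes each accepted connection by the protocol negotiated in the TLS handshake. A minimal sketch, not the actual Teleport code (certificate loading and listener wiring omitted):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// newTLSConfig advertises both HTTP/2 (for gRPC) and HTTP/1.1 (for the
// legacy API) via ALPN.
func newTLSConfig() *tls.Config {
	return &tls.Config{
		// Order matters: prefer h2 when the client supports it.
		NextProtos: []string{"h2", "http/1.1"},
	}
}

// route decides which server handles a connection, based on the
// protocol negotiated during the handshake (in real code this comes
// from tls.Conn.ConnectionState().NegotiatedProtocol).
func route(negotiatedProtocol string) string {
	switch negotiatedProtocol {
	case "h2":
		return "grpc" // native gRPC server over HTTP/2
	default:
		// Old Go clients negotiate HTTP/1.1 (or nothing at all),
		// so the legacy API keeps working unchanged.
		return "legacy-http"
	}
}

func main() {
	cfg := newTLSConfig()
	fmt.Println(cfg.NextProtos[0], route("h2"), route("http/1.1"))
}
```

Because the decision happens once per connection at the TLS layer, neither server behind the switch needs to know about the other, and no per-request protocol sniffing is required.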
Sync and async streams
The existing 4.3 stream implementation is async: sessions are streamed to disk on the proxy or node and then uploaded as a single tarball. This created performance and stability problems for large uploads (#3182, Teleport consuming all disk space with multipart uploads) and security concerns: storage on disk required disk encryption to support FedRAMP mode.
Sync stream
This commit introduces experimental proxy-sync and node-sync recording modes. In these modes, the proxy or node sends events directly to the auth server, which in turn sends the recordings to S3 or other storage without storing the records on disk at all. This created a potential problem of resuming the session stream. The new audit writer takes advantage of stream status reporting and a new option to resume a stream in order to replay the events that have not yet been uploaded to storage.
Because the auth server never stores any local data about a stream and instead initiates a multipart upload that can be resumed on another auth server, the loss of a single auth server will not cause problems for sync sessions as long as another auth server is available to resume the stream.
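For alpha/beta testers, enabling a sync mode would presumably look something like the following teleport.yaml fragment. This is an assumption about the final configuration surface (the session_recording key exists today for the async modes; the -sync values are the new experimental ones), so check the release docs for the final key names:

```yaml
# teleport.yaml (auth server) -- hypothetical sketch
auth_service:
  enabled: yes
  # "node" / "proxy" are the existing async recording modes; the new
  # experimental sync modes stream events straight to the auth server
  # without touching the local disk.
  session_recording: proxy-sync   # or: node-sync
```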
Async streams
The default mode in Teleport 4.4 remains async, with the file uploader used to store events on disk in the new protobuf format.
The disk uploader now attempts to resume the upload to the auth server based on the last reported status when possible, solving the problem of very large uploads being interrupted by server overload or intermittent network problems.
Unified stream interface
The sync and async streams use the same gRPC API; a session chooses the appropriate emitter based on the cluster configuration.
Completing broken streams
In Teleport 4.3 some streams and sessions were never uploaded to the auth server: for example, when a node or proxy crashed before marking an on-disk session as complete, the session would stay on the proxy or node without being uploaded.
Switching to the S3-based multipart upload API made it possible to implement the upload completer, a special service that watches for uploads that have not been completed within a grace period (> 12 hours) and completes them.
P.S.
@awly @fspmarshall @russjones please check out this interview guide with direct links to the diff; hopefully this simplifies the review of this lengthy PR.
I have added the same content in RFD format here.