feat: add metrics for event sizes #35440

Merged
zmb3 merged 3 commits into master from nklaassen/event-metrics on Dec 7, 2023

@nklaassen nklaassen commented Dec 6, 2023

based on top of #35402

changelog: added prometheus metrics for audit event sizes

Fixes #35161

Large events in the Athena audit backend will now be trimmed both before
they are stored and before they are returned from a query, using the
existing TrimToMaxSize implementations for each event type that the
Dynamo and File backends already use.
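
To illustrate the idea of per-event-type trimming, here is a minimal sketch. The `execEvent` type and its `trimToMaxSize` method are hypothetical names made up for this example and are not Teleport's actual types or signatures. It assumes the event is measured by its JSON encoding and that one large field can be shortened to bring it under the limit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// execEvent is a hypothetical audit event used only for illustration.
type execEvent struct {
	User    string `json:"user"`
	Command string `json:"command"`
}

// size returns the length in bytes of the event's JSON encoding.
func (e execEvent) size() int {
	b, err := json.Marshal(e)
	if err != nil {
		return 0
	}
	return len(b)
}

// trimToMaxSize returns a copy of the event with the Command field shortened
// so that the JSON encoding fits in maxBytes. This is best effort: if the
// fixed fields alone exceed maxBytes, Command ends up empty and the result
// may still be over the limit.
func (e execEvent) trimToMaxSize(maxBytes int) execEvent {
	for e.size() > maxBytes && len(e.Command) > 0 {
		over := e.size() - maxBytes
		keep := len(e.Command) - over
		if keep < 0 {
			keep = 0
		}
		// Drop any partial UTF-8 sequence left by the byte-level cut.
		e.Command = strings.ToValidUTF8(e.Command[:keep], "")
	}
	return e
}

func main() {
	ev := execEvent{User: "alice", Command: strings.Repeat("x", 2048)}
	trimmed := ev.trimToMaxSize(1024)
	fmt.Println("before:", ev.size(), "after:", trimmed.size())
}
```

The small identifying fields are kept intact and only the bulky field is cut, which is the general shape one would expect from a per-type trim.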

The other backends typically trim the event before storing it. For
Dynamo this is due to the 400 KB item size limit; for the file backend
it's due to the 64 KiB bufio.MaxScanTokenSize.
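
For context on the file backend limit, the following sketch shows how a default `bufio.Scanner` behaves when a single line exceeds `bufio.MaxScanTokenSize`. It assumes newline-delimited input, one event per line; the helper names (`countLines`, `lineOf`, `isTooLong`) are made up for this example.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// lineOf builds a newline-terminated line containing n bytes of payload.
func lineOf(n int) string {
	return strings.Repeat("a", n) + "\n"
}

// countLines scans the input with a default bufio.Scanner and returns how
// many lines were read before the scanner stopped, plus any scanner error.
func countLines(input string) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// isTooLong reports whether err is the scanner's "token too long" error.
func isTooLong(err error) bool {
	return errors.Is(err, bufio.ErrTooLong)
}

func main() {
	// A short line scans cleanly.
	n, err := countLines(lineOf(1024))
	fmt.Println(n, err)

	// A line longer than the default token limit stops the scanner,
	// so every line after it is never read either.
	n, err = countLines(lineOf(1024) + lineOf(bufio.MaxScanTokenSize+1) + lineOf(1024))
	fmt.Println(n, isTooLong(err))
}
```

Because one oversized line halts the scan, trimming before writing protects the lines that follow it as well as the oversized event itself.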

There is no hard limit on the size of events stored in Parquet files in
S3, but we've been using a 2 GiB limit in the publisher so far.
With this change we will attempt to trim events to 2 GiB before writing
them (if we haven't already run out of memory) instead of just failing.

We've also been using a 1 MiB limit in the querier and just returning an
empty result when an event larger than that is encountered.
With this change we will attempt to trim the event to 1 MiB before
returning it.
The 1 MiB limit ultimately stems from the 4 MB max gRPC message size.

We could just trim to 1 MiB in the publisher, but I'd prefer to preserve
as much of the event data as possible in case we improve the querying
story for large events in the future (and in case the user wants to
query the events directly from S3).
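
As a rough illustration of what an event size metric records, here is a standard-library-only stand-in for a cumulative histogram of event sizes. This is not the code added by this PR: the `sizeHistogram` type and the bucket boundaries are assumptions made for the example, and a real implementation would more likely use a histogram from the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
)

// sizeHistogram is a hypothetical, simplified stand-in for a Prometheus
// histogram. Buckets are cumulative: counts[i] holds the number of
// observations less than or equal to bounds[i], and the final entry of
// counts plays the role of the +Inf bucket.
type sizeHistogram struct {
	bounds []float64
	counts []uint64
	sum    float64
}

// newSizeHistogram creates a histogram with the given upper bounds in bytes.
func newSizeHistogram(bounds ...float64) *sizeHistogram {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return &sizeHistogram{bounds: sorted, counts: make([]uint64, len(sorted)+1)}
}

// observe records one event size in bytes.
func (h *sizeHistogram) observe(sizeBytes float64) {
	h.sum += sizeBytes
	// Find the first bucket whose upper bound is >= sizeBytes, then
	// increment it and every larger bucket, including +Inf.
	for i := sort.SearchFloat64s(h.bounds, sizeBytes); i < len(h.counts); i++ {
		h.counts[i]++
	}
}

func main() {
	// Assumed boundaries: 1 KiB, 64 KiB, 1 MiB.
	h := newSizeHistogram(1024, 64*1024, 1024*1024)
	for _, size := range []float64{512, 70000, 5000000} {
		h.observe(size)
	}
	for i, bound := range h.bounds {
		fmt.Printf("le=%.0f count=%d\n", bound, h.counts[i])
	}
	fmt.Printf("le=+Inf count=%d sum=%.0f\n", h.counts[len(h.bounds)], h.sum)
}
```

Placing bucket boundaries near the limits discussed above (64 KiB, 1 MiB) would make it easy to see how often events approach the point where trimming kicks in.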
@github-actions github-actions Bot added the audit-log (Issues related to Teleport's Audit Log) and size/sm labels Dec 6, 2023
@gravitational gravitational deleted a comment from github-actions Bot Dec 6, 2023
Base automatically changed from nklaassen/s3-large-events to master December 7, 2023 02:11
@zmb3 zmb3 enabled auto-merge December 7, 2023 16:14
@zmb3 zmb3 added this pull request to the merge queue Dec 7, 2023
Merged via the queue into master with commit 4252cc4 Dec 7, 2023
@zmb3 zmb3 deleted the nklaassen/event-metrics branch December 7, 2023 17:03
@public-teleport-github-review-bot

@nklaassen See the table below for backport results.

| Branch | Result |
|--------|--------|
| branch/v14 | Failed |

rosstimothy pushed a commit that referenced this pull request Dec 7, 2023
Fixes #35161

* feat: add metrics for event sizes
github-merge-queue Bot pushed a commit that referenced this pull request Dec 7, 2023
Fixes #35161

* feat: add metrics for event sizes

Co-authored-by: Nic Klaassen <nic@goteleport.com>
nklaassen pushed a commit that referenced this pull request Jan 26, 2024
github-merge-queue Bot pushed a commit that referenced this pull request Jan 26, 2024
* [v13] fix: trim large events in Athena querier

Backport #35402 to branch/v13
Fixes #35161

* [v13] feat: add metrics for event sizes

Backport #35440 to branch/v13

---------

Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>