switch event exporter to bulk api#47369
Merged
fspmarshall merged 2 commits intomasterfrom Oct 16, 2024
Merged
Conversation
1ed8c76 to
b4bc6e7
Compare
tigrato
reviewed
Oct 14, 2024
b4bc6e7 to
3003694
Compare
bernardjkim
approved these changes
Oct 15, 2024
tigrato
approved these changes
Oct 15, 2024
|
@fspmarshall See the table below for backport results.
|
fspmarshall
added a commit
that referenced
this pull request
Oct 16, 2024
This was referenced Oct 16, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR switches over the event exporter to use the new bulk event export API. Most of the actual changes in this PR relate to reworking the old logic to behave more like the new event exporter helpers introduced in previous PRs so that it is easier to switch from the old API to the new one once the control plane has been upgraded. The one standout exception is a change to how export of session events works.
Prior to this change, the event exporter's session event processing was entirely asynchronous. Each session whose events needed to be processed had its initial cursor state written to disk, and a background goroutine was spawned that would eventually kick off processing of the session when capacity opened up. The new bulk event export API is way too fast for this strategy to work, and audit data with high session density (e.g. massive automated workloads) ends up producing an ever-growing number of cursors on disk and background coroutines which will eventually swamp the host system. In order to mitigate this effect, there is now a fixed sized backlog for unstated sessions and backpressure is exerted on primary event processing if the backlog is filled.
The overall effect of this PR in dedicated test cases is a roughly 8x to 10x increase in event processing throughput, depending on how session-heavy the data is, and what the specified
--concurrencyvalue is. On a 128 core machine I found the optimal--concurrencyvalue to be about 256 when churning through a large backlog of session heavy audit log data, though note that the vast majority of teleport clusters do not produce enough audit log data to merit a--concurrencyvalue nearly this high.Note: this PR (and the preceding bulk event export API work in general) represents a bit of a philosophical shift. A lot less work is done to avoid duplicate and out of order event emission. In general, the event handler now prioritizes event throughput over ordering/deduplication.
Status: PR is good to review, but some manual testing is still ongoing so expect minor ongoing changes.
Closes #46193
Changelog: reworked the
teleport-event-handlerintegration to significantly improve performance, especially when running with larger--concurrencyvalues.