
Fix encrypted recording uploads in S3 #60542

Merged
Joerger merged 16 commits into master from joerger/pad-encrypted-recordings-s3
Oct 31, 2025

@Joerger
Contributor

@Joerger Joerger commented Oct 24, 2025

Changelog: Fix a bug where encrypted session recordings could not be uploaded to S3.

S3 enforces a 5MB minimum size on every part of a multipart upload except the last, but encrypted recordings are currently uploaded in 128KB parts due to technical limitations, so completing the upload fails:

2025-10-23T18:23:34.107-07:00 WARN [UPLOAD]    Uploader scan failed, applying backoff before retrying backoff:19.704082796s error:[
ERROR REPORT:
Original Error: *smithy.OperationError operation error S3: CompleteMultipartUpload, https response error StatusCode: 400, RequestID: NB4P6J3SVNEDB64H, HostID: v/0Fn5CrKDuo1zWLsqo8P5oML4EXV3wsWZ6I/hIfUD0hxbp+rL6+34qUqKIN+Zjnk4vf62iSvpA=, api error EntityTooSmall: Your proposed upload is smaller than the minimum allowed size
Stack Trace:
	github.com/gravitational/teleport/lib/events/s3sessions/s3stream.go:203 github.com/gravitational/teleport/lib/events/s3sessions.(*Handler).CompleteUpload
	github.com/gravitational/teleport/lib/events/auditlog.go:655 github.com/gravitational/teleport/lib/events.(*AuditLog).UploadEncryptedRecording
	github.com/gravitational/teleport/lib/events/filesessions/fileasync.go:432 github.com/gravitational/teleport/lib/events/filesessions.(*Uploader).uploadEncryptedRecording
	github.com/gravitational/teleport/lib/events/filesessions/fileasync.go:503 github.com/gravitational/teleport/lib/events/filesessions.(*Uploader).startUpload
	github.com/gravitational/teleport/lib/events/filesessions/fileasync.go:278 github.com/gravitational/teleport/lib/events/filesessions.(*Uploader).Scan
	github.com/gravitational/teleport/lib/events/filesessions/fileasync.go:239 github.com/gravitational/teleport/lib/events/filesessions.(*Uploader).Serve
	github.com/gravitational/teleport/lib/service/service.go:3648 github.com/gravitational/teleport/lib/service.(*TeleportProcess).initUploaderService.func1
	github.com/gravitational/teleport/lib/service/supervisor.go:605 github.com/gravitational/teleport/lib/service.(*LocalService).Serve
	github.com/gravitational/teleport/lib/service/supervisor.go:328 github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve.func1
	runtime/asm_amd64.s:1700 runtime.goexit
User Message: completing upload
	CompleteMultipartUpload(upload CLLKcVEllOAMbhJo4SGBBpibc2WxHJkm53CPKhNh54kwhJC2cSXvq9IGeNzYcmgrIECYuqmu8fIl3sUa9quydUTmf97o22fpYsHvxZbAG6qB6olQggGR5vBg_bzKrqXed778KMfaB02bn14gFh3GaUZGj.u.Z.iiwI4tlfzmHKY-) session(eeb355b0-69dc-4daa-b4de-ae6378f1c938)
		operation error S3: CompleteMultipartUpload, https response error StatusCode: 400, RequestID: NB4P6J3SVNEDB64H, HostID: v/0Fn5CrKDuo1zWLsqo8P5oML4EXV3wsWZ6I/hIfUD0hxbp+rL6+34qUqKIN+Zjnk4vf62iSvpA=, api error EntityTooSmall: Your proposed upload is smaller than the minimum allowed size] filesessions/fileasync.go:242

This PR fixes this error by:

  • Aggregating the 128KB upload parts into larger upload parts before sending the upload to Auth.
    • Auth is already agnostic to upload part boundaries for both streamUploads and playback, so multiple parts can be concatenated into a single upload part.
    • For local uploads (in the Auth process), parts are aggregated until they exceed 5MB; for uploads through the gRPC client, they are kept under 4MB (the gRPC max message receive size).
  • Before Auth sends an upload part to S3, it adds padding to reach the 5MB minimum.
    • This is only necessary for uploads through the gRPC client, whose parts are capped by the 4MB max message receive size.
    • With the default max message receive size of 4MB, this results in roughly 1MB of padding per 5MB upload part.
    • Padding is skipped on the last upload part, since S3 does not require it there and it could add up to 5MB of unnecessary padding.
    • Recording playback now ignores empty upload parts, since the padding above is written as its own upload part with PartSize=0 and PaddingSize=1MB.
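The aggregation, padding, and playback steps above can be sketched roughly as follows. This is a simplified Go model, not the code in this PR: `aggregateParts`, `paddingFor`, `streamPart`, and `playableParts` are hypothetical names, only the `PartSize`/`PaddingSize` field names come from the description, and padding is computed without accounting for part-header or gRPC message overhead.

```go
package main

import "fmt"

// minPartSize is S3's 5 MiB minimum for every part except the last.
const minPartSize = 5 * 1024 * 1024

// streamPart is a hypothetical stand-in for a part header carrying the
// PartSize and PaddingSize fields described above.
type streamPart struct {
	PartSize    int64
	PaddingSize int64
}

// aggregateParts concatenates consecutive small parts, starting a new
// aggregated part whenever appending the next one would exceed maxSize.
func aggregateParts(parts [][]byte, maxSize int) [][]byte {
	var out [][]byte
	var cur []byte
	for _, p := range parts {
		if len(cur) > 0 && len(cur)+len(p) > maxSize {
			out = append(out, cur)
			cur = nil
		}
		cur = append(cur, p...)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

// paddingFor returns the padding needed to bring a part up to minPartSize.
// The last part is never padded because S3 does not require it.
func paddingFor(partSize int, isLast bool) int {
	if isLast || partSize >= minPartSize {
		return 0
	}
	return minPartSize - partSize
}

// playableParts mimics playback skipping padding-only parts (PartSize == 0).
func playableParts(parts []streamPart) []streamPart {
	var out []streamPart
	for _, p := range parts {
		if p.PartSize == 0 {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	const kib = 1024
	// 100 encrypted parts of 128 KiB each (12.5 MiB in total).
	parts := make([][]byte, 100)
	for i := range parts {
		parts[i] = make([]byte, 128*kib)
	}
	// Assumed 4 MiB cap for gRPC uploads; real code needs headroom for
	// message overhead, which this sketch ignores.
	aggregated := aggregateParts(parts, 4*1024*kib)

	var headers []streamPart // all part headers, in upload order
	var s3PartSizes []int    // bytes sent to S3 per multipart part
	for i, a := range aggregated {
		headers = append(headers, streamPart{PartSize: int64(len(a))})
		size := len(a)
		if pad := paddingFor(len(a), i == len(aggregated)-1); pad > 0 {
			// Padding is appended as its own empty part inside the same S3
			// part rather than by editing the data part.
			headers = append(headers, streamPart{PaddingSize: int64(pad)})
			size += pad
		}
		s3PartSizes = append(s3PartSizes, size)
	}
	fmt.Println(s3PartSizes, len(headers), len(playableParts(headers)))
}
```

With these assumptions, the three full 4 MiB parts each gain 1 MiB of padding, the final 512 KiB part is left alone, and playback sees only the four data parts. For local uploads in the Auth process, the loop would instead flush once the current part reaches 5MB, so `paddingFor` would always return 0.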

This is a short-term fix, since 1MB of padding per upload part is not ideal. The issue is also tangentially related to encrypted recordings not being processed for session summaries, metadata, or thumbnails. Note that, due to the lack of metadata, this bug also leaves encrypted recordings unplayable in the WebUI in v18.2.3+. A more complete fix will be handled in a follow-up PR.

Manual Testing (34f1f23):

  • encrypted recording w/ S3
    • For each of the following configurations, I created a session recording > 5MB, played it back with tsh play and checked that it was the expected size in the S3 bucket.
    • node recording
      • node in auth process
        • async
        • sync
      • node in separate process (gRPC client used)
        • async
          • [ ] Setting TELEPORT_UNSTABLE_GRPC_RECV_SIZE=5MB should result in the 1MB padding being skipped, as the client will create 5MB aggregated upload parts itself.
        • sync
    • proxy recording
      • proxy in auth process
        • async
        • sync
      • proxy in separate process (gRPC client used)
        • async
          • [ ] Setting TELEPORT_UNSTABLE_GRPC_RECV_SIZE=5MB should result in the 1MB padding being skipped, as the client will create 5MB aggregated upload parts itself.
        • sync
  • non-encrypted recording
    • playback w/o new padding
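The "expected size in the S3 bucket" check can be estimated with simple arithmetic. The Go sketch below is an assumption-laden model, not Teleport code: `storedSize` is a hypothetical helper, it assumes every non-final part is aggregated to exactly the maximum part size, and it ignores part headers and gRPC message overhead.

```go
package main

import "fmt"

// mib is one mebibyte; S3 requires every part except the last to be at
// least 5 MiB.
const (
	mib         = 1024 * 1024
	minPartSize = 5 * mib
)

// storedSize is a hypothetical estimate of the bytes stored in S3 for a
// recording of recordingSize bytes, assuming every non-final part is
// aggregated to exactly maxPartSize and then padded up to minPartSize.
// Part headers and gRPC message overhead are ignored.
func storedSize(recordingSize, maxPartSize int64) int64 {
	if recordingSize <= 0 {
		return 0
	}
	parts := (recordingSize + maxPartSize - 1) / maxPartSize // ceiling division
	var padding int64
	if maxPartSize < minPartSize {
		// The last part is never padded.
		padding = (parts - 1) * (minPartSize - maxPartSize)
	}
	return recordingSize + padding
}

func main() {
	fmt.Println(storedSize(12*mib, 4*mib)/mib, "MiB stored with 4 MiB parts")
	fmt.Println(storedSize(12*mib, 5*mib)/mib, "MiB stored with 5 MiB parts")
}
```

Under these assumptions, a 12 MiB recording sent in 4 MiB parts picks up 2 MiB of padding (three parts, the last unpadded), while a 5 MiB cap, as in the TELEPORT_UNSTABLE_GRPC_RECV_SIZE=5MB checks above, adds none.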

@Joerger Joerger requested review from eriktate and rosstimothy and removed request for eriktate October 24, 2025 01:37
@github-actions github-actions bot added audit-log Issues related to Teleports Audit Log size/md labels Oct 24, 2025
@Joerger Joerger requested a review from eriktate October 24, 2025 01:39
@marcoandredinis marcoandredinis removed their request for review October 24, 2025 13:28
@russjones
Contributor

@Joerger Manual test plan looks great. Just to confirm you tested against actual S3 right?

@Joerger
Contributor Author

Joerger commented Oct 24, 2025

> @Joerger Manual test plan looks great. Just to confirm you tested against actual S3 right?

Yes!

Contributor

@rosstimothy rosstimothy left a comment

Can we add test coverage to prevent a regression?

Comment thread lib/events/filesessions/fileasync.go Outdated
Comment thread lib/events/filesessions/fileasync.go Outdated
@Joerger Joerger requested a review from rosstimothy October 29, 2025 21:16
@Joerger Joerger force-pushed the joerger/pad-encrypted-recordings-s3 branch from 5f579c7 to 34f1f23 October 30, 2025 00:09
@Joerger
Contributor Author

Joerger commented Oct 30, 2025

@eriktate @rosstimothy I made significant changes in recent commits, mostly to accommodate the new tests. I re-ran the manual test plan in the PR description. Can you give it a second look?

@rosstimothy rosstimothy requested a review from eriktate October 30, 2025 19:38
 // NewMemoryUploader returns a new memory uploader implementing multipart
 // upload
-func NewMemoryUploader(eventsC ...chan events.UploadEvent) *MemoryUploader {
+func NewMemoryUploader(cfg ...MemoryUploaderConfig) *MemoryUploader {
Contributor

It sure would be nice to clean this API up one day 😬.
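`NewMemoryUploader(cfg ...MemoryUploaderConfig)` takes an optional config through a variadic parameter. A minimal sketch of that pattern in Go follows; `memoryUploaderConfig`, its `EventsC` field, and `uploadEvent` are hypothetical stand-ins, since the thread does not show the real `MemoryUploaderConfig` fields.

```go
package main

import "fmt"

// uploadEvent is a hypothetical stand-in for events.UploadEvent.
type uploadEvent struct{ SessionID string }

// memoryUploaderConfig is a hypothetical config struct; the real
// MemoryUploaderConfig fields are not shown in this thread.
type memoryUploaderConfig struct {
	EventsC chan uploadEvent
}

type memoryUploader struct {
	eventsC chan uploadEvent
}

// newMemoryUploader accepts an optional config through a variadic
// parameter, so zero-argument calls remain valid.
func newMemoryUploader(cfg ...memoryUploaderConfig) *memoryUploader {
	var c memoryUploaderConfig
	if len(cfg) > 0 {
		c = cfg[0] // only the first config is honored in this sketch
	}
	return &memoryUploader{eventsC: c.EventsC}
}

func main() {
	plain := newMemoryUploader()
	ch := make(chan uploadEvent, 1)
	withEvents := newMemoryUploader(memoryUploaderConfig{EventsC: ch})
	fmt.Println(plain.eventsC == nil, withEvents.eventsC == ch)
}
```

One downside of this pattern is that callers can pass several configs and all but the first are silently ignored.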

@public-teleport-github-review-bot public-teleport-github-review-bot bot removed the request for review from smallinsky October 31, 2025 20:12
@@ -438,26 +445,26 @@ func TestUploadBackoff(t *testing.T) {

 // Fix the streamer, make sure the upload succeeds
 terminateConnectionAt.Store(0)
-p.clock.BlockUntil(2)
+p.clock.BlockUntilContext(ctx, 2)
Contributor

The Flaky Test Detector looks to maybe be hanging here?

https://github.com/gravitational/teleport/actions/runs/18951112917/job/54115715130?pr=60542#step:10:748

goroutine 17077 [select]:
github.com/jonboulle/clockwork.(*FakeClock).BlockUntilContext(0xc00521f4a0, {0x4597898, 0xc0004c1110}, 0x2)
	/go/pkg/mod/github.com/jonboulle/clockwork@v0.5.0/clockwork.go:244 +0xfb
github.com/gravitational/teleport/lib/events/filesessions.TestUploadBackoff(0xc001ca0fc0)
	/__w/teleport/teleport/lib/events/filesessions/fileasync_test.go:448 +0xcb8
testing.tRunner(0xc001ca0fc0, 0x42389d0)
	/opt/go/src/testing/testing.go:1934 +0x21d
created by testing.(*T).Run in goroutine 1
	/opt/go/src/testing/testing.go:1997 +0x9d3

Contributor Author

It's just hitting the flaky test timeout; this is the slowest of the tests run for this PR:

go test -run "^TestUploadBackoff$" github.com/gravitational/teleport/lib/events/filesessions --race --count=100
ok  	github.com/gravitational/teleport/lib/events/filesessions	164.958s

Contributor

@eriktate eriktate left a comment

Looks great! 👍

Comment thread lib/events/filesessions/fileasync.go Outdated
Comment thread lib/events/filesessions/fileasync.go Outdated
@rosstimothy
Contributor

/excludeflake *

Co-authored-by: Erik Tate <erik.tate@goteleport.com>
@Joerger Joerger enabled auto-merge October 31, 2025 21:22
@Joerger Joerger added this pull request to the merge queue Oct 31, 2025
Merged via the queue into master with commit 67399e9 Oct 31, 2025
44 checks passed
@Joerger Joerger deleted the joerger/pad-encrypted-recordings-s3 branch October 31, 2025 22:03
@backport-bot-workflows
Contributor

@Joerger See the table below for backport results.

| Branch | Result |
|---|---|
| branch/v18 | Failed |

Joerger added a commit that referenced this pull request Oct 31, 2025
* Add necessary padding to encrypted recordings for s3.

* Add padded part instead of editing existing part; Don't pad last upload part; Support skipping padded part in streamer.

* Reconstruct encrypted uploads from agent.

* Concatenate upload parts into a single upload part instead of restructuring and combining them.

* Make iterator buffer safe.

* Remove padding from 128kb parts during concatenation.

* Fix padding through gRPC service.

* Cleanup; Fix discarded padding logic.

* Add tests.

* Fix bad upload test.

* Expand test, add comments.

* Minor edits.

* Add padding helper with test; Add comments.

* Fix tests, lint.

* Don't use TELEPORT_UNSTABLE_GRPC_RECV_SIZE.

* Apply suggestions from code review

Co-authored-by: Erik Tate <erik.tate@goteleport.com>

---------

Co-authored-by: Erik Tate <erik.tate@goteleport.com>
github-merge-queue bot pushed a commit that referenced this pull request Nov 3, 2025
mmcallister pushed a commit that referenced this pull request Nov 19, 2025
mmcallister pushed a commit that referenced this pull request Nov 20, 2025

Labels

audit-log Issues related to Teleports Audit Log backport/branch/v18 size/md


4 participants