feat: add debug info upload service#4648
Conversation
d202551 to
8577f33
Compare
8577f33 to
5c053a7
Compare
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add debug-level logging to help troubleshoot debuginfo upload issues: - Log ShouldInitiateUpload results (build_id, decision, reason) - Log Upload gRPC results (success with size, or failure with error) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Debuginfod fetching is now handled in the symbolizer instead of in the parca debuginfo store. This removes the debuginfod client implementation, related tests, and test fixtures from the parca/debuginfo package. Also removes tenant ID handling from the debuginfo store since it's not needed for this use case. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…test Refactor the store test to use an HTTP/2 server with h2c and the util.AuthenticateUser middleware instead of a direct gRPC server. This better reflects the production setup where requests go through HTTP middleware for tenant authentication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
|
||
| t := noop.Tracer{} | ||
| l = log.With(l, "component", "debug-info") | ||
| bucket = objstore.NewPrefixedBucket(bucket, symbolizer.BucketPrefixParcaDebugInfo) |
There was a problem hiding this comment.
I think we don't have a cleanup mechanism to control unbounded growth (neither for symbolizer path). Could it be an issue?
There was a problem hiding this comment.
I think it's fine to keep lidia's indefinitely. And preferably to remove the debug information files once they are converted to lidia. All of this is hard to implement in a stateless manner in distributors, so I propose to postpone these decision until we have a proper place/way to implement this https://github.com/grafana/pyroscope-squad/issues/687
Co-authored-by: Marc Sanmiquel <marcsanmiquel@gmail.com>
|
Just wanted to clarify, the PR was not ready for review, only the new protobufs and overall intention to use the new protobufs. I also think we should have an idea how to move forward with infinite data growth (both lidia and debug info) |
I talked with @\simonswine and we agreed to solve this later, and merge this for private use, not for public. |
Remove parcaBucket parameter from test helper and all callers after the symbolizer was refactored to use a single bucket for both lidia and parca debug info storage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move proto package from debuginfo.v1 to debuginfo.v1alpha1 and update all generated code and consumer imports accordingly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename proto fields to snake_case (gnu_build_id, go_build_id, otel_file_id) - Rename enum DebuginfoType → Type, add TYPE_UNSPECIFIED, fix prefix - Fix bug: set STATE_UPLOADED after upload completes - Replace gRPC status.Error with connect.NewError for consistency - Rename fetchFromParca → fetchFromUploadedDebugInfo - Revert ProcessELFData back to private method on Symbolizer Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9757e94dc9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
#4896) * add unit tests for debuginfo service Cover the store logic (ValidateGnuBuildID, validateInit, NewStore, shouldInitiateUpload, uploadIsStale, fetchMetadata, writeMetadata, ObjectPath, MetadataObjectPath) and the chunked upload reader (Read, Size, context cancellation) with ~43 test cases across two new test files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix store_test.go to use base branch protobuf field names The base branch (debuginfo-upload) uses GnuBuildId and FileMetadata_TYPE_* naming in the generated protobuf, while this worktree had a divergent proto with GNU and DEBUGINFO_TYPE_* names. Align the test file with the base branch so CI compiles correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix gofmt alignment in store_test.go Align struct field formatting to pass gofmt check in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * add end-to-end connect client tests for debuginfo Upload RPC Add TestUploadE2E with three subtests that exercise the full Upload bidi streaming RPC over a real HTTP/2 test server with connect client: - full upload flow: init -> chunks -> verify stored data and metadata - already uploaded: pre-populate metadata, verify service declines upload - disabled service: verify service returns ReasonDisabled Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * remove unused net/http import from store_test.go Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix e2e test to use TLS for HTTP/2 bidi streaming BidiStream requires HTTP/2. Switch from h2c (cleartext) to TLS via StartTLS() which negotiates HTTP/2 through ALPN. Remove unused h2c and http2 imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix store.go and e2e test: align with base branch protobuf, fix upload state bug - Align store.go with base branch protobuf field names (GnuBuildId, TYPE_EXECUTABLE_FULL/NO_TEXT) and use connect.NewError instead of grpc status.Error - Add missing md.State = STATE_UPLOADED after successful upload - Fix e2e test to wait for server handler completion by calling Receive() after CloseRequest() before checking bucket contents Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix wrong error messages: "init.FileData" → "init.File" to match actual field name - Enforce MaxUploadSize in UploadReader to prevent unlimited chunked uploads - Tenant-isolate lidia cache paths to prevent cross-tenant cache poisoning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
|
@cursor-bugbot review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed:
filepath.Joinproduces wrong paths on non-Unix OS- Changed
lidiaObjectPathto usepath.Joinso object-storage keys always use forward slashes across OSes.
- Changed
- ✅ Fixed: Removed not-found cache causes extra bucket lookups
- Reintroduced the debuginfod not-found cache short-circuit in
getLidiaBytesto avoid unnecessary object-store lookups for known-missing build IDs.
- Reintroduced the debuginfod not-found cache short-circuit in
- ✅ Fixed: IDE run config commits debug-level logging settings
- Removed the debug logging/output-file/large upload-size flags from the shared
.ideav2 run configuration and restored non-debug defaults.
- Removed the debug logging/output-file/large upload-size flags from the shared
- ✅ Fixed: Reader drops bytes when current chunk ends with data
- Reworked
UploadReader.Readto return already-read bytes before advancing chunks and added tests coveringn>0withio.EOFbehavior.
- Reworked
Or push these changes by commenting:
@cursor push 5b9bdf720c
Preview (5b9bdf720c)
diff --git a/.idea/runConfigurations/v2.xml b/.idea/runConfigurations/v2.xml
--- a/.idea/runConfigurations/v2.xml
+++ b/.idea/runConfigurations/v2.xml
@@ -1,10 +1,9 @@
<component name="ProjectRunConfigurationManager">
<configuration default="false" name="v2" type="GoApplicationRunConfiguration" factoryName="Go Application">
- <output_file path="intellij-pyroscope.log" is_save="true" />
<module name="pyroscope" />
<working_directory value="$PROJECT_DIR$" />
<go_parameters value="-tags embedassets" />
- <parameters value="-log.level=debug -storage.backend=filesystem -write-path=segment-writer -enable-query-backend=true -symbolizer.enabled=true -debug-info.max-upload-size=1000000000" />
+ <parameters value="-storage.backend=filesystem -write-path=segment-writer -enable-query-backend=true" />
<envs>
<env name="PYROSCOPE_V2" value="true" />
</envs>
diff --git a/pkg/debuginfo/reader/reader.go b/pkg/debuginfo/reader/reader.go
--- a/pkg/debuginfo/reader/reader.go
+++ b/pkg/debuginfo/reader/reader.go
@@ -24,39 +24,39 @@
}
func (r *UploadReader) Read(p []byte) (int, error) {
- if r.cur == nil {
- var err error
- r.cur, err = r.next()
- if err == io.EOF {
- return 0, io.EOF
+ for {
+ if r.cur == nil {
+ var err error
+ r.cur, err = r.next()
+ if err == io.EOF {
+ return 0, io.EOF
+ }
+ if err != nil {
+ return 0, fmt.Errorf("get upload chunk (%d bytes read so far): %w", r.size, err)
+ }
}
- if err != nil {
- return 0, fmt.Errorf("get first upload chunk: %w", err)
+
+ i, err := r.cur.Read(p)
+ if i > 0 {
+ r.size += uint64(i)
+ if r.maxSize > 0 && r.size > uint64(r.maxSize) {
+ return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.size, r.maxSize)
+ }
+ if err == io.EOF {
+ r.cur = nil
+ }
+ return i, nil
}
- }
- i, err := r.cur.Read(p)
- if err != nil && err != io.EOF {
- return 0, fmt.Errorf("read upload chunk (%d bytes read so far): %w", r.size, err)
- }
- if err == io.EOF {
- r.cur, err = r.next()
+
if err == io.EOF {
- return 0, io.EOF
+ r.cur = nil
+ continue
}
if err != nil {
- return 0, fmt.Errorf("get next upload chunk (%d bytes read so far): %w", r.size, err)
+ return 0, fmt.Errorf("read upload chunk (%d bytes read so far): %w", r.size, err)
}
- i, err = r.cur.Read(p)
- if err != nil {
- return 0, fmt.Errorf("read next upload chunk (%d bytes read so far): %w", r.size, err)
- }
+ return 0, nil
}
-
- r.size += uint64(i)
- if r.maxSize > 0 && r.size > uint64(r.maxSize) {
- return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.size, r.maxSize)
- }
- return i, nil
}
func (r *UploadReader) next() (io.Reader, error) {
diff --git a/pkg/debuginfo/reader/reader_test.go b/pkg/debuginfo/reader/reader_test.go
--- a/pkg/debuginfo/reader/reader_test.go
+++ b/pkg/debuginfo/reader/reader_test.go
@@ -22,6 +22,20 @@
}
}
+type eofAfterDataReader struct {
+ data []byte
+ read bool
+}
+
+func (r *eofAfterDataReader) Read(p []byte) (int, error) {
+ if r.read {
+ return 0, io.EOF
+ }
+ r.read = true
+ n := copy(p, r.data)
+ return n, io.EOF
+}
+
func TestUploadReader_Read(t *testing.T) {
t.Parallel()
@@ -102,6 +116,28 @@
require.Error(t, err)
assert.Contains(t, err.Error(), "broken")
})
+
+ t.Run("reader returning data with EOF keeps bytes", func(t *testing.T) {
+ t.Parallel()
+
+ r := New(context.Background(), chunksFunc(nil), 0)
+ r.cur = &eofAfterDataReader{data: []byte("abc")}
+
+ data, err := io.ReadAll(r)
+ require.NoError(t, err)
+ assert.Equal(t, "abc", string(data))
+ })
+
+ t.Run("reader returning data with EOF does not overwrite buffer", func(t *testing.T) {
+ t.Parallel()
+
+ r := New(context.Background(), chunksFunc([][]byte{[]byte("def")}), 0)
+ r.cur = &eofAfterDataReader{data: []byte("abc")}
+
+ data, err := io.ReadAll(r)
+ require.NoError(t, err)
+ assert.Equal(t, "abcdef", string(data))
+ })
}
func TestUploadReader_Size(t *testing.T) {
diff --git a/pkg/symbolizer/symbolizer.go b/pkg/symbolizer/symbolizer.go
--- a/pkg/symbolizer/symbolizer.go
+++ b/pkg/symbolizer/symbolizer.go
@@ -9,6 +9,7 @@
"fmt"
"io"
"net/http"
+ "path"
"path/filepath"
"time"
@@ -417,6 +418,15 @@
}
func (s *Symbolizer) getLidiaBytes(ctx context.Context, buildID string) ([]byte, error) {
+ if client, ok := s.client.(*DebuginfodHTTPClient); ok {
+ if sanitizedBuildID, err := sanitizeBuildID(buildID); err == nil {
+ if found, _ := client.notFoundCache.Get(sanitizedBuildID); found {
+ s.metrics.cacheOperations.WithLabelValues("not_found", "get", statusSuccess).Inc()
+ return nil, buildIDNotFoundError{buildID: buildID}
+ }
+ }
+ }
+
tenantID, err := tenant.TenantID(ctx)
if err != nil {
return nil, err
@@ -461,7 +471,7 @@
}
func lidiaObjectPath(tenantID, buildID string) string {
- return filepath.Join(bucketPrefix, tenantID, buildID)
+ return path.Join(bucketPrefix, tenantID, buildID)
}
// fetchLidiaFromDebuginfod fetches debug info from debuginfod and converts to Lidia formatThere was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b33f1a7572
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), r); err != nil { | ||
| return connect.NewError(connect.CodeInternal, fmt.Errorf("upload debuginfo: %w", err)) | ||
| } | ||
| md.State = debuginfov1alpha1.ObjectMetadata_STATE_UPLOADED | ||
| md.FinishedAt = timestamppb.New(time.Now()) |
There was a problem hiding this comment.
Reject empty upload streams before marking metadata uploaded
If a client sends init and then closes the stream without any chunk, debuginforeader.New(...).Read returns EOF immediately, bucket.Upload succeeds with a zero-byte object, and this code still marks metadata as STATE_UPLOADED; from that point shouldInitiateUpload returns ReasonDebuginfoAlreadyExists, so the tenant is effectively stuck with an unusable debug-info object until manual cleanup.
Useful? React with 👍 / 👎.
| // fetchLidiaFromDebuginfod fetches debug info from debuginfod and converts to Lidia format | ||
| func (s *Symbolizer) fetchLidiaFromDebuginfod(ctx context.Context, buildID string) ([]byte, error) { | ||
| debugReader, err := s.fetchFromDebuginfod(ctx, buildID) | ||
| debugReader, err := s.fetch(ctx, buildID) |
There was a problem hiding this comment.
Enforce symbol size limits for uploaded debug-info reads
Switching this call to s.fetch allows uploaded debug-info files to flow through fetchLidiaFromDebuginfod, but that path then does an unbounded io.ReadAll on the returned reader before limit checks; unlike the debuginfod HTTP client path, this bypasses SymbolizerMaxSymbolSizeBytes for uploaded files, so tenants with stricter symbol-size limits can still trigger large in-memory reads from object storage.
Useful? React with 👍 / 👎.
- Use path.Join instead of filepath.Join in lidiaObjectPath for OS-independent object storage keys - Restore tenant-scoped not-found cache in getLidiaBytes to reduce redundant object storage lookups for unknown build IDs - Extract MaxSizeReader from UploadReader into a composable wrapper - Handle simultaneous data+EOF in UploadReader.Read per io.Reader contract - Revert debug-specific IDE run configuration changes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@cursor-bugbot review |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for all 3 issues found in the latest run.
- ✅ Fixed: Silently swallowed errors hide uploaded debug info failures
- Added a debug log in
fetchwhen uploaded debug info retrieval fails before falling back to debuginfod so fallback-causing errors are visible.
- Added a debug log in
- ✅ Fixed: Not-found cache blocks newly uploaded debug info lookups
- Removed the tenant-scoped not-found cache short-circuit and write path from
getLidiaBytesso newly uploaded debug info is not blocked by stale not-found entries.
- Removed the tenant-scoped not-found cache short-circuit and write path from
- ✅ Fixed: MaxSizeReader drops read bytes on limit exceeded
- Updated
MaxSizeReader.Readto return the actual bytes read with the size-limit error and added a test to verify partial-read behavior.
- Updated
Or push these changes by commenting:
@cursor push 2ec2c9ddbb
Preview (2ec2c9ddbb)
diff --git a/pkg/debuginfo/reader/reader.go b/pkg/debuginfo/reader/reader.go
--- a/pkg/debuginfo/reader/reader.go
+++ b/pkg/debuginfo/reader/reader.go
@@ -94,7 +94,7 @@
n, err := r.r.Read(p)
r.n += int64(n)
if r.maxSize > 0 && r.n > r.maxSize {
- return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.n, r.maxSize)
+ return n, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.n, r.maxSize)
}
return n, err
}
diff --git a/pkg/debuginfo/reader/reader_test.go b/pkg/debuginfo/reader/reader_test.go
--- a/pkg/debuginfo/reader/reader_test.go
+++ b/pkg/debuginfo/reader/reader_test.go
@@ -170,6 +170,21 @@
assert.Contains(t, err.Error(), "exceeds maximum allowed size")
})
+ t.Run("returns already-read bytes when limit exceeded", func(t *testing.T) {
+ t.Parallel()
+ r := New(context.Background(), chunksFunc([][]byte{
+ []byte("abcdef"),
+ }))
+ lr := NewMaxSizeReader(r, 5)
+
+ buf := make([]byte, 16)
+ n, err := lr.Read(buf)
+ require.Error(t, err)
+ assert.Equal(t, 6, n)
+ assert.Equal(t, "abcdef", string(buf[:n]))
+ assert.Contains(t, err.Error(), "exceeds maximum allowed size")
+ })
+
t.Run("zero means unlimited", func(t *testing.T) {
t.Parallel()
r := New(context.Background(), chunksFunc([][]byte{
diff --git a/pkg/symbolizer/symbolizer.go b/pkg/symbolizer/symbolizer.go
--- a/pkg/symbolizer/symbolizer.go
+++ b/pkg/symbolizer/symbolizer.go
@@ -423,15 +423,6 @@
return nil, err
}
- // Check not-found cache with tenant-scoped key to avoid repeated object storage lookups
- // for build IDs that are known to be unavailable for this tenant.
- notFoundCacheKey := tenantID + "/" + buildID
- if httpClient, ok := s.client.(*DebuginfodHTTPClient); ok {
- if found, _ := httpClient.notFoundCache.Get(notFoundCacheKey); found {
- return nil, buildIDNotFoundError{buildID: buildID}
- }
- }
-
lidiaBytes, err := s.fetchLidiaFromObjectStore(ctx, tenantID, buildID)
if err == nil {
s.metrics.cacheOperations.WithLabelValues("object_storage", "get", statusSuccess).Inc()
@@ -441,13 +432,6 @@
lidiaBytes, err = s.fetchLidiaFromDebuginfod(ctx, buildID)
if err != nil {
- // Cache not-found errors with tenant-scoped key to avoid repeated lookups.
- var bnfErr buildIDNotFoundError
- if errors.As(err, &bnfErr) {
- if httpClient, ok := s.client.(*DebuginfodHTTPClient); ok {
- httpClient.notFoundCache.SetWithTTL(notFoundCacheKey, true, 1, httpClient.cfg.NotFoundCacheTTL)
- }
- }
return nil, err
}
@@ -514,6 +498,8 @@
func (s *Symbolizer) fetch(ctx context.Context, buildID string) (io.ReadCloser, error) {
if r, err := s.fetchFromUploadedDebugInfo(ctx, buildID); err == nil {
return r, nil
+ } else {
+ level.Debug(s.logger).Log("msg", "failed to fetch uploaded debug info, falling back to debuginfod", "buildID", buildID, "err", err)
}
return s.fetchFromDebuginfod(ctx, buildID)
}| return r, nil | ||
| } | ||
| return s.fetchFromDebuginfod(ctx, buildID) | ||
| } |
There was a problem hiding this comment.
Silently swallowed errors hide uploaded debug info failures
Medium Severity
The fetch method silently discards all errors from fetchFromUploadedDebugInfo, including non-transient errors like bucket connectivity issues or permission failures. When uploaded debug info exists but can't be read due to an infrastructure problem, the symbolizer silently falls through to debuginfod with no logging, making it impossible to diagnose why uploaded debug info is never used.
- Remove MaxSizeReader wrapper; check r.Size() after upload and delete oversized objects in store.go - Remove duplicate not-found cache in symbolizer (debuginfod client already caches not-found build IDs) - Bound io.ReadAll in fetchLidiaFromDebuginfod with readAllWithLimit to enforce SymbolizerMaxSymbolSizeBytes for uploaded debug-info files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Upload size check happens after full upload completes
- The upload now streams through an
io.LimitReadercapped atMaxUploadSize+1, preventing unbounded writes to object storage before size validation.
- The upload now streams through an
- ✅ Fixed: Metadata left as UPLOADING after size-exceeded rejection
- On size-exceeded rejection the code now deletes both the uploaded object and its metadata so retries are not blocked by stale
STATE_UPLOADINGmetadata.
- On size-exceeded rejection the code now deletes both the uploaded object and its metadata so retries are not blocked by stale
Or push these changes by commenting:
@cursor push faf97c8dda
Preview (faf97c8dda)
diff --git a/pkg/debuginfo/store.go b/pkg/debuginfo/store.go
--- a/pkg/debuginfo/store.go
+++ b/pkg/debuginfo/store.go
@@ -165,11 +165,16 @@
}
r := debuginforeader.New(ctx, readChunksFromStream(stream))
- if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), r); err != nil {
+ uploadReader := io.Reader(r)
+ if s.cfg.MaxUploadSize > 0 {
+ uploadReader = io.LimitReader(r, s.cfg.MaxUploadSize+1)
+ }
+ if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), uploadReader); err != nil {
return connect.NewError(connect.CodeInternal, fmt.Errorf("upload debuginfo: %w", err))
}
if s.cfg.MaxUploadSize > 0 && r.Size() > uint64(s.cfg.MaxUploadSize) {
_ = s.bucket.Delete(ctx, ObjectPath(tenantID, id))
+ _ = s.bucket.Delete(ctx, MetadataObjectPath(tenantID, id))
return connect.NewError(connect.CodeInvalidArgument,
fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize))
}
diff --git a/pkg/debuginfo/store_test.go b/pkg/debuginfo/store_test.go
--- a/pkg/debuginfo/store_test.go
+++ b/pkg/debuginfo/store_test.go
@@ -657,4 +657,83 @@
require.NoError(t, stream.CloseRequest())
require.NoError(t, stream.CloseResponse())
})
+
+ t.Run("oversized upload is rejected and upload state is cleaned up", func(t *testing.T) {
+ t.Parallel()
+ store, bucket := newTestStore(t, Config{
+ Enabled: true,
+ MaxUploadDuration: time.Minute,
+ MaxUploadSize: 8,
+ })
+ client := startTestServer(t, store)
+
+ ctx := tenant.InjectTenantID(context.Background(), "test-tenant")
+ stream := client.Upload(ctx)
+
+ err := stream.Send(&debuginfov1alpha1.UploadRequest{
+ Data: &debuginfov1alpha1.UploadRequest_Init{
+ Init: &debuginfov1alpha1.ShouldInitiateUploadRequest{
+ File: &debuginfov1alpha1.FileMetadata{
+ GnuBuildId: "aabbccdd",
+ Name: "my-binary",
+ Type: debuginfov1alpha1.FileMetadata_TYPE_EXECUTABLE_FULL,
+ },
+ },
+ },
+ })
+ require.NoError(t, err)
+
+ resp, err := stream.Receive()
+ require.NoError(t, err)
+ initResp := resp.GetInit()
+ require.NotNil(t, initResp)
+ assert.True(t, initResp.ShouldInitiateUpload)
+ assert.Equal(t, ReasonFirstTimeSeen, initResp.Reason)
+
+ for _, chunk := range [][]byte{[]byte("12345"), []byte("67890")} {
+ err = stream.Send(&debuginfov1alpha1.UploadRequest{
+ Data: &debuginfov1alpha1.UploadRequest_Chunk{
+ Chunk: &debuginfov1alpha1.UploadChunk{Chunk: chunk},
+ },
+ })
+ require.NoError(t, err)
+ }
+
+ require.NoError(t, stream.CloseRequest())
+ _, err = stream.Receive()
+ require.Error(t, err)
+ assert.Equal(t, connect.CodeInvalidArgument, connect.CodeOf(err))
+ assert.Contains(t, err.Error(), "exceeds maximum allowed size")
+ require.NoError(t, stream.CloseResponse())
+
+ id := mustValidateGnuBuildID(t, "aabbccdd")
+ objects := bucket.Objects()
+ _, objectExists := objects[ObjectPath("test-tenant", id)]
+ _, metadataExists := objects[MetadataObjectPath("test-tenant", id)]
+ assert.False(t, objectExists)
+ assert.False(t, metadataExists)
+
+ retryStream := client.Upload(ctx)
+ err = retryStream.Send(&debuginfov1alpha1.UploadRequest{
+ Data: &debuginfov1alpha1.UploadRequest_Init{
+ Init: &debuginfov1alpha1.ShouldInitiateUploadRequest{
+ File: &debuginfov1alpha1.FileMetadata{
+ GnuBuildId: "aabbccdd",
+ Name: "my-binary",
+ Type: debuginfov1alpha1.FileMetadata_TYPE_EXECUTABLE_FULL,
+ },
+ },
+ },
+ })
+ require.NoError(t, err)
+
+ retryResp, err := retryStream.Receive()
+ require.NoError(t, err)
+ retryInitResp := retryResp.GetInit()
+ require.NotNil(t, retryInitResp)
+ assert.True(t, retryInitResp.ShouldInitiateUpload)
+ assert.Equal(t, ReasonFirstTimeSeen, retryInitResp.Reason)
+ require.NoError(t, retryStream.CloseRequest())
+ require.NoError(t, retryStream.CloseResponse())
+ })
}| _ = s.bucket.Delete(ctx, ObjectPath(tenantID, id)) | ||
| return connect.NewError(connect.CodeInvalidArgument, | ||
| fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize)) | ||
| } |
There was a problem hiding this comment.
Upload size check happens after full upload completes
Medium Severity
The MaxUploadSize validation occurs after s.bucket.Upload has already written the entire payload to object storage. An authenticated client can stream unlimited data (in 5MB chunks) to the bucket; only after the upload finishes is the size checked and the object deleted. This enables resource exhaustion of the storage backend since there is no streaming size limit enforced during the upload itself.
| _ = s.bucket.Delete(ctx, ObjectPath(tenantID, id)) | ||
| return connect.NewError(connect.CodeInvalidArgument, | ||
| fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize)) | ||
| } |
There was a problem hiding this comment.
Metadata left as UPLOADING after size-exceeded rejection
Medium Severity
When an upload is rejected for exceeding MaxUploadSize, the object file is deleted via s.bucket.Delete but the metadata (written at line 163 as STATE_UPLOADING) is never cleaned up or updated. Subsequent upload attempts for the same build ID will see the orphaned STATE_UPLOADING metadata and be rejected with "upload in progress" until the staleness timer (MaxUploadDuration + 2 minutes) expires, effectively creating a temporary denial-of-service window for that build ID.



Summary
Add a new
DebuginfoServicethat allows profiling agents (e.g., the OTel eBPF profiler) to upload debug information (ELF executables) to Pyroscope's object storage. The symbolizer then uses these uploaded files as an additional source when resolving symbols, falling back to debuginfod if no upload is available.This is an ALPHA state implementation for internal use and testing. Breaking changes will follow.
What's included
debuginfo.v1alpha1): Bidirectional streamingUploadRPC using Connect. Clients send aShouldInitiateUploadRequest, receive a response indicating whether to proceed, then streamUploadChunkmessages. The API is markedv1alpha1to signal experimental status.pkg/debuginfo/store.go): Handles upload lifecycle — validates GNU build IDs, checks for duplicate/in-progress/stale uploads viaObjectMetadatapersisted as JSON in object storage, streams the uploaded binary to the bucket.pkg/debuginfo/reader/): Anio.Readeradapter that assembles streamed chunks into a contiguous reader for the object storage upload.symbolizer/,debug-info/).Key design decisions
UPLOADING/UPLOADED) is tracked via a JSON metadata file alongside the binary in the bucket, avoiding the need for an additional database.MaxUploadDuration + 2min, it's considered stale and can be retried.Test plan
Storeupload flow, metadata lifecycle, and edge cases (pkg/debuginfo/store_test.go)UploadReaderchunk assembly (pkg/debuginfo/reader/reader_test.go)pkg/symbolizer/symbolizer_test.go)🤖 Generated with Claude Code
Note
Medium Risk
Adds a new authenticated streaming upload endpoint that writes executables/metadata to shared object storage and changes symbolization to read from those uploads, which can affect storage usage and symbol resolution behavior if misconfigured or abused.
Overview
Adds an alpha Connect bidirectional streaming
DebuginfoService.UploadAPI (debuginfo.v1alpha1) plus generated Go/OpenAPI artifacts, enabling agents to upload ELF executables in chunks after an init/"should upload" handshake.Implements
pkg/debuginfo.Storeto validate GNU build IDs, track upload state via JSONObjectMetadatain the bucket, detect stale/in-progress uploads, enforce max upload size/duration, and stream chunks into object storage via a newUploadReaderhelper.Wires the service into the distributor HTTP API with new
connectOptionsDebugInfo(auth + 5MB read limit) and configuration flags, and updates the symbolizer to (1) use tenant-scoped object-store keys for cached Lidia data and (2) try fetching uploaded debuginfo (debug-info/<tenant>/<buildid>/exe) before calling debuginfod; tests updated/added accordingly.Written by Cursor Bugbot for commit 68c2fab. This will update automatically on new commits. Configure here.