Skip to content

feat: add debug info upload service#4648

Merged
korniltsev-grafanista merged 45 commits into
mainfrom
debuginfo-upload
Mar 13, 2026
Merged

feat: add debug info upload service#4648
korniltsev-grafanista merged 45 commits into
mainfrom
debuginfo-upload

Conversation

@korniltsev-grafanista
Copy link
Copy Markdown
Contributor

@korniltsev-grafanista korniltsev-grafanista commented Nov 28, 2025

Summary

Add a new DebuginfoService that allows profiling agents (e.g., the OTel eBPF profiler) to upload debug information (ELF executables) to Pyroscope's object storage. The symbolizer then uses these uploaded files as an additional source when resolving symbols, falling back to debuginfod if no upload is available.

This is an ALPHA state implementation for internal use and testing. Breaking changes will follow.

What's included

  • New proto API (debuginfo.v1alpha1): Bidirectional streaming Upload RPC using Connect. Clients send a ShouldInitiateUploadRequest, receive a response indicating whether to proceed, then stream UploadChunk messages. The API is marked v1alpha1 to signal experimental status.
  • Debug info store (pkg/debuginfo/store.go): Handles upload lifecycle — validates GNU build IDs, checks for duplicate/in-progress/stale uploads via ObjectMetadata persisted as JSON in object storage, streams the uploaded binary to the bucket.
  • Upload reader (pkg/debuginfo/reader/): An io.Reader adapter that assembles streamed chunks into a contiguous reader for the object storage upload.
  • Symbolizer integration: The symbolizer now checks the debug info upload bucket (keyed by tenant + GNU build ID) before falling back to debuginfod. The symbolizer bucket was changed from a prefixed bucket to the shared storage bucket, with explicit path prefixes (symbolizer/, debug-info/).
  • API wiring: The service is registered on the distributor's HTTP handler with auth and a 5MB per-message limit.

Key design decisions

  • Own proto, not Parca's: Inspired by Parca's debug info upload but uses Pyroscope's own protobuf definitions with Connect-Go handlers instead of gRPC.
  • Metadata in object storage: Upload state (UPLOADING/UPLOADED) is tracked via a JSON metadata file alongside the binary in the bucket, avoiding the need for an additional database.
  • Stale upload recovery: If a previous upload started but never finished and exceeds MaxUploadDuration + 2min, it's considered stale and can be retried.

Test plan

  • Unit tests for Store upload flow, metadata lifecycle, and edge cases (pkg/debuginfo/store_test.go)
  • Unit tests for UploadReader chunk assembly (pkg/debuginfo/reader/reader_test.go)
  • Symbolizer tests for the uploaded debug info fetch path (pkg/symbolizer/symbolizer_test.go)
  • Manual test: upload debug info via profiling agent, verify symbolization uses uploaded file

🤖 Generated with Claude Code


Note

Medium Risk
Adds a new authenticated streaming upload endpoint that writes executables/metadata to shared object storage and changes symbolization to read from those uploads, which can affect storage usage and symbol resolution behavior if misconfigured or abused.

Overview
Adds an alpha Connect bidirectional streaming DebuginfoService.Upload API (debuginfo.v1alpha1) plus generated Go/OpenAPI artifacts, enabling agents to upload ELF executables in chunks after an init/"should upload" handshake.

Implements pkg/debuginfo.Store to validate GNU build IDs, track upload state via JSON ObjectMetadata in the bucket, detect stale/in-progress uploads, enforce max upload size/duration, and stream chunks into object storage via a new UploadReader helper.

Wires the service into the distributor HTTP API with new connectOptionsDebugInfo (auth + 5MB read limit) and configuration flags, and updates the symbolizer to (1) use tenant-scoped object-store keys for cached Lidia data and (2) try fetching uploaded debuginfo (debug-info/<tenant>/<buildid>/exe) before calling debuginfod; tests updated/added accordingly.

Written by Cursor Bugbot for commit 68c2fab. This will update automatically on new commits. Configure here.

@korniltsev-grafanista korniltsev-grafanista force-pushed the debuginfo-upload branch 5 times, most recently from d202551 to 8577f33 Compare January 16, 2026 07:36
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@korniltsev-grafanista korniltsev-grafanista changed the title WIP parca debug-info upload parca debug-info upload Jan 16, 2026
@korniltsev-grafanista korniltsev-grafanista marked this pull request as ready for review January 16, 2026 08:05
korniltsev-grafanista and others added 5 commits January 19, 2026 05:45
Add debug-level logging to help troubleshoot debuginfo upload issues:
- Log ShouldInitiateUpload results (build_id, decision, reason)
- Log Upload gRPC results (success with size, or failure with error)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Debuginfod fetching is now handled in the symbolizer instead of in the
parca debuginfo store. This removes the debuginfod client implementation,
related tests, and test fixtures from the parca/debuginfo package.

Also removes tenant ID handling from the debuginfo store since it's not
needed for this use case.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…test

Refactor the store test to use an HTTP/2 server with h2c and the
util.AuthenticateUser middleware instead of a direct gRPC server.
This better reflects the production setup where requests go through
HTTP middleware for tenant authentication.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comment thread pkg/parca/debuginfo/metadata.go Outdated
Comment thread pkg/parca/debuginfo/store.go Outdated
Comment thread pkg/parca/debuginfo/store.go Outdated
Comment thread pkg/symbolizer/symbolizer.go Outdated
Comment thread pkg/debuginfo/parcaglue.go Outdated

t := noop.Tracer{}
l = log.With(l, "component", "debug-info")
bucket = objstore.NewPrefixedBucket(bucket, symbolizer.BucketPrefixParcaDebugInfo)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't have a cleanup mechanism to control unbounded growth (neither for symbolizer path). Could it be an issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep lidia's indefinitely. And preferably to remove the debug information files once they are converted to lidia. All of this is hard to implement in a stateless manner in distributors, so I propose to postpone these decision until we have a proper place/way to implement this https://github.com/grafana/pyroscope-squad/issues/687

Comment thread pkg/parca/cache/cache.go Outdated
@korniltsev-grafanista korniltsev-grafanista marked this pull request as draft February 16, 2026 07:25
@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

Just wanted to clarify, the PR was not ready for review, only the new protobufs and overall intention to use the new protobufs.

I also think we should have an idea how to move forward with infinite data growth (both lidia and debug info)

@korniltsev-grafanista korniltsev-grafanista changed the title parca debug-info upload debug-info upload Mar 13, 2026
@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

korniltsev-grafanista commented Mar 13, 2026

I also think we should have an idea how to move forward with infinite data growth (both lidia and debug info)

I talked with @\simonswine and we agreed to solve this later, and merge this for private use, not for public.

korniltsev-grafanista and others added 3 commits March 13, 2026 10:59
Remove parcaBucket parameter from test helper and all callers
after the symbolizer was refactored to use a single bucket for
both lidia and parca debug info storage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move proto package from debuginfo.v1 to debuginfo.v1alpha1 and update
all generated code and consumer imports accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
korniltsev-grafanista and others added 2 commits March 13, 2026 11:26
- Rename proto fields to snake_case (gnu_build_id, go_build_id, otel_file_id)
- Rename enum DebuginfoType → Type, add TYPE_UNSPECIFIED, fix prefix
- Fix bug: set STATE_UPLOADED after upload completes
- Replace gRPC status.Error with connect.NewError for consistency
- Rename fetchFromParca → fetchFromUploadedDebugInfo
- Revert ProcessELFData back to private method on Symbolizer

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9757e94dc9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/symbolizer/symbolizer.go
Comment thread pkg/debuginfo/store.go
#4896)

* add unit tests for debuginfo service

Cover the store logic (ValidateGnuBuildID, validateInit, NewStore,
shouldInitiateUpload, uploadIsStale, fetchMetadata, writeMetadata,
ObjectPath, MetadataObjectPath) and the chunked upload reader
(Read, Size, context cancellation) with ~43 test cases across
two new test files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix store_test.go to use base branch protobuf field names

The base branch (debuginfo-upload) uses GnuBuildId and
FileMetadata_TYPE_* naming in the generated protobuf, while
this worktree had a divergent proto with GNU and
DEBUGINFO_TYPE_* names. Align the test file with the base
branch so CI compiles correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix gofmt alignment in store_test.go

Align struct field formatting to pass gofmt check in CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* add end-to-end connect client tests for debuginfo Upload RPC

Add TestUploadE2E with three subtests that exercise the full Upload
bidi streaming RPC over a real HTTP/2 test server with connect client:

- full upload flow: init -> chunks -> verify stored data and metadata
- already uploaded: pre-populate metadata, verify service declines upload
- disabled service: verify service returns ReasonDisabled

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove unused net/http import from store_test.go

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix e2e test to use TLS for HTTP/2 bidi streaming

BidiStream requires HTTP/2. Switch from h2c (cleartext) to TLS via
StartTLS() which negotiates HTTP/2 through ALPN. Remove unused
h2c and http2 imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix store.go and e2e test: align with base branch protobuf, fix upload state bug

- Align store.go with base branch protobuf field names (GnuBuildId,
  TYPE_EXECUTABLE_FULL/NO_TEXT) and use connect.NewError instead of
  grpc status.Error
- Add missing md.State = STATE_UPLOADED after successful upload
- Fix e2e test to wait for server handler completion by calling
  Receive() after CloseRequest() before checking bucket contents

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@korniltsev-grafanista korniltsev-grafanista changed the title debug-info upload feat: add debug info upload service Mar 13, 2026
- Fix wrong error messages: "init.FileData" → "init.File" to match actual field name
- Enforce MaxUploadSize in UploadReader to prevent unlimited chunked uploads
- Tenant-isolate lidia cache paths to prevent cross-tenant cache poisoning

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

@codex review

@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

@cursor-bugbot review

@korniltsev-grafanista korniltsev-grafanista marked this pull request as ready for review March 13, 2026 03:42
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 4 issues found in the latest run.

  • ✅ Fixed: filepath.Join produces wrong paths on non-Unix OS
    • Changed lidiaObjectPath to use path.Join so object-storage keys always use forward slashes across OSes.
  • ✅ Fixed: Removed not-found cache causes extra bucket lookups
    • Reintroduced the debuginfod not-found cache short-circuit in getLidiaBytes to avoid unnecessary object-store lookups for known-missing build IDs.
  • ✅ Fixed: IDE run config commits debug-level logging settings
    • Removed the debug logging/output-file/large upload-size flags from the shared .idea v2 run configuration and restored non-debug defaults.
  • ✅ Fixed: Reader drops bytes when current chunk ends with data
    • Reworked UploadReader.Read to return already-read bytes before advancing chunks and added tests covering n>0 with io.EOF behavior.

Create PR

Or push these changes by commenting:

@cursor push 5b9bdf720c
Preview (5b9bdf720c)
diff --git a/.idea/runConfigurations/v2.xml b/.idea/runConfigurations/v2.xml
--- a/.idea/runConfigurations/v2.xml
+++ b/.idea/runConfigurations/v2.xml
@@ -1,10 +1,9 @@
 <component name="ProjectRunConfigurationManager">
   <configuration default="false" name="v2" type="GoApplicationRunConfiguration" factoryName="Go Application">
-    <output_file path="intellij-pyroscope.log" is_save="true" />
     <module name="pyroscope" />
     <working_directory value="$PROJECT_DIR$" />
     <go_parameters value="-tags embedassets" />
-    <parameters value="-log.level=debug -storage.backend=filesystem -write-path=segment-writer -enable-query-backend=true -symbolizer.enabled=true -debug-info.max-upload-size=1000000000" />
+    <parameters value="-storage.backend=filesystem -write-path=segment-writer -enable-query-backend=true" />
     <envs>
       <env name="PYROSCOPE_V2" value="true" />
     </envs>

diff --git a/pkg/debuginfo/reader/reader.go b/pkg/debuginfo/reader/reader.go
--- a/pkg/debuginfo/reader/reader.go
+++ b/pkg/debuginfo/reader/reader.go
@@ -24,39 +24,39 @@
 }
 
 func (r *UploadReader) Read(p []byte) (int, error) {
-	if r.cur == nil {
-		var err error
-		r.cur, err = r.next()
-		if err == io.EOF {
-			return 0, io.EOF
+	for {
+		if r.cur == nil {
+			var err error
+			r.cur, err = r.next()
+			if err == io.EOF {
+				return 0, io.EOF
+			}
+			if err != nil {
+				return 0, fmt.Errorf("get upload chunk (%d bytes read so far): %w", r.size, err)
+			}
 		}
-		if err != nil {
-			return 0, fmt.Errorf("get first upload chunk: %w", err)
+
+		i, err := r.cur.Read(p)
+		if i > 0 {
+			r.size += uint64(i)
+			if r.maxSize > 0 && r.size > uint64(r.maxSize) {
+				return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.size, r.maxSize)
+			}
+			if err == io.EOF {
+				r.cur = nil
+			}
+			return i, nil
 		}
-	}
-	i, err := r.cur.Read(p)
-	if err != nil && err != io.EOF {
-		return 0, fmt.Errorf("read upload chunk (%d bytes read so far): %w", r.size, err)
-	}
-	if err == io.EOF {
-		r.cur, err = r.next()
+
 		if err == io.EOF {
-			return 0, io.EOF
+			r.cur = nil
+			continue
 		}
 		if err != nil {
-			return 0, fmt.Errorf("get next upload chunk (%d bytes read so far): %w", r.size, err)
+			return 0, fmt.Errorf("read upload chunk (%d bytes read so far): %w", r.size, err)
 		}
-		i, err = r.cur.Read(p)
-		if err != nil {
-			return 0, fmt.Errorf("read next upload chunk (%d bytes read so far): %w", r.size, err)
-		}
+		return 0, nil
 	}
-
-	r.size += uint64(i)
-	if r.maxSize > 0 && r.size > uint64(r.maxSize) {
-		return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.size, r.maxSize)
-	}
-	return i, nil
 }
 
 func (r *UploadReader) next() (io.Reader, error) {

diff --git a/pkg/debuginfo/reader/reader_test.go b/pkg/debuginfo/reader/reader_test.go
--- a/pkg/debuginfo/reader/reader_test.go
+++ b/pkg/debuginfo/reader/reader_test.go
@@ -22,6 +22,20 @@
 	}
 }
 
+type eofAfterDataReader struct {
+	data []byte
+	read bool
+}
+
+func (r *eofAfterDataReader) Read(p []byte) (int, error) {
+	if r.read {
+		return 0, io.EOF
+	}
+	r.read = true
+	n := copy(p, r.data)
+	return n, io.EOF
+}
+
 func TestUploadReader_Read(t *testing.T) {
 	t.Parallel()
 
@@ -102,6 +116,28 @@
 		require.Error(t, err)
 		assert.Contains(t, err.Error(), "broken")
 	})
+
+	t.Run("reader returning data with EOF keeps bytes", func(t *testing.T) {
+		t.Parallel()
+
+		r := New(context.Background(), chunksFunc(nil), 0)
+		r.cur = &eofAfterDataReader{data: []byte("abc")}
+
+		data, err := io.ReadAll(r)
+		require.NoError(t, err)
+		assert.Equal(t, "abc", string(data))
+	})
+
+	t.Run("reader returning data with EOF does not overwrite buffer", func(t *testing.T) {
+		t.Parallel()
+
+		r := New(context.Background(), chunksFunc([][]byte{[]byte("def")}), 0)
+		r.cur = &eofAfterDataReader{data: []byte("abc")}
+
+		data, err := io.ReadAll(r)
+		require.NoError(t, err)
+		assert.Equal(t, "abcdef", string(data))
+	})
 }
 
 func TestUploadReader_Size(t *testing.T) {

diff --git a/pkg/symbolizer/symbolizer.go b/pkg/symbolizer/symbolizer.go
--- a/pkg/symbolizer/symbolizer.go
+++ b/pkg/symbolizer/symbolizer.go
@@ -9,6 +9,7 @@
 	"fmt"
 	"io"
 	"net/http"
+	"path"
 	"path/filepath"
 	"time"
 
@@ -417,6 +418,15 @@
 }
 
 func (s *Symbolizer) getLidiaBytes(ctx context.Context, buildID string) ([]byte, error) {
+	if client, ok := s.client.(*DebuginfodHTTPClient); ok {
+		if sanitizedBuildID, err := sanitizeBuildID(buildID); err == nil {
+			if found, _ := client.notFoundCache.Get(sanitizedBuildID); found {
+				s.metrics.cacheOperations.WithLabelValues("not_found", "get", statusSuccess).Inc()
+				return nil, buildIDNotFoundError{buildID: buildID}
+			}
+		}
+	}
+
 	tenantID, err := tenant.TenantID(ctx)
 	if err != nil {
 		return nil, err
@@ -461,7 +471,7 @@
 }
 
 func lidiaObjectPath(tenantID, buildID string) string {
-	return filepath.Join(bucketPrefix, tenantID, buildID)
+	return path.Join(bucketPrefix, tenantID, buildID)
 }
 
 // fetchLidiaFromDebuginfod fetches debug info from debuginfod and converts to Lidia format

Comment thread pkg/symbolizer/symbolizer.go
Comment thread pkg/symbolizer/symbolizer.go
Comment thread .idea/runConfigurations/v2.xml Outdated
Comment thread pkg/debuginfo/reader/reader.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b33f1a7572

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/debuginfo/store.go
Comment on lines +168 to +172
if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), r); err != nil {
return connect.NewError(connect.CodeInternal, fmt.Errorf("upload debuginfo: %w", err))
}
md.State = debuginfov1alpha1.ObjectMetadata_STATE_UPLOADED
md.FinishedAt = timestamppb.New(time.Now())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject empty upload streams before marking metadata uploaded

If a client sends init and then closes the stream without any chunk, debuginforeader.New(...).Read returns EOF immediately, bucket.Upload succeeds with a zero-byte object, and this code still marks metadata as STATE_UPLOADED; from that point shouldInitiateUpload returns ReasonDebuginfoAlreadyExists, so the tenant is effectively stuck with an unusable debug-info object until manual cleanup.

Useful? React with 👍 / 👎.

// fetchLidiaFromDebuginfod fetches debug info from debuginfod and converts to Lidia format
func (s *Symbolizer) fetchLidiaFromDebuginfod(ctx context.Context, buildID string) ([]byte, error) {
debugReader, err := s.fetchFromDebuginfod(ctx, buildID)
debugReader, err := s.fetch(ctx, buildID)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce symbol size limits for uploaded debug-info reads

Switching this call to s.fetch allows uploaded debug-info files to flow through fetchLidiaFromDebuginfod, but that path then does an unbounded io.ReadAll on the returned reader before limit checks; unlike the debuginfod HTTP client path, this bypasses SymbolizerMaxSymbolSizeBytes for uploaded files, so tenants with stricter symbol-size limits can still trigger large in-memory reads from object storage.

Useful? React with 👍 / 👎.

- Use path.Join instead of filepath.Join in lidiaObjectPath for
  OS-independent object storage keys
- Restore tenant-scoped not-found cache in getLidiaBytes to reduce
  redundant object storage lookups for unknown build IDs
- Extract MaxSizeReader from UploadReader into a composable wrapper
- Handle simultaneous data+EOF in UploadReader.Read per io.Reader contract
- Revert debug-specific IDE run configuration changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@korniltsev-grafanista
Copy link
Copy Markdown
Contributor Author

@cursor-bugbot review

Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Silently swallowed errors hide uploaded debug info failures
    • Added a debug log in fetch when uploaded debug info retrieval fails before falling back to debuginfod so fallback-causing errors are visible.
  • ✅ Fixed: Not-found cache blocks newly uploaded debug info lookups
    • Removed the tenant-scoped not-found cache short-circuit and write path from getLidiaBytes so newly uploaded debug info is not blocked by stale not-found entries.
  • ✅ Fixed: MaxSizeReader drops read bytes on limit exceeded
    • Updated MaxSizeReader.Read to return the actual bytes read with the size-limit error and added a test to verify partial-read behavior.

Create PR

Or push these changes by commenting:

@cursor push 2ec2c9ddbb
Preview (2ec2c9ddbb)
diff --git a/pkg/debuginfo/reader/reader.go b/pkg/debuginfo/reader/reader.go
--- a/pkg/debuginfo/reader/reader.go
+++ b/pkg/debuginfo/reader/reader.go
@@ -94,7 +94,7 @@
 	n, err := r.r.Read(p)
 	r.n += int64(n)
 	if r.maxSize > 0 && r.n > r.maxSize {
-		return 0, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.n, r.maxSize)
+		return n, fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.n, r.maxSize)
 	}
 	return n, err
 }

diff --git a/pkg/debuginfo/reader/reader_test.go b/pkg/debuginfo/reader/reader_test.go
--- a/pkg/debuginfo/reader/reader_test.go
+++ b/pkg/debuginfo/reader/reader_test.go
@@ -170,6 +170,21 @@
 		assert.Contains(t, err.Error(), "exceeds maximum allowed size")
 	})
 
+	t.Run("returns already-read bytes when limit exceeded", func(t *testing.T) {
+		t.Parallel()
+		r := New(context.Background(), chunksFunc([][]byte{
+			[]byte("abcdef"),
+		}))
+		lr := NewMaxSizeReader(r, 5)
+
+		buf := make([]byte, 16)
+		n, err := lr.Read(buf)
+		require.Error(t, err)
+		assert.Equal(t, 6, n)
+		assert.Equal(t, "abcdef", string(buf[:n]))
+		assert.Contains(t, err.Error(), "exceeds maximum allowed size")
+	})
+
 	t.Run("zero means unlimited", func(t *testing.T) {
 		t.Parallel()
 		r := New(context.Background(), chunksFunc([][]byte{

diff --git a/pkg/symbolizer/symbolizer.go b/pkg/symbolizer/symbolizer.go
--- a/pkg/symbolizer/symbolizer.go
+++ b/pkg/symbolizer/symbolizer.go
@@ -423,15 +423,6 @@
 		return nil, err
 	}
 
-	// Check not-found cache with tenant-scoped key to avoid repeated object storage lookups
-	// for build IDs that are known to be unavailable for this tenant.
-	notFoundCacheKey := tenantID + "/" + buildID
-	if httpClient, ok := s.client.(*DebuginfodHTTPClient); ok {
-		if found, _ := httpClient.notFoundCache.Get(notFoundCacheKey); found {
-			return nil, buildIDNotFoundError{buildID: buildID}
-		}
-	}
-
 	lidiaBytes, err := s.fetchLidiaFromObjectStore(ctx, tenantID, buildID)
 	if err == nil {
 		s.metrics.cacheOperations.WithLabelValues("object_storage", "get", statusSuccess).Inc()
@@ -441,13 +432,6 @@
 
 	lidiaBytes, err = s.fetchLidiaFromDebuginfod(ctx, buildID)
 	if err != nil {
-		// Cache not-found errors with tenant-scoped key to avoid repeated lookups.
-		var bnfErr buildIDNotFoundError
-		if errors.As(err, &bnfErr) {
-			if httpClient, ok := s.client.(*DebuginfodHTTPClient); ok {
-				httpClient.notFoundCache.SetWithTTL(notFoundCacheKey, true, 1, httpClient.cfg.NotFoundCacheTTL)
-			}
-		}
 		return nil, err
 	}
 
@@ -514,6 +498,8 @@
 func (s *Symbolizer) fetch(ctx context.Context, buildID string) (io.ReadCloser, error) {
 	if r, err := s.fetchFromUploadedDebugInfo(ctx, buildID); err == nil {
 		return r, nil
+	} else {
+		level.Debug(s.logger).Log("msg", "failed to fetch uploaded debug info, falling back to debuginfod", "buildID", buildID, "err", err)
 	}
 	return s.fetchFromDebuginfod(ctx, buildID)
 }

return r, nil
}
return s.fetchFromDebuginfod(ctx, buildID)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silently swallowed errors hide uploaded debug info failures

Medium Severity

The fetch method silently discards all errors from fetchFromUploadedDebugInfo, including non-transient errors like bucket connectivity issues or permission failures. When uploaded debug info exists but can't be read due to an infrastructure problem, the symbolizer silently falls through to debuginfod with no logging, making it impossible to diagnose why uploaded debug info is never used.

Fix in Cursor Fix in Web

Comment thread pkg/symbolizer/symbolizer.go Outdated
Comment thread pkg/debuginfo/reader/reader.go Outdated
- Remove MaxSizeReader wrapper; check r.Size() after upload and delete
  oversized objects in store.go
- Remove duplicate not-found cache in symbolizer (debuginfod client
  already caches not-found build IDs)
- Bound io.ReadAll in fetchLidiaFromDebuginfod with readAllWithLimit
  to enforce SymbolizerMaxSymbolSizeBytes for uploaded debug-info files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).

Fix All in Cursor

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Upload size check happens after full upload completes
    • The upload now streams through an io.LimitReader capped at MaxUploadSize+1, preventing unbounded writes to object storage before size validation.
  • ✅ Fixed: Metadata left as UPLOADING after size-exceeded rejection
    • On size-exceeded rejection the code now deletes both the uploaded object and its metadata so retries are not blocked by stale STATE_UPLOADING metadata.

Create PR

Or push these changes by commenting:

@cursor push faf97c8dda
Preview (faf97c8dda)
diff --git a/pkg/debuginfo/store.go b/pkg/debuginfo/store.go
--- a/pkg/debuginfo/store.go
+++ b/pkg/debuginfo/store.go
@@ -165,11 +165,16 @@
 	}
 
 	r := debuginforeader.New(ctx, readChunksFromStream(stream))
-	if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), r); err != nil {
+	uploadReader := io.Reader(r)
+	if s.cfg.MaxUploadSize > 0 {
+		uploadReader = io.LimitReader(r, s.cfg.MaxUploadSize+1)
+	}
+	if err := s.bucket.Upload(ctx, ObjectPath(tenantID, id), uploadReader); err != nil {
 		return connect.NewError(connect.CodeInternal, fmt.Errorf("upload debuginfo: %w", err))
 	}
 	if s.cfg.MaxUploadSize > 0 && r.Size() > uint64(s.cfg.MaxUploadSize) {
 		_ = s.bucket.Delete(ctx, ObjectPath(tenantID, id))
+		_ = s.bucket.Delete(ctx, MetadataObjectPath(tenantID, id))
 		return connect.NewError(connect.CodeInvalidArgument,
 			fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize))
 	}

diff --git a/pkg/debuginfo/store_test.go b/pkg/debuginfo/store_test.go
--- a/pkg/debuginfo/store_test.go
+++ b/pkg/debuginfo/store_test.go
@@ -657,4 +657,83 @@
 		require.NoError(t, stream.CloseRequest())
 		require.NoError(t, stream.CloseResponse())
 	})
+
+	t.Run("oversized upload is rejected and upload state is cleaned up", func(t *testing.T) {
+		t.Parallel()
+		store, bucket := newTestStore(t, Config{
+			Enabled:           true,
+			MaxUploadDuration: time.Minute,
+			MaxUploadSize:     8,
+		})
+		client := startTestServer(t, store)
+
+		ctx := tenant.InjectTenantID(context.Background(), "test-tenant")
+		stream := client.Upload(ctx)
+
+		err := stream.Send(&debuginfov1alpha1.UploadRequest{
+			Data: &debuginfov1alpha1.UploadRequest_Init{
+				Init: &debuginfov1alpha1.ShouldInitiateUploadRequest{
+					File: &debuginfov1alpha1.FileMetadata{
+						GnuBuildId: "aabbccdd",
+						Name:       "my-binary",
+						Type:       debuginfov1alpha1.FileMetadata_TYPE_EXECUTABLE_FULL,
+					},
+				},
+			},
+		})
+		require.NoError(t, err)
+
+		resp, err := stream.Receive()
+		require.NoError(t, err)
+		initResp := resp.GetInit()
+		require.NotNil(t, initResp)
+		assert.True(t, initResp.ShouldInitiateUpload)
+		assert.Equal(t, ReasonFirstTimeSeen, initResp.Reason)
+
+		for _, chunk := range [][]byte{[]byte("12345"), []byte("67890")} {
+			err = stream.Send(&debuginfov1alpha1.UploadRequest{
+				Data: &debuginfov1alpha1.UploadRequest_Chunk{
+					Chunk: &debuginfov1alpha1.UploadChunk{Chunk: chunk},
+				},
+			})
+			require.NoError(t, err)
+		}
+
+		require.NoError(t, stream.CloseRequest())
+		_, err = stream.Receive()
+		require.Error(t, err)
+		assert.Equal(t, connect.CodeInvalidArgument, connect.CodeOf(err))
+		assert.Contains(t, err.Error(), "exceeds maximum allowed size")
+		require.NoError(t, stream.CloseResponse())
+
+		id := mustValidateGnuBuildID(t, "aabbccdd")
+		objects := bucket.Objects()
+		_, objectExists := objects[ObjectPath("test-tenant", id)]
+		_, metadataExists := objects[MetadataObjectPath("test-tenant", id)]
+		assert.False(t, objectExists)
+		assert.False(t, metadataExists)
+
+		retryStream := client.Upload(ctx)
+		err = retryStream.Send(&debuginfov1alpha1.UploadRequest{
+			Data: &debuginfov1alpha1.UploadRequest_Init{
+				Init: &debuginfov1alpha1.ShouldInitiateUploadRequest{
+					File: &debuginfov1alpha1.FileMetadata{
+						GnuBuildId: "aabbccdd",
+						Name:       "my-binary",
+						Type:       debuginfov1alpha1.FileMetadata_TYPE_EXECUTABLE_FULL,
+					},
+				},
+			},
+		})
+		require.NoError(t, err)
+
+		retryResp, err := retryStream.Receive()
+		require.NoError(t, err)
+		retryInitResp := retryResp.GetInit()
+		require.NotNil(t, retryInitResp)
+		assert.True(t, retryInitResp.ShouldInitiateUpload)
+		assert.Equal(t, ReasonFirstTimeSeen, retryInitResp.Reason)
+		require.NoError(t, retryStream.CloseRequest())
+		require.NoError(t, retryStream.CloseResponse())
+	})
 }

Comment thread pkg/debuginfo/store.go
_ = s.bucket.Delete(ctx, ObjectPath(tenantID, id))
return connect.NewError(connect.CodeInvalidArgument,
fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upload size check happens after full upload completes

Medium Severity

The MaxUploadSize validation occurs after s.bucket.Upload has already written the entire payload to object storage. An authenticated client can stream unlimited data (in 5MB chunks) to the bucket; only after the upload finishes is the size checked and the object deleted. This enables resource exhaustion of the storage backend since there is no streaming size limit enforced during the upload itself.

Fix in Cursor Fix in Web

Comment thread pkg/debuginfo/store.go
_ = s.bucket.Delete(ctx, ObjectPath(tenantID, id))
return connect.NewError(connect.CodeInvalidArgument,
fmt.Errorf("upload size %d exceeds maximum allowed size of %d bytes", r.Size(), s.cfg.MaxUploadSize))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata left as UPLOADING after size-exceeded rejection

Medium Severity

When an upload is rejected for exceeding MaxUploadSize, the object file is deleted via s.bucket.Delete but the metadata (written at line 163 as STATE_UPLOADING) is never cleaned up or updated. Subsequent upload attempts for the same build ID will see the orphaned STATE_UPLOADING metadata and be rejected with "upload in progress" until the staleness timer (MaxUploadDuration + 2 minutes) expires, effectively creating a temporary denial-of-service window for that build ID.

Additional Locations (1)
Fix in Cursor Fix in Web

@korniltsev-grafanista korniltsev-grafanista merged commit e9a5181 into main Mar 13, 2026
25 checks passed
@korniltsev-grafanista korniltsev-grafanista deleted the debuginfo-upload branch March 13, 2026 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants