[chore] repo: Sync fork with current upstream main #26

Merged
amir-jakoby merged 59 commits into main from amiri/saw-6831-v0.148.0-replay
Mar 31, 2026

Conversation

@amir-jakoby amir-jakoby commented Mar 30, 2026

repo: Sync fork with current upstream main

Merged the latest upstream main into the fork and kept the hotreloadprocessor lint surface green.

Description

  • synced the fork with current upstream main
  • resolved hotreloadprocessor lint regressions in the sync stack
  • reran the module tests and lint gate locally

Note

Medium Risk
Touches exporter resilience/queue configuration and introduces a new storage extension that is auto-used when compress_in_memory is enabled, which can affect runtime behavior and buffering. Hot reload and CI changes are mostly safety/robustness improvements with limited blast radius.

Overview
Restores loadbalancingexporter support for sending_queue.compress_in_memory by introducing an internal in-process storage backend and updating queue parsing/validation to treat sending_queue as opt-in (including explicit sending_queue.enabled) while auto-injecting an in-memory StorageID when compression-in-memory is requested.
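
The queue handling described above can be sketched roughly as follows. This is a minimal illustration, not the exporter's actual code: `QueueConfig`, `normalize`, and the `inmemorystorage` ID constant are hypothetical names, and it assumes an explicitly configured storage ID is left untouched.

```go
package main

import (
	"errors"
	"fmt"
)

// QueueConfig is a hypothetical, simplified stand-in for the exporter's
// sending_queue settings; the real struct in the PR may differ.
type QueueConfig struct {
	Enabled            bool   // sending_queue.enabled (the queue is opt-in)
	CompressInMemory   bool   // sending_queue.compress_in_memory
	PayloadCompression string // sending_queue.payload_compression, e.g. "zstd" or "none"
	StorageID          string // storage extension ID; empty means none configured
}

// inMemoryStorageID is an assumed component ID for the in-process backend.
const inMemoryStorageID = "inmemorystorage"

// normalize validates the settings and injects the in-memory storage ID
// when in-memory compression is requested and no storage was configured.
func normalize(cfg QueueConfig) (QueueConfig, error) {
	if !cfg.Enabled || !cfg.CompressInMemory {
		return cfg, nil // nothing to validate or inject
	}
	if cfg.PayloadCompression == "" || cfg.PayloadCompression == "none" {
		return cfg, errors.New("compress_in_memory requires a non-none payload_compression")
	}
	if cfg.StorageID == "" {
		cfg.StorageID = inMemoryStorageID
	}
	return cfg, nil
}

func main() {
	cfg, err := normalize(QueueConfig{Enabled: true, CompressInMemory: true, PayloadCompression: "zstd"})
	fmt.Println(cfg.StorageID, err)
}
```

Keeping validation and injection in one step means a caller never sees a config that requests in-memory compression without a storage backend to hold the compressed payloads.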

Adds a new alpha extension/storage/inmemorystorage module (ephemeral shared storage) and registers it across repo metadata/config generators (builder config, distributions, versions, codecov components, issue templates, tidylist). Updates exporter docs/tests to reflect the new compatibility behavior and tightens goleak ignores for newer Windows syscalls.

Hardens hotreloadprocessor by rejecting multiple pipelines per signal, simplifying file-watcher construction, improving S3 iteration to continue past invalid configs, and ensuring decryptor resources are closed when supported; tests are updated to use t.Context().
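
One common way to close a resource only "when supported" is a type assertion against `io.Closer`. The sketch below assumes that pattern; the `Decryptor` interface and both implementations are hypothetical and are not taken from the processor's code.

```go
package main

import (
	"fmt"
	"io"
)

// Decryptor is a hypothetical interface standing in for the processor's decryptor.
type Decryptor interface {
	Decrypt(ciphertext []byte) ([]byte, error)
}

// closeIfSupported closes d only when its concrete type also implements io.Closer.
func closeIfSupported(d Decryptor) error {
	if c, ok := d.(io.Closer); ok {
		return c.Close()
	}
	return nil
}

// plainDecryptor holds no resources and has no Close method.
type plainDecryptor struct{}

func (plainDecryptor) Decrypt(b []byte) ([]byte, error) { return b, nil }

// closableDecryptor records whether Close was called.
type closableDecryptor struct{ closed bool }

func (d *closableDecryptor) Decrypt(b []byte) ([]byte, error) { return b, nil }
func (d *closableDecryptor) Close() error                     { d.closed = true; return nil }

func main() {
	d := &closableDecryptor{}
	errPlain := closeIfSupported(plainDecryptor{})
	errClosable := closeIfSupported(d)
	fmt.Println(errPlain, errClosable, d.closed)
}
```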

Includes small CI/automation fixes: better codegen drift diagnostics in build-and-test.yml, fixes /workflow-approve command detection, stabilizes disk-space baseline reporting, and adjusts codeowner activity report generation to count owners via label mappings.

Written by Cursor Bugbot for commit 79c21e2. This will update automatically on new commits. Configure here.


Summary by cubic

Syncs our opentelemetry-collector-contrib fork to upstream v0.148.0, preserves Sawmills patches, restores sending_queue.compress_in_memory via a new inmemorystorage extension (auto‑used when enabled), and hardens hot‑reload and CI. Adds a small test flake guard for Windows DNS resolver leaks.

  • Dependencies

    • exporter/loadbalancing: queue disabled by default; parse sending_queue.enabled; require non‑none payload_compression when compression is set; auto‑inject in‑memory StorageID when compress_in_memory: true; tests/docs updated (ignore Windows DNS resolver goroutines in goleak); module deps tidied (add direct go.opentelemetry.io/collector/extension and local replace for extension/storage/inmemorystorage).
    • Added extension/storage/inmemorystorage (alpha, process‑local); implementation unexported; used automatically by the exporter; go.mod added.
    • processor/hotreloadprocessor: reject multiple pipelines per signal; close S3 decryptor when closable; continue past invalid S3 objects; simpler file watcher; tests use t.Context().
  • CI and Repo Automation

    • Fixed /workflow-approve command guard and set disk‑space baseline from the initial reading.
    • Stabilized code‑owner activity report by counting owners per component from label mappings.
    • Print codegen drift diagnostics (grouped git status/git diff) when generated code is stale.
    • Issue templates: added extension/storage/inmemorystorage to component dropdowns; changelog config: registered extension/storage/inmemorystorage in .chloggen components.

Linear: SAW-6831 upgrades the fork to v0.148.0 and preserves Sawmills patches; it unblocks downstream builder/config work in SAW-6833.

Written for commit 79c21e2. Summary will update on new commits.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added in-memory storage extension for ephemeral, process-memory-backed storage within the collector.
    • Load balancing exporter now supports in-memory queue compression with enhanced queue configuration handling.
  • Improvements

    • Hot reload processor now validates and enforces single pipeline per signal, preventing configuration conflicts.
  • Documentation

    • Updated documentation for load balancing exporter queue compression configuration and compatibility details.
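
The single-pipeline-per-signal check mentioned in the notes above could look something like this. It is a sketch under assumptions: the `Pipeline` type and `validateSinglePipelinePerSignal` are hypothetical names, and the real processor may key pipelines differently.

```go
package main

import "fmt"

// Pipeline is a hypothetical, simplified view of a configured pipeline.
type Pipeline struct {
	Name   string
	Signal string // "logs", "metrics", or "traces"
}

// validateSinglePipelinePerSignal returns an error as soon as a second
// pipeline is found for a signal that already has one.
func validateSinglePipelinePerSignal(pipelines []Pipeline) error {
	seen := make(map[string]string, len(pipelines))
	for _, p := range pipelines {
		if prev, ok := seen[p.Signal]; ok {
			return fmt.Errorf("multiple pipelines for signal %q: %q and %q", p.Signal, prev, p.Name)
		}
		seen[p.Signal] = p.Name
	}
	return nil
}

func main() {
	err := validateSinglePipelinePerSignal([]Pipeline{
		{Name: "logs/a", Signal: "logs"},
		{Name: "logs/b", Signal: "logs"},
	})
	fmt.Println(err)
}
```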

Assisted-by: ChatGPT 5 Codex

@sawmills-staging sawmills-staging Bot left a comment

Telemetry Review

Advisory: 2 medium findings in changed files.

2 findings appear pre-existing in code touched by this diff.

Key findings:

  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:204 — hp.Logger.Info("Failed to apply config, trying next one") in iterateDayLevel fires when the newest S3 config object fails to apply and the processor falls back to an older one. This is a degraded-state signal: the intended config was rejected and the processor is silently rolling back. Info severity hides this from Warn-level log filters and error-rate dashboards.
  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:250 — hp.Logger.Info("Failed to iterate day level") in iterateAllDays fires when an entire day-prefix fails to iterate. The processor skips the whole day and tries the next one. This is a broader degraded-state signal than the per-object fallback: a full day's worth of configs is being bypassed. Info severity makes this invisible to Warn-level filters and alert rules.

These findings are advisory; address them before re-requesting review if applicable.
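
As an illustration of the severity change these findings point toward, the fallback loop could log rejected candidates at Warn rather than Info. The sketch uses the standard library's `log/slog` as a stand-in for the processor's logger; `applyFirstValid`, `newBufferLogger`, and `errInvalidConfig` are hypothetical names, not code from s3.go.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
)

// errInvalidConfig is a hypothetical sentinel used only for this example.
var errInvalidConfig = errors.New("invalid config")

// newBufferLogger returns a text logger that writes into an in-memory buffer.
func newBufferLogger() (*slog.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return slog.New(slog.NewTextHandler(buf, nil)), buf
}

// applyFirstValid tries keys in order (newest first) and returns the first
// key that applies cleanly. Each rejected candidate is logged at Warn so the
// fallback is visible to Warn-level filters.
func applyFirstValid(logger *slog.Logger, keys []string, apply func(key string) error) (string, error) {
	for _, key := range keys {
		if err := apply(key); err != nil {
			logger.Warn("Failed to apply config, trying next one", "key", key, "error", err)
			continue
		}
		return key, nil
	}
	return "", errors.New("no valid config found")
}

func main() {
	logger, buf := newBufferLogger()
	apply := func(key string) error {
		if key == "newest.yaml" {
			return errInvalidConfig
		}
		return nil
	}
	key, err := applyFirstValid(logger, []string{"newest.yaml", "older.yaml"}, apply)
	fmt.Println(key, err)
	fmt.Print(buf.String())
}
```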

@sawmills-architect-review sawmills-architect-review Bot left a comment

APPROVE — threads resolved, code is ready.

Three threads from coderabbitai/cursor on the inmemorystorage extension:

  • coderabbitai (test coverage gaps for Get/Set/Delete/Batch): Pre-existing gaps in the upstream extension, not regressions from this PR. Out of scope.
  • cursor (stability mismatch — metadata.yaml says alpha, factory.go uses StabilityLevelDevelopment): Valid — should be StabilityLevelAlpha. Log as a follow-up; doesn't block the SAW-6831 upgrade goal.
  • cursor (shared sync.Map in GetClient ignores kind/ID/name): Intentional for this in-memory test extension. All clients sharing one map is by design — the purpose is lightweight cross-component coordination in tests, not isolation.

All three resolved. Ready to merge.
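
For readers unfamiliar with the shared-map design discussed in the third thread, a rough sketch follows. The `store` and `client` types are hypothetical simplifications; the real extension implements the collector's storage interfaces, which take a context and component identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical process-local backend holding one shared map.
type store struct {
	data sync.Map
}

// client is a handle onto the store's shared map.
type client struct {
	data *sync.Map
}

// GetClient deliberately ignores kind and name: every caller gets a handle
// onto the same map, so there is no isolation between components.
func (s *store) GetClient(kind, name string) *client {
	return &client{data: &s.data}
}

// Set stores value under key in the shared map.
func (c *client) Set(key string, value []byte) { c.data.Store(key, value) }

// Get returns the value stored under key, if any.
func (c *client) Get(key string) ([]byte, bool) {
	v, ok := c.data.Load(key)
	if !ok {
		return nil, false
	}
	return v.([]byte), true
}

func main() {
	s := &store{}
	a := s.GetClient("exporter", "one")
	b := s.GetClient("processor", "two")
	a.Set("k", []byte("v"))
	got, ok := b.Get("k")
	fmt.Println(string(got), ok)
}
```

The trade-off is the one the review calls out: key collisions between components are possible, which is acceptable only because sharing is the stated purpose.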

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: This PR involves a major upstream sync, the addition of a new internal storage extension, and significant logic changes to the loadbalancingexporter's queuing and compression mechanisms.

@sawmills-architect-review sawmills-architect-review Bot left a comment

[ARCH-REVIEW] Re-review: ✅ APPROVE

Prior blocking issue resolved: client.go:Batch() now returns an error instead of panicking; TestClientBatchAfterCloseReturnsError confirms it. Prior concern resolved: test is end-to-end again (real testStorageHost + inmemorystorage extension, queue+codec path exercised with sink assertion). No unresolved threads.
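
The panic-to-error change can be pictured with a small sketch. The `client`, `op`, and `newClient` names here are hypothetical, and the real `Batch` takes a context and the collector's operation type; only the error message is taken from this thread.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClientClosed mirrors the message quoted in the review.
var errClientClosed = errors.New("client already closed")

// op is a hypothetical, simplified batch operation.
type op struct {
	Key    string
	Value  []byte
	Delete bool
}

// client is a hypothetical in-memory client guarded by a mutex.
type client struct {
	mu     sync.Mutex
	closed bool
	data   map[string][]byte
}

func newClient() *client { return &client{data: make(map[string][]byte)} }

// Batch applies ops in order. After Close it returns an error instead of
// panicking, so callers can handle shutdown races gracefully.
func (c *client) Batch(ops ...op) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return errClientClosed
	}
	for _, o := range ops {
		if o.Delete {
			delete(c.data, o.Key)
			continue
		}
		c.data[o.Key] = o.Value
	}
	return nil
}

// Close marks the client closed; later Batch calls fail.
func (c *client) Close() {
	c.mu.Lock()
	c.closed = true
	c.mu.Unlock()
}

func main() {
	c := newClient()
	fmt.Println(c.Batch(op{Key: "k", Value: []byte("v")}))
	c.Close()
	fmt.Println(c.Batch(op{Key: "k", Delete: true}))
}
```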

Assisted-by: ChatGPT 5.2

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Requires human review: Syncing with upstream involves significant logic changes in loadbalancingexporter and hotreloadprocessor, plus a new storage extension. This requires human validation.

@sawmills-architect-review sawmills-architect-review Bot left a comment

[ARCH-REVIEW] Re-review: APPROVE

Prior blocking issue resolved: client.go:41 panic("client already closed") → return errors.New("client already closed"). Test TestClientBatchAfterCloseReturnsError validates the fix. All 7 threads resolved. Zero unresolved, zero new findings.

@sawmills-staging sawmills-staging Bot left a comment

Telemetry Review

Advisory: 1 high finding and 3 medium findings in changed files.

4 findings appear pre-existing in code touched by this diff.

Key findings:

  • [HIGH] [EXISTING] @ processor/hotreloadprocessor/processor.go:104 — In processor.go, applyConfigWithTelemetry emits three timestamp metrics (triggerNewestFileSuccessTimestamp, triggerNewestFileFailedTimestamp, triggerRollbackFileSuccessTimestamp) with per-recording labels key (an S3 object path containing a Unix timestamp, e.g. test-org-id/prefix-1/3000000000000/1234567890.yaml) and config_hash (a SHA-256 hex digest of the config YAML). Both values are effectively unbounded: key grows monotonically with each new config file written to S3, and config_hash changes with every config content change. Additionally, triggerReloadDuration also receives key as a label. triggerNewestFileFailedTimestamp further adds reason = err.Error(), which is a free-form error string. These labels are appended on top of the four fixed processorAttr labels, making the total label set: processor × configuration_file × destination × collector_version × key × config_hash × result × reload — a combinatorial explosion for any metric backend that indexes all label combinations.
  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:192 — iterateDayLevel logs at Info level for every S3 object whose applyConfig call fails, inside a per-object loop. On each refresh cycle, if K objects in the day-level prefix all fail validation, K Info lines are emitted. Because seenConfigs only suppresses re-fetching (not re-logging) for objects added in future cycles, and because the refresh ticker fires repeatedly, a sustained misconfiguration produces a continuous stream of per-object Info logs. Info is also the wrong severity for a degraded path where a config candidate was rejected — operators cannot distinguish this from a healthy informational event.
  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:237 — iterateAllDays logs at Info level for every day-prefix whose iterateDayLevel call returns an error, inside a per-day-prefix loop. The 'no valid config found' terminal error from iterateDayLevel is a normal outcome when a day-bucket has been exhausted, not an operator-actionable event. Combined with the inner loop in iterateDayLevel, a worst-case refresh cycle with D day-prefixes each containing N objects produces up to D×N Info log lines per tick of the refresh interval. The outer loop's Info log adds no triage value beyond what the inner loop already emits.
  • 1 additional finding in changed files.

These findings are advisory; address them before re-requesting review if applicable.
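
One possible mitigation for the unbounded `reason` label in the high finding is to map errors onto a small fixed set of values before recording them. The sketch below is only an illustration: the sentinel errors and `reasonLabel` are hypothetical, and it does not address the `key` and `config_hash` labels, which would need to be dropped or moved to logs.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors; the processor's real error types may differ.
var (
	errInvalidConfig = errors.New("invalid config")
	errFetchFailed   = errors.New("fetch failed")
)

// wrapWithKey adds the object key to an error while keeping it matchable.
func wrapWithKey(key string, err error) error {
	return fmt.Errorf("apply %s: %w", key, err)
}

// reasonLabel maps an arbitrary error onto a bounded set of label values,
// so the metric's cardinality no longer depends on the error text.
func reasonLabel(err error) string {
	switch {
	case err == nil:
		return "none"
	case errors.Is(err, errInvalidConfig):
		return "invalid_config"
	case errors.Is(err, errFetchFailed):
		return "fetch_failed"
	default:
		return "other"
	}
}

func main() {
	err := wrapWithKey("test-org-id/prefix-1/3000000000000/1234567890.yaml", errInvalidConfig)
	fmt.Println(reasonLabel(err)) // prints "invalid_config"
}
```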

Assisted-by: ChatGPT 5.4

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Large sync with upstream that introduces a new storage extension and modifies core logic in the loadbalancingexporter and hotreloadprocessor.

@sawmills-architect-review sawmills-architect-review Bot left a comment

[ARCH-REVIEW] Re-review: ✅ APPROVE

Prior blocking issue resolved: client.go:41 now returns errors.New("client already closed") instead of panicking. TestClientBatchAfterCloseReturnsError added to cover it. TestConsumeLogsWithQueueCompressionAndInMemoryStorage restored as a proper integration test using the new extension. 0 unresolved threads.

@sawmills-staging sawmills-staging Bot left a comment

Telemetry Review

Advisory: 4 medium findings in changed files.

4 findings appear pre-existing in code touched by this diff.

Key findings:

  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:204 — iterateDayLevel emits hp.Logger.Info("Failed to apply config, trying next one") for every S3 object whose applyConfig call fails, inside a loop that is re-executed on every refresh-interval tick. If a set of configs persistently fails (e.g. schema mismatch, validation error), each new unseen key will produce an Info log on every scan cycle. The seenConfigs map prevents re-fetching already-fetched keys but does not prevent re-logging for newly discovered failing keys. At Info level this floods the log stream with repetitive, low-triage-value entries.
  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:204 — In iterateDayLevel, a failed applyConfig call is logged at Info ('Failed to apply config, trying next one'). applyConfig failure is a service-owned event: the processor attempted to activate a config object from S3 and the activation failed. Logging this at Info means it is indistinguishable from normal scan progress in log-level filters and will not trigger Warn-level alerts. Operators cannot tell from dashboards or alert rules whether any config application failures occurred during a scan cycle.
  • [MEDIUM] [EXISTING] @ processor/hotreloadprocessor/s3.go:250 — In iterateAllDays, a failed iterateDayLevel call is logged at Info ('Failed to iterate day level'). This represents an entire day-bucket of configs being unusable — a broader failure than a single config object. Logging at Info hides this from Warn-level alert rules and log filters. Operators scanning for degraded config-reload behavior will miss it.
  • 1 additional finding in changed files.

These findings are advisory; address them before re-requesting review if applicable.
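
To make the log-volume concern concrete, here is one way a scan cycle could emit a single Warn summary instead of one line per rejected object. It is a sketch using the standard library's `log/slog` as a stand-in logger; `scanCycle`, `newBufferLogger`, and `errInvalidConfig` are hypothetical names, and per-object detail is demoted to Debug by assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
)

// errInvalidConfig is a hypothetical sentinel used only for this example.
var errInvalidConfig = errors.New("invalid config")

// newBufferLogger returns a text logger (default level Info) writing to a buffer.
func newBufferLogger() (*slog.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return slog.New(slog.NewTextHandler(buf, nil)), buf
}

// scanCycle tries keys in order, stops at the first that applies, and logs
// one Warn summary per cycle if any candidates were rejected.
func scanCycle(logger *slog.Logger, keys []string, apply func(string) error) (applied string, failed int) {
	for _, key := range keys {
		if err := apply(key); err != nil {
			failed++
			// Per-object detail stays available at Debug level.
			logger.Debug("config rejected", "key", key, "error", err)
			continue
		}
		applied = key
		break
	}
	if failed > 0 {
		logger.Warn("config candidates rejected during scan", "failed", failed, "applied", applied)
	}
	return applied, failed
}

func main() {
	logger, buf := newBufferLogger()
	apply := func(key string) error {
		if key == "c.yaml" {
			return nil
		}
		return errInvalidConfig
	}
	applied, failed := scanCycle(logger, []string{"a.yaml", "b.yaml", "c.yaml"}, apply)
	fmt.Println(applied, failed)
	fmt.Print(buf.String())
}
```

With this shape, a cycle over D day-prefixes of N objects each produces at most one Warn line per call rather than up to D×N Info lines.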

Assisted-by: ChatGPT 5.4

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: Changes default behavior in loadbalancingexporter, introduces a new storage extension, and modifies core logic in hotreloadprocessor; exceeds safe auto-approval scope.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: This PR introduces a new storage extension and modifies critical queueing/compression logic in the loadbalancing exporter, posing a medium risk to telemetry delivery.

Assisted-by: ChatGPT 5.4

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 5 files (changes from recent commits).

Requires human review: Large upstream sync with significant logic changes in loadbalancingexporter and a new storage extension. High impact on telemetry delivery and queueing mechanisms.

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 2 files (changes from recent commits).

Requires human review: Large upstream sync involving semantic changes to loadbalancingexporter configuration and hotreloadprocessor logic, which can affect production runtime behavior.

Assisted-by: ChatGPT 5.4

@cubic-dev-ai cubic-dev-ai Bot left a comment

0 issues found across 1 file (changes from recent commits).

Requires human review: This PR introduces a new storage extension, modifies core business logic in the loadbalancing exporter and hotreload processor, and updates CI workflows. These require human review.

@amir-jakoby amir-jakoby merged commit 8e1c59a into main Mar 31, 2026
201 checks passed
@amir-jakoby amir-jakoby deleted the amiri/saw-6831-v0.148.0-replay branch March 31, 2026 22:30

1 participant