feat(chproxy): refactor OTEL telemetry #2959

Merged
imeyer merged 1 commit into main from refactor-otel-telemetry on Mar 13, 2025

Conversation

@imeyer imeyer commented Mar 13, 2025

  • Upgrade Go from 1.23 to 1.24
  • Separate files (batch.go, buffer.go) for easier understanding
  • Implement shared HTTP client with connection pooling
  • Enhance telemetry with request counter and atomic buffer operations
  • Add compression for log exports with minimum severity filtering
  • Improve graceful shutdown handling with in-flight request tracking
  • Increase default flush interval from 3s to 5s
  • Bump service version from 1.1.0 to 1.2.0

Summary by CodeRabbit

  • New Features

    • Upgraded the runtime environment and build system for enhanced performance and reliability.
    • Introduced an improved data buffering mechanism for more efficient request processing.
    • Added configurable options with extended flush intervals and optional debug logging.
    • Enhanced telemetry monitoring with new metrics and optimized log handling.
    • Service version updated to 1.2.0.
  • Chores

    • Streamlined request management and error logging for smoother operations and shutdowns.

Signed-off-by: Ian Meyer (imeyer) <k@imeyer.io>

changeset-bot bot commented Mar 13, 2025

⚠️ No Changeset found

Latest commit: ef0a738

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel bot commented Mar 13, 2025

The latest updates on your projects.

Name Status Updated (UTC)
dashboard ✅ Ready Mar 13, 2025 7:51am
engineering ✅ Ready Mar 13, 2025 7:51am
play ✅ Ready Mar 13, 2025 7:51am
www ✅ Ready Mar 13, 2025 7:51am


coderabbitai bot commented Mar 13, 2025

Walkthrough

This pull request updates the chproxy service with several changes. The Dockerfile now uses a newer Golang base image and simplifies the build process. New files introduce batch processing capabilities with a Batch struct, a persist function, and a buffer processor that manages flushing based on time or size. The configuration is enhanced with a new debug logging flag, a longer flush interval, and a version bump. The main file now leverages the new buffering logic and incorporates a shared HTTP client and synchronization. Telemetry is also improved with atomic operations and an added request counter.

Changes

File(s) Change Summary
apps/chproxy/Dockerfile Updated to use golang:1.24-alpine instead of golang:1.23-alpine, removed copying of go.sum and go mod download steps, and adjusted CMD formatting.
apps/chproxy/batch.go, apps/chproxy/buffer.go Introduced new batch processing logic. In batch.go, a Batch struct and persist function are added to handle sending batches via HTTP POST. In buffer.go, the startBufferProcessor function implements buffering, periodic flushing using a ticker, context cancellation handling, and error logging.
apps/chproxy/config.go Added a new LogDebug boolean field with conditional initialization via an environment variable, updated FlushInterval from 3s to 5s, and incremented ServiceVersion from 1.1.0 to 1.2.0.
apps/chproxy/go.mod Added new dependency: go.opentelemetry.io/contrib/processors/minsev v0.8.0.
apps/chproxy/main.go Removed the legacy persist function and Batch struct, integrated the new buffering logic via startBufferProcessor, added a shared HTTP client with defined timeout settings, introduced an inFlight sync.WaitGroup, and enhanced error handling and logging with detailed telemetry span attributes.
apps/chproxy/otel.go Enhanced telemetry configuration with thread-safe atomic operations for buffer size, added a new RequestCounter metric, updated log exporter configuration to use Gzip compression, and restructured the log processor to incorporate batch processing with improved error handling in metric initialization.

Sequence Diagram(s)

sequenceDiagram
    participant M as Main
    participant BP as BufferProcessor
    participant P as Persist
    participant CH as Clickhouse
    participant T as Telemetry

    M->>BP: Initialize and start buffering process
    BP->>BP: Collect incoming batch data
    BP->>T: Start telemetry span on flush trigger
    BP->>P: Invoke persist for batch flush
    P->>CH: Build and execute HTTP POST request with batch data
    CH-->>P: Return HTTP response (success/error)
    P-->>T: Update telemetry metrics and log results
    P-->>BP: Return status of flush operation

Suggested reviewers

  • mcstepp
  • perkinsjr
  • MichaelUnkey
  • chronark

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 golangci-lint (1.62.2)

Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.0)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.0)



github-actions bot commented Mar 13, 2025

Thank you for following the naming conventions for pull request titles! 🙏

@vercel vercel bot temporarily deployed to Preview – www March 13, 2025 07:50 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard March 13, 2025 07:50 Inactive
@vercel vercel bot temporarily deployed to Preview – play March 13, 2025 07:50 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering March 13, 2025 07:51 Inactive

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (18)
apps/chproxy/Dockerfile (3)

7-7: Consider adding build flags.
To optimize binary size and remove extra symbols, consider using flags like -trimpath or -ldflags="-s -w".

-RUN go build -o bin/chproxy
+RUN go build -ldflags="-s -w" -trimpath -o bin/chproxy

9-9: Evaluate smaller final image.
Using golang:1.24-alpine as the final image is convenient, but a minimal base image could further reduce the overall container size if you don't need Go tooling at runtime.


15-15: Confirm Docker command usage.
CMD ["/usr/local/bin/chproxy"] is fine, but consider using ENTRYPOINT if you expect to pass additional parameters or want a more robust container invocation pattern.

apps/chproxy/config.go (2)

11-11: Check naming consistency for debug logging field.
The new LogDebug field is clear. Ensure naming stays consistent throughout the codebase (e.g., EnableDebugLogs or LogDebug) for better clarity.


45-47: Allow dynamic debug logging via environment variable.
Enabling LogDebug based on OTEL_EXPORTER_LOG_DEBUG is helpful. Consider logging a message when debug logging is turned on, to aid in diagnosing environment-based behavior.

 if debug := os.Getenv("OTEL_EXPORTER_LOG_DEBUG"); debug == "true" {
     config.LogDebug = true
+    config.Logger.Info("Debug logging enabled via OTEL_EXPORTER_LOG_DEBUG")
 }
apps/chproxy/buffer.go (2)

11-19: Clarify function documentation and return value.
startBufferProcessor returns a channel that signals completion. This is excellent for graceful shutdown. Clarify in the doc comment how callers should handle the returned channel.


81-127: Ensure readiness for large bursts of incoming batches.
The infinite for loop reads from the buffer channel. For high-volume scenarios, confirm that:

  1. Max buffer size is never exceeded due to concurrency.
  2. Logging is not too frequent and doesn't slow down throughput.
  3. Potential backpressure mechanisms exist if buffer is unbounded.

Overall, the logic properly enforces MaxBatchSize flushes, but consider robust testing under load.

Do you want help setting up a performance test script to verify the system under load?

apps/chproxy/batch.go (5)

1-13: Use package-level documentation.
Consider adding a brief package-level comment explaining the purpose of the batch.go file and how it integrates with the rest of the system. This improves overall readability and maintainability.


15-18: Consider validation for Batch fields.
While Rows may be validated within persist, you could also add checks or constraints around Params (e.g., ensuring mandatory parameters exist) to improve robustness.


28-29: Expose metrics usage in tests.
The counter increments are correct; however, consider verifying them in unit or integration tests to ensure proper metric tracking.


54-65: Remind users to store credentials securely.
Storing credentials directly in the URL or environment variables is typical, but remind users to maintain them securely. Restrict logs from containing sensitive data.


78-101: Validate ClickHouse responses thoroughly.
The non-OK response handling is good. If the target server returns additional info in headers, consider logging them to aid debugging.

apps/chproxy/otel.go (2)

31-36: Added RequestCounter metric.
Tracking HTTP requests is a valuable metric. Ensure it’s consistently incremented across all endpoints (not just inserts) if that fits your telemetry needs.


151-155: New HTTP request counter metric.
The metric name clickhouse_http_requests_total is consistent with the existing naming convention. Ensure the label or description references the service’s scope if it might be ambiguous.

apps/chproxy/main.go (4)

27-29: Global variables usage.
While this is fine for simplicity, consider grouping them into a struct or using init() functions for improved modularity and testability.


101-101: Incrementing request counter.
This ensures each HTTP request is tracked. Consider incrementing the counter in middleware if more endpoints are added.


126-127: Logging invalid queries.
Logging query info allows debugging. Confirm that PII or sensitive data isn’t logged inadvertently.


152-159: Constructing and enqueueing the Batch.
Provides a clear approach to batch insertion. The channel-based buffer approach is easy to follow, but watch for potential blocking if buffer is full.
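
One possible way to avoid blocking a request handler on a full buffer is a non-blocking send. The sketch below assumes a Batch with Rows and Params fields (the field types are guesses) and a hypothetical tryEnqueue helper; whether rejecting a batch is acceptable depends on the service's delivery requirements.

```go
package main

import "fmt"

// Batch loosely mirrors the struct described in the review; the field types
// here are assumptions for illustration only.
type Batch struct {
	Rows   []string
	Params map[string]string
}

// tryEnqueue attempts a non-blocking send. It returns false when the buffer
// is full so the caller can reject the request (for example with HTTP 503)
// instead of blocking.
func tryEnqueue(buffer chan<- *Batch, b *Batch) bool {
	select {
	case buffer <- b:
		return true
	default:
		return false
	}
}

func main() {
	buffer := make(chan *Batch, 1) // capacity 1 to make the full case easy to hit

	fmt.Println(tryEnqueue(buffer, &Batch{Rows: []string{"r1"}})) // buffer has room
	fmt.Println(tryEnqueue(buffer, &Batch{Rows: []string{"r2"}})) // buffer is full
}
```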

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d6f3fc and ef0a738.

⛔ Files ignored due to path filters (1)
  • apps/chproxy/go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • apps/chproxy/Dockerfile (1 hunks)
  • apps/chproxy/batch.go (1 hunks)
  • apps/chproxy/buffer.go (1 hunks)
  • apps/chproxy/config.go (3 hunks)
  • apps/chproxy/go.mod (1 hunks)
  • apps/chproxy/main.go (7 hunks)
  • apps/chproxy/otel.go (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Test Packages / Test ./packages/api
  • GitHub Check: Test Packages / Test ./packages/nextjs
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test Packages / Test ./packages/hono
  • GitHub Check: Test Packages / Test ./packages/cache
  • GitHub Check: Test Packages / Test ./packages/rbac
  • GitHub Check: Test Packages / Test ./internal/clickhouse
  • GitHub Check: Test Packages / Test ./internal/keys
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Build / Build
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (25)
apps/chproxy/go.mod (1)

7-7:

✅ Verification successful

Consider verifying the version stability of the new dependency.
You've introduced go.opentelemetry.io/contrib/processors/minsev v0.8.0. Ensure this version is suitable for your environment and confirm that there are no pending security advisories.

Run the following script to check for any known issues or newer releases:


🏁 Script executed:

#!/bin/bash
# Verify usage of minsev v0.8.0
go list -m -json go.opentelemetry.io/contrib/processors/minsev@v0.8.0

Length of output: 562


Version Stability Verified for minsev Dependency
The output from the go list -m -json command confirms that version v0.8.0 of go.opentelemetry.io/contrib/processors/minsev is correctly resolved and appears to be a stable release with a recent timestamp. There are no indications of pending security advisories or issues. No further action is required for this dependency.

apps/chproxy/Dockerfile (1)

1-1: Verify the Docker base image update.
You've upgraded to golang:1.24-alpine. Confirm that this version aligns with your dependencies and that you won't run into unexpected backward-compatibility issues or Alpine-specific constraints.

apps/chproxy/config.go (1)

26-27: Review updated defaults.
You're now setting:

  • LogDebug: false by default
  • FlushInterval to 5 seconds (increased from 3)
  • ServiceVersion to "1.2.0"

Verify that the new flush interval aligns with performance/throughput expectations, and confirm that the minor version bump (1.1.0 to 1.2.0) is appropriate, that is, that the changes are backward compatible.

Also applies to: 32-32

apps/chproxy/buffer.go (2)

1-9: Imports and package structure look good.
This file neatly organizes context and telemetry dependencies. Just ensure that any unused or redundant imports are removed to keep the file clean.


21-79: Check concurrency safety in flushAndReset.
Your approach of using a closure (flushAndReset) within the goroutine is convenient. However, confirm that no data races occur around shared state like buffered or batchesByParams. The current design appears consistent, but concurrency can be tricky if future modifications introduce parallel calls.

apps/chproxy/batch.go (4)

20-26: Early return logic is good.
The straightforward check and return when batch.Rows is empty prevents unnecessary overhead. This is clean and efficient.


36-44: Ensure config parsing is robust.
If config.ClickhouseURL is invalid or empty, the error handling is correct. Just confirm that any fallback or default behavior is accounted for if the URL is absent in configuration.


46-52: Request creation is straightforward.
Creation of the POST request and usage of strings.Join for batch.Rows is correct. No further changes recommended.


103-113: Success logging is clear.
Logging the persisted row count and adding relevant telemetry attributes improves traceability. No changes needed here.

apps/chproxy/otel.go (8)

7-7: Atomic usage is helpful for concurrency.
Importing sync/atomic is appropriate for thread-safe buffer size updates.


11-11: Min-severity log processor is beneficial.
Including minsev helps filter out unnecessary logs at runtime, improving clarity of logs in production.


46-46: Thread-safe setter is correct.
Storing the buffer size via atomic.StoreInt64 prevents data races. Good job.


51-51: Thread-safe getter is correct.
Using atomic.LoadInt64 is a straightforward choice to avoid concurrency issues.
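
The setter and getter pair mentioned in the two comments above can be sketched as follows. The variable and function names are assumptions; only atomic.StoreInt64 and atomic.LoadInt64 are taken from the review.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bufferSize is read by a metrics callback and written by the buffer
// goroutine, so every access goes through sync/atomic.
var bufferSize int64

// setBufferSize publishes the current number of buffered rows.
func setBufferSize(n int64) { atomic.StoreInt64(&bufferSize, n) }

// getBufferSize returns the most recently published value.
func getBufferSize() int64 { return atomic.LoadInt64(&bufferSize) }

func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 4; i++ {
		wg.Add(1)
		go func(n int64) {
			defer wg.Done()
			setBufferSize(n) // concurrent writers are safe without a mutex
		}(int64(i))
	}
	wg.Wait()
	fmt.Println("buffer size:", getBufferSize()) // one of 1..4, depending on scheduling
}
```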


86-89: Compression for logs is beneficial.
Using Gzip compression can reduce bandwidth usage. Validate whether your logging infrastructure supports decompressing them gracefully.


94-100: Dynamic severity configuration.
Switching between SeverityInfo and SeverityDebug based on config.LogDebug is a straightforward approach to fine-tune log verbosity.


127-129: Essential metric creation checks.
Defining separate error variables and combining them in one loop is a clean approach to identify metric initialization failures.


157-157: Consolidated error checking.
Iterating over all creation errors in a single loop simplifies error handling. This is a neat pattern.

apps/chproxy/main.go (8)

14-14: Synchronized concurrency.
Importing sync ensures proper concurrency handling with WaitGroup. This preps the code nicely for graceful shutdown.


38-45: HTTP client connection pooling.
This shared httpClient with tuned timeouts and idle connections is a great performance improvement.


78-79: Asynchronous buffer processing.
Calling startBufferProcessor in a separate goroutine clarifies separation of concerns between request handling and batch persistence.


85-89: Liveness check instrumentation.
Tracking method and path in the span attributes is beneficial for debugging and analytics.


91-91: OK status code annotation.
Using span.SetStatus(codes.Ok, "") is consistent with OTEL best practices.


124-125: Error metric incrementation.
The call to telemetry.Metrics.ErrorCounter is consistent with the established pattern. Good job.


162-167: Additional span details upon success.
Recording row count and table name in OTEL helps with debugging. This is a good practice.


192-196: Wait for in-flight requests before shutdown.
Ensuring all requests complete before final shutdown is crucial for clean termination. Well implemented.

@imeyer imeyer added this pull request to the merge queue Mar 13, 2025
Merged via the queue into main with commit 0e8f9bd Mar 13, 2025
29 of 31 checks passed
@imeyer imeyer deleted the refactor-otel-telemetry branch March 13, 2025 08:41