Skip to content

feat: use otel#2901

Merged
chronark merged 7 commits intomainfrom
otel
Feb 26, 2025
Merged

feat: use otel#2901
chronark merged 7 commits intomainfrom
otel

Conversation

@chronark
Copy link
Collaborator

@chronark chronark commented Feb 20, 2025

Summary by CodeRabbit

  • New Features

    • Integrated enhanced observability with OpenTelemetry, including Grafana metrics for improved system monitoring.
    • Added cache performance metrics to track latency, hit/miss ratios, and evictions for better operational insight.
    • Enhanced shutdown handling and cluster membership configuration to improve service resiliency and node management.
  • Refactor

    • Refined configuration and documentation for clearer, more descriptive settings.
    • Updated tracing implementation and dependency management to support the new observability features.

@vercel
Copy link

vercel bot commented Feb 20, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
dashboard ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 26, 2025 3:11pm
engineering ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 26, 2025 3:11pm
play ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 26, 2025 3:11pm
www ✅ Ready (Inspect) Visit Preview 💬 Add feedback Feb 26, 2025 3:11pm

@changeset-bot
Copy link

changeset-bot bot commented Feb 20, 2025

⚠️ No Changeset found

Latest commit: 96eb33b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 20, 2025

📝 Walkthrough

Walkthrough

The changes update configuration documentation, refine shutdown and initialization flows, and improve observability. The API server now integrates OpenTelemetry instrumentation with Grafana initialization and a refined shutdown mechanism using a dedicated shutdown function type. Several tracing imports have been shifted from legacy paths to new OpenTelemetry-based paths, and membership configuration is enhanced with renamed fields and additional methods. New metrics collection has been added to the cache, and utilities for free port allocation and structured shutdown have been introduced. The environment variable management package has been removed, and dependency definitions in go.mod have been adjusted accordingly.

Changes

File(s) Change Summary
go/cmd/api/config.go Updated struct field descriptions for clarity; added new Otel struct for observability configuration.
go/cmd/api/main.go Integrated OpenTelemetry and Grafana initialization; updated shutdown function type from func(ctx) error to shutdown.ShutdownFn.
go/cmd/healthcheck/main.go Added // nolint:gosec comment before HTTP GET request to suppress lint warnings.
go/go.mod Removed deprecated dependencies and added OpenTelemetry-related packages.
Multiple files:
go/internal/services/ratelimit/peers.go,
.../replay.go,
.../sliding_window.go,
go/pkg/cache/middleware/tracing.go,
go/pkg/circuitbreaker/lib.go,
go/pkg/events/topic.go,
go/pkg/zen/middleware_tracing.go
Updated tracing import paths to new OpenTelemetry locations and simplified span naming in some cases.
Multiple files:
go/pkg/cache/cache.go,
go/pkg/otel/metrics/metrics.go,
go/pkg/otel/grafana.go,
go/pkg/otel/tracing/trace.go,
go/pkg/otel/schema.go
Introduced cache metrics collection, new Grafana configuration, metrics package for OpenTelemetry, and new tracing functions; package renaming from tracing to otel where applicable.
Multiple files:
go/pkg/membership/interface.go,
go/pkg/membership/memberlist.go,
go/pkg/membership/membership_test.go,
go/pkg/membership/bus.go,
go/pkg/membership/logger.go,
go/pkg/membership/member.go
Updated membership configuration: added Addr() methods, renamed fields (e.g., NodeId to NodeID, SerfAddr to Addr), and improved documentation/comments.
go/pkg/config/json_test.go Updated import path for the config package.
go/pkg/discovery/static.go Changed the Discover method receiver from pointer to value.
Multiple files:
go/pkg/env/env.go,
go/pkg/env/env_test.go
Removed environment variable management package and its tests.
Multiple files:
go/pkg/port/free.go,
go/pkg/shutdown/shutdown.go
Added a new free port utility and a new shutdown function type (ShutdownFn).
go/pkg/tracing/axiom.go Removed Axiom tracer initialization code.

Sequence Diagram(s)

sequenceDiagram
    participant API as API Server
    participant Config as Config Loader
    participant Grafana as Grafana Init
    participant Shutdown as Shutdown Manager
    API->>Config: Load configuration (with Otel settings)
    alt Otel config available
        API->>Grafana: Initialize Grafana and OTLP exporters
        Grafana-->>API: Return shutdown functions
    else No Otel config
        API->>API: Skip observability setup
    end
    API->>Shutdown: Append shutdown functions and invoke gracefulShutdown()
Loading
sequenceDiagram
    participant Cache as Cache Instance
    participant Metrics as OpenTelemetry Metrics
    Cache->>Cache: Start collectMetrics goroutine (every 5s)
    Cache->>Metrics: Record cache metrics (size, hits, misses, etc.)
    Metrics-->>Cache: Acknowledge metric updates
Loading

Tip

CodeRabbit's docstrings feature is now available as part of our Pro Plan! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 20, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (10)
go/pkg/otel/tracing/trace.go (2)

17-19: Add documentation comment for exported function

Following Go conventions, exported functions should have documentation comments explaining their purpose, parameters, and return values.

+// SetGlobalTraceProvider sets the global tracer provider to the provided implementation.
+// This allows switching from the default no-op tracer to a real tracer implementation.
 func SetGlobalTraceProvider(t trace.TracerProvider) {
 	globalTracer = t
 }

27-32: Add documentation comment for exported function

Following Go conventions, exported functions should have documentation comments explaining their purpose, parameters, and return values.

+// RecordError records an error on the given span by setting its status to Error
+// with the error message. If the error is nil, this function does nothing.
 func RecordError(span trace.Span, err error) {
 	if err == nil {
 		return
 	}
 	span.SetStatus(codes.Error, err.Error())
 }
go/pkg/port/free.go (2)

31-38: Consider avoiding panic in public API methods.

While the implementation works, using panic in public API methods can make error handling more difficult for callers. Consider having all methods return errors explicitly.

-func (f *FreePort) Get() int {
-	port, err := f.GetWithError()
-	if err != nil {
-		panic(err)
-	}
-
-	return port
+func (f *FreePort) Get() (int, error) {
+	return f.GetWithError()
}

40-66: Consider addressing race condition in port allocation.

There's an inherent race condition between checking if a port is free and actually binding to it. While this is common in port allocation utilities, consider adding a comment acknowledging this limitation.

-// Get returns a free port.
+// GetWithError returns a free port.
+// Note: There's an inherent race condition between checking if a port is free
+// and actually binding to it in your application. This method makes a best effort
+// to find an available port, but cannot guarantee it will remain available.
 func (f *FreePort) GetWithError() (int, error) {
go/pkg/cache/cache.go (2)

97-117: Graceful termination of the ticker is recommended.
While collectMetrics() is effective for periodic collection, consider stopping the Ticker in a defer block or at shutdown to avoid dangling goroutines.

 func (c *cache[K, V]) collectMetrics() {
   t := time.NewTicker(time.Second * 5)
+  defer t.Stop()
   for range t.C {
       ...
   }
 }

150-160: Consider measuring read latency using a time duration.
Relying on UnixMilli() is workable but can lose sub-millisecond precision. time.Since(t0) might provide finer granularity.

- metrics.Cache.ReadLatency.Record(ctx, t1.UnixMilli()-t0.UnixMilli(), ...)
+ durationMs := time.Since(t0).Milliseconds()
+ metrics.Cache.ReadLatency.Record(ctx, durationMs, ...)
go/pkg/otel/grafana.go (1)

32-72: Consider contextual shutdown for trace and metric providers.
These providers run until explicitly shut down. Ensure the returned shutdowns are called, and consider whether a context-based approach might better integrate with app-level cancellation.

go/pkg/otel/metrics/metrics.go (1)

111-181: Minor naming consistency could improve clarity.
Metric names (e.g., "http_request" vs. "cache_hit") are fine but might be more consistent as plural or with a shared namespace prefix. Overall, the initialization logic is solid.

go/cmd/api/main.go (1)

116-127: Ensure fallback logging on Grafana initialization failure.
The code correctly returns an error if otel.InitGrafana fails, but it might be beneficial to log the error before returning to aid future debugging.

		shutdownOtel, grafanaErr := otel.InitGrafana(ctx, otel.Config{
			GrafanaEndpoint: cfg.Otel.OtlpEndpoint,
			Application:     "api",
			Version:         version.Version,
		})
		if grafanaErr != nil {
+			logger.Error(ctx, "Grafana initialization failed", slog.Any("error", grafanaErr))
			return fmt.Errorf("unable to init grafana: %w", grafanaErr)
		}
go/pkg/membership/membership_test.go (1)

38-55: TestJoin2Nodes initialization logic.
Starting the first node with an empty static address array, then verifying membership, is straightforward. The second node then dynamically discovers and joins the cluster. The membership checks are thorough, although consider adding explicit logs or potential error details for more test clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b043686 and 32c623a.

⛔ Files ignored due to path filters (1)
  • go/go.sum is excluded by !**/*.sum
📒 Files selected for processing (30)
  • go/cmd/api/config.go (1 hunks)
  • go/cmd/api/main.go (5 hunks)
  • go/cmd/healthcheck/main.go (1 hunks)
  • go/go.mod (3 hunks)
  • go/internal/services/ratelimit/peers.go (1 hunks)
  • go/internal/services/ratelimit/replay.go (1 hunks)
  • go/internal/services/ratelimit/sliding_window.go (1 hunks)
  • go/pkg/cache/cache.go (5 hunks)
  • go/pkg/cache/middleware/tracing.go (1 hunks)
  • go/pkg/circuitbreaker/lib.go (4 hunks)
  • go/pkg/config/json_test.go (1 hunks)
  • go/pkg/discovery/static.go (1 hunks)
  • go/pkg/env/env.go (0 hunks)
  • go/pkg/env/env_test.go (0 hunks)
  • go/pkg/events/topic.go (2 hunks)
  • go/pkg/membership/bus.go (2 hunks)
  • go/pkg/membership/interface.go (2 hunks)
  • go/pkg/membership/logger.go (1 hunks)
  • go/pkg/membership/member.go (1 hunks)
  • go/pkg/membership/memberlist.go (3 hunks)
  • go/pkg/membership/membership_test.go (7 hunks)
  • go/pkg/otel/grafana.go (1 hunks)
  • go/pkg/otel/metrics/metrics.go (1 hunks)
  • go/pkg/otel/schema.go (1 hunks)
  • go/pkg/otel/tracing/trace.go (2 hunks)
  • go/pkg/otel/util.go (1 hunks)
  • go/pkg/port/free.go (1 hunks)
  • go/pkg/shutdown/shutdown.go (1 hunks)
  • go/pkg/tracing/axiom.go (0 hunks)
  • go/pkg/zen/middleware_tracing.go (1 hunks)
💤 Files with no reviewable changes (3)
  • go/pkg/tracing/axiom.go
  • go/pkg/env/env_test.go
  • go/pkg/env/env.go
✅ Files skipped from review due to trivial changes (12)
  • go/pkg/membership/logger.go
  • go/cmd/healthcheck/main.go
  • go/pkg/otel/schema.go
  • go/internal/services/ratelimit/sliding_window.go
  • go/pkg/shutdown/shutdown.go
  • go/pkg/otel/util.go
  • go/pkg/membership/bus.go
  • go/pkg/cache/middleware/tracing.go
  • go/internal/services/ratelimit/replay.go
  • go/pkg/config/json_test.go
  • go/internal/services/ratelimit/peers.go
  • go/pkg/membership/member.go
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Test Go API Local / test_agent_local
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Build / Build
🔇 Additional comments (42)
go/pkg/discovery/static.go (1)

9-9: Consider updating the type assertion to match the method signature

You've changed the Discover method from a pointer receiver to a value receiver, but the type assertion on line 7 still uses a pointer type (*Static)(nil). This could lead to confusion since the assertion suggests the Discoverer interface expects a pointer receiver implementation.

Consider either:

  1. Updating the type assertion to var _ Discoverer = Static{}, or
  2. Reverting to a pointer receiver if that was the original intent

Note that changing from pointer to value receiver is also a change in behavior - the method now operates on a copy of the struct rather than a reference.

go/pkg/zen/middleware_tracing.go (1)

3-3: LGTM - Import path updated for OpenTelemetry integration

The import path change aligns with the PR objective of migrating to OpenTelemetry for tracing.

go/pkg/membership/interface.go (2)

6-8: LGTM - Documentation improvement

The enhanced documentation clarifies the implementation details, which improves code readability and understanding.


47-48: LGTM - Addition of Addr() method

Adding this method to expose the node's advertised address is a valuable addition for diagnostics and routing purposes.

go/pkg/membership/memberlist.go (3)

19-28: Good documentation improvements for Config struct.

The added comments for the Config struct and its fields provide clear explanations of each component's purpose, which improves code readability and maintainability.


45-48: Good function documentation added.

The detailed comments for the New function clearly explain its purpose, initialization process, and return values.


156-158: Good addition of the Addr method.

This accessor method provides a clean way to retrieve the node's address. The implementation is straightforward and follows good design practices.

go/pkg/events/topic.go (2)

8-8: Import path updated for OpenTelemetry integration.

The change updates the tracing package import path to use the OpenTelemetry-based implementation, which is consistent with the PR objective of using OpenTelemetry (otel).


55-55: Simplified span creation with inline declaration.

The span creation has been simplified by using an inline declaration, improving code conciseness while maintaining the same functionality.

go/pkg/circuitbreaker/lib.go (2)

12-12: Import path updated for OpenTelemetry integration.

The change updates the tracing package import path to use the OpenTelemetry-based implementation, aligned with the PR objective.


151-152: Simplified span naming with consistent format.

The span creation has been improved to use a more consistent naming format with fmt.Sprintf, which enhances readability and improves trace context.

Also applies to: 159-161, 171-172, 208-209

go/pkg/port/free.go (2)

10-30: Good implementation of a free port utility.

The FreePort struct and its constructor are well-designed with reasonable defaults and thread-safety considerations.


47-49: Good use of comments to explain linter suppression.

The comment explaining why the gosec linter is suppressed is helpful for maintainers. This is a good practice.

go/pkg/cache/cache.go (4)

16-18: No major concerns with these telemetry imports.
The added imports align with the new metrics approach and look appropriate to the usage in this file.


68-68: Verify that stale duration correctly matches the overall cache lifecycle.
Using config.Stale as TTL is good, but ensure it’s always greater or equal to the "fresh" duration and that all callers expect items to be evicted at this time.


120-120: Refactoring the retrieval logic into a private get() method is neat.
Centralizing retrieval and metric recording helps maintainable code.


266-266: Good consistency in calling the new get() method within SWR.
It unifies read checks, metric recording, and code reuse.

go/pkg/otel/grafana.go (1)

1-30: Struct-based configuration looks well-defined.
Fields like GrafanaEndpoint, Application, and Version are clear for initialization.

go/pkg/otel/metrics/metrics.go (1)

1-4: Package declaration and doc comments look good.
No issues noted; the default no-op approach is a safe fallback.

go/cmd/api/main.go (4)

28-29: Imports look good.
These added dependencies for OpenTelemetry and the new shutdown package are clear and consistent with the rest of the file.


59-59: Consistent use of typed shutdown function.
Initializing shutdowns as []shutdown.ShutdownFn{} aligns well with the updated gracefulShutdown signature. Good improvement for maintainability.


206-206: Graceful shutdown signature updated.
Adopting the shutdown.ShutdownFn type promotes standardized and robust shutdown handling.


229-233: setupCluster’s typed shutdown promotes better clarity.
Returning a typed slice of shutdown.ShutdownFn clarifies the shutdown responsibilities.

go/pkg/membership/membership_test.go (9)

10-13: Streamlined imports.
The updated import lines for discovery, logging, membership, and port look appropriate for the new membership flow.


23-26: NodeID, Addr, and GossipPort are clearly defined.
Good job setting each field; the clarity in these fields aids debugging.


31-35: Second node mirrors node 1’s configuration style.
Consistent field assignments ensure test uniformity.


56-61: Verifying second node membership.
Using require.Eventually is good practice for asynchronous membership. The code is correct, but consider logging membership states if debugging is needed.


96-99: Loop-based NodeID assignments in joined cluster.
Each node is consistently assigned a unique NodeID and port combination. This organizes the multi-node tests well.


112-112: Join event map keyed by address.
Tracing join events through the Addr field clarifies cluster membership changes. This is a logical shift from referencing NodeIDs.


117-122: Dynamic peerAddrs for progressive node starts.
Building up peerAddrs as each node starts is a clean approach, ensuring each newly started node sees all peers.


182-186: runMany: consistent membership config.
The approach here is nearly identical to TestJoin2Nodes. This consistency in usage fosters maintainable, predictable tests.


191-196: runMany: sequential starts and dynamic peer address updates.
This process parallels the earlier method, reinforcing a stable and tested pattern for cluster membership.

go/go.mod (3)

28-35: Added OpenTelemetry dependencies.
These are crucial for tracing and metrics instrumentation. Be sure to track versions carefully for any known stability or compatibility issues.

If you’d like, I can generate a script to cross-check known CVEs for these modules.


51-51: Reintroduced axiom-go as an indirect dependency.
Ensure that no legacy references to the removed version remain.


90-90: Reinstated github.com/google/uuid as an indirect dependency.
Confirm that all direct references are removed if this is purely for a transitive dependency.

go/cmd/api/config.go (7)

4-6: Enhanced field descriptions improve configuration clarity.

The improved descriptions for Platform and Image fields now provide specific examples and clearer context, making the configuration more self-documenting.


10-12: Heartbeat configuration documentation clarified.

The updated description for the URL field now explicitly indicates it should be a complete endpoint URL, reducing potential configuration errors.


15-22: Cluster configuration documentation significantly improved.

The enhanced descriptions for NodeID, AdvertiseAddr components, RpcPort, and GossipPort provide much clearer guidance on their purposes and requirements, which will help with proper cluster configuration.


24-29: Discovery configuration documentation enhanced.

The improved descriptions for both static and Redis-based discovery methods make their intended usage more explicit and clear.


33-37: Logs and Clickhouse configuration documentation clarified.

The enhanced descriptions provide better guidance on these configuration options.


40-42: Database configuration documentation improved.

The updated descriptions now clearly distinguish between the purposes of primary and readonly replica connections, which is important for proper database configuration.


44-46: New OpenTelemetry configuration section added.

This new configuration section properly implements the PR objective to integrate OpenTelemetry. The field description clearly indicates its purpose for connecting to an OTLP endpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant