Skip to content

fix: improve router.http.request.duration_milliseconds#2564

Open
alepane21 wants to merge 27 commits intomainfrom
ale/eng-8915-router-router_http_request_duration-and-other-request
Open

fix: improve router.http.request.duration_milliseconds#2564
alepane21 wants to merge 27 commits intomainfrom
ale/eng-8915-router-router_http_request_duration-and-other-request

Conversation

@alepane21
Copy link
Copy Markdown
Contributor

@alepane21 alepane21 commented Feb 26, 2026

When a query plan makes parallel fetches, the metric router.http.request.duration_milliseconds is recorded incorrectly: it reported as latency the time of the slower fetch. With this change I set as duration the timing that I get from the engine for the fetch, that doesn't have this issue.

Summary by CodeRabbit

  • Tests

    • Added an integration test ensuring parallel subgraph request durations are reported to Prometheus.
  • Bug Fixes

    • Improved latency measurement: logging, telemetry and metrics now use fetch timing for more accurate duration reporting.

Checklist

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c901ce and 3965823.

📒 Files selected for processing (1)
  • router/core/engine_loader_hooks.go

Walkthrough

Adds an integration test for Prometheus metrics of parallel subgraph durations and refactors engine loader hooks to use fetch-timing from context instead of a stored start time for latency reporting.

Changes

Cohort / File(s) Summary
Prometheus Metrics Test
router-tests/prometheus_parallel_subgraph_metrics_test.go
New integration test TestPrometheusParallelSubgraphRequestDurationMetrics that exercises parallel subgraph requests, collects Prometheus metrics, and validates per-subgraph router_http_request_duration_milliseconds histogram samples and counts for employees and products.
Engine Loader Hooks — Latency Refactor
router/core/engine_loader_hooks.go
Removed engineLoaderHooksRequestContext and start-time tracking in OnLoad; OnFinished now reads fetch timing from FetchTimingKey (fetchLatency) and uses it for access logging, telemetry, and latency metrics instead of computing latency from stored start time. Minor control-flow and logging/metric updates to propagate fetchLatency.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: improving the router.http.request.duration_milliseconds metric to correctly capture parallel fetch timings using engine-provided fetch latency instead of incorrectly using the slower fetch duration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 26, 2026

❌ Internal Query Planner CI failed to run.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.52%. Comparing base (d376585) to head (f185aab).
⚠️ Report is 40 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2564       +/-   ##
===========================================
+ Coverage   45.74%   63.52%   +17.78%     
===========================================
  Files        1035      251      -784     
  Lines      139075    26759   -112316     
  Branches     8631        0     -8631     
===========================================
- Hits        63613    16998    -46615     
+ Misses      73735     8400    -65335     
+ Partials     1727     1361      -366     
Files with missing lines Coverage Δ
router/core/engine_loader_hooks.go 93.03% <100.00%> (+1.06%) ⬆️

... and 789 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
router/core/engine_loader_hooks.go (1)

55-57: Remove unused startTime field and engineLoaderHooksRequestContext struct—both are dead code from the refactoring to use FetchTimingKey.

The startTime field is declared but never populated (line 117 creates an empty struct) or accessed. More significantly, the entire engineLoaderHooksRequestContext struct is unused—it's stored in the context at line 117 but never retrieved anywhere in the codebase. The latency measurement now relies entirely on FetchTimingKey (lines 102–103 and 167–173), making this struct and its associated context key EngineLoaderHooksContextKey redundant.

♻️ Proposed cleanup

Remove the struct and its context key registration:

-type engineLoaderHooksRequestContext struct {
-	startTime time.Time
-}

And update line 117:

-	return context.WithValue(ctx, rcontext.EngineLoaderHooksContextKey, &engineLoaderHooksRequestContext{})
+	return ctx

Also remove the EngineLoaderHooksContextKey from router/internal/context/keys.go if it has no other usages.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/core/engine_loader_hooks.go` around lines 55 - 57, Remove the dead
engineLoaderHooksRequestContext struct and its associated context key
EngineLoaderHooksContextKey: delete the type declaration for
engineLoaderHooksRequestContext, remove the code that stores an empty instance
into context (the place creating/storing engineLoaderHooksRequestContext in
request context), and remove the EngineLoaderHooksContextKey from
router/internal/context/keys.go; ensure any latency logic continues to use
FetchTimingKey (do not change FetchTimingKey usages) and run a project-wide
search to confirm no remaining references to engineLoaderHooksRequestContext or
EngineLoaderHooksContextKey before committing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@router/core/engine_loader_hooks.go`:
- Around line 55-57: Remove the dead engineLoaderHooksRequestContext struct and
its associated context key EngineLoaderHooksContextKey: delete the type
declaration for engineLoaderHooksRequestContext, remove the code that stores an
empty instance into context (the place creating/storing
engineLoaderHooksRequestContext in request context), and remove the
EngineLoaderHooksContextKey from router/internal/context/keys.go; ensure any
latency logic continues to use FetchTimingKey (do not change FetchTimingKey
usages) and run a project-wide search to confirm no remaining references to
engineLoaderHooksRequestContext or EngineLoaderHooksContextKey before
committing.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08efda8 and 3c901ce.

📒 Files selected for processing (1)
  • router/core/engine_loader_hooks.go

@alepane21 alepane21 changed the title Ale/eng 8915 router router http request duration and other request fix: improve router.http.request.duration_milliseconds Mar 2, 2026
@alepane21 alepane21 marked this pull request as ready for review March 3, 2026 17:23
if fetchTiming, ok := value.(*atomic.Int64); ok {
exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(fetchTiming.Load())
fetchLatency = time.Duration(fetchTiming.Load())
exprCtx.Subgraph.Request.ClientTrace.FetchDuration = fetchLatency
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This measures the low level round trip, whereas the current measurement should measure this plus the engine hook logic that would add to the request duration.

However more importantly, would it work with GRPC data sources?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! It would not work with gRPC data sources.

@github-actions
Copy link
Copy Markdown

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Mar 27, 2026
@github-actions github-actions Bot removed the Stale label Apr 4, 2026
@github-actions
Copy link
Copy Markdown

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions Bot added the Stale label Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants