Conversation
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Walkthrough
Adds a Prometheus integration test for parallel subgraph durations, adjusts engine loader hooks to use per‑subgraph fetch timing for logging and metrics, and makes one test assertion JSON‑structure aware.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
Router-nonroot image scan passed ✅
No security vulnerabilities found in image:
🧹 Nitpick comments (1)
router-tests/prometheus_parallel_subgraph_metrics_test.go (1)
71-73: Consider documenting the tolerance values. The magic numbers (250ms tolerance, 400ms minimum gap) work, but their rationale isn't immediately clear. A brief comment would help future maintainers understand the expected timing bounds.
📝 Suggested documentation
```diff
 employeesDurationMs := employeesHistogram.GetSampleSum()
 productsDurationMs := productsHistogram.GetSampleSum()

+// Products should complete close to productsDelay (within 250ms tolerance for test overhead)
 require.Greater(t, productsDurationMs, float64(productsDelay.Milliseconds()-250))
+// Employees (no delay) should complete much faster than Products
 require.Less(t, employeesDurationMs, float64(productsDelay.Milliseconds()/2))
+// Verify meaningful separation between parallel subgraph latencies
 require.Greater(t, productsDurationMs-employeesDurationMs, 400.0)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router-tests/prometheus_parallel_subgraph_metrics_test.go` around lines 71-73: the assertions in the test using magic numbers (the 250ms tolerance and 400ms minimum gap) lack explanation; add a brief comment above the three assertions explaining why those tolerances were chosen (e.g., expected scheduling/jitter buffer, processing overhead, and intended concurrency gap between products and employees requests) and reference the variables used: productsDurationMs, employeesDurationMs, and productsDelay.Milliseconds() so future maintainers understand the relationship and how the values were derived.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #2540 +/- ##
=======================================
Coverage 62.21% 62.22%
=======================================
Files 241 241
Lines 25499 25503 +4
=======================================
+ Hits 15864 15868 +4
- Misses 8297 8298 +1
+ Partials 1338 1337 -1
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@router/core/engine_loader_hooks.go`:
- Around lines 177-187: The current code sets subgraphFetchLatency from
ctx.Value(rcontext.FetchTimingKey) and assigns it to
exprCtx.Subgraph.Request.ClientTrace.FetchDuration only when a fetchTiming
exists, but after the fallback path (when subgraphFetchLatency = latency) the
FetchDuration remains zero; update the fallback branch so that after assigning
subgraphFetchLatency = latency you also set
exprCtx.Subgraph.Request.ClientTrace.FetchDuration = subgraphFetchLatency (i.e.,
ensure exprCtx.Subgraph.Request.ClientTrace.FetchDuration is always assigned to
subgraphFetchLatency whether it came from fetchTiming or the latency fallback).
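A minimal sketch of the suggested change, assuming fetchTiming is a time.Duration stored on the request context and latency is the overall loader latency already computed in the hook; the surrounding control flow is inferred from the prompt above, not copied from engine_loader_hooks.go:

```go
// Sketch only: apart from rcontext.FetchTimingKey and
// exprCtx.Subgraph.Request.ClientTrace.FetchDuration, the names and the
// fetchTiming type are assumptions for illustration.
subgraphFetchLatency := latency // fallback: overall loader latency
if fetchTiming, ok := ctx.Value(rcontext.FetchTimingKey).(time.Duration); ok {
	subgraphFetchLatency = fetchTiming // per-subgraph fetch timing when available
}
// Assign unconditionally so FetchDuration is never left at zero, regardless of
// whether the value came from fetchTiming or from the latency fallback.
exprCtx.Subgraph.Request.ClientTrace.FetchDuration = subgraphFetchLatency
```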
```go
employeesDurationMs := employeesHistogram.GetSampleSum()
productsDurationMs := productsHistogram.GetSampleSum()

require.Greater(t, productsDurationMs, float64(productsDelay.Milliseconds()-250))
```
That's a source of flaky tests.
The issue is not this: OnFinished should be called right after the subgraph is complete!
When, during the same query plan, there was more than one fetch toward subgraphs in parallel, the longest timing was applied to all of the subgraph metrics.
Also, after the merge of the last engine release, a test became flaky.
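To illustrate why per-subgraph timing matters for the histogram assertions in the new test, here is a small self-contained sketch; the metric name, labels, and durations are made up for the example and are not the router's actual instrumentation. When each parallel fetch observes its own duration instead of the shared loader latency, the sample sums of the two subgraph series diverge, which is what the test checks via GetSampleSum().

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

func main() {
	// Hypothetical per-subgraph duration histogram (not the router's real metric name).
	hist := prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "subgraph_fetch_duration_seconds",
		Help:    "Duration of subgraph fetches.",
		Buckets: prometheus.DefBuckets,
	}, []string{"subgraph"})

	reg := prometheus.NewRegistry()
	reg.MustRegister(hist)

	// Two fetches that ran in parallel: employees is fast, products is artificially delayed.
	durations := map[string]time.Duration{
		"employees": 20 * time.Millisecond,
		"products":  600 * time.Millisecond,
	}

	// Observing each subgraph's own duration (rather than the shared loader
	// latency) keeps the two series distinguishable.
	for name, d := range durations {
		hist.WithLabelValues(name).Observe(d.Seconds())
	}

	mfs, err := reg.Gather()
	if err != nil {
		panic(err)
	}
	for _, mf := range mfs {
		for _, m := range mf.GetMetric() {
			fmt.Printf("%s sample sum: %.3fs\n", m.GetLabel()[0].GetValue(), m.GetHistogram().GetSampleSum())
		}
	}
}
```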
Summary by CodeRabbit
Tests
Bug Fixes
Checklist