fix: improve metrics comparison CI readability #8341
Conversation
- Fix backwards diff direction in compare_metrics.py: swap unified_diff args so `-` = in baseline but not current (a regression), `+` = newly added (see the sketch below)
- Update compare_metrics_test.py assertions to reflect the correct semantics
- Fix get_raw_diff_sample in metrics_summary.py to interleave `-` and `+` lines so both sides are visible when truncating modified metrics
- Remove `set -x` from metrics_summary.sh and add GitHub group markers to fold verbose output sections in CI logs
- Add ciRunUrl parameter to formatMetricsDetail in ci-summary-report-publish.js to link directly to CI run logs for detailed diff output
- Update ci-summary-report-publish.test.js with new ciRunUrl tests

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/9a76075a-0fc6-4e8a-a951-4dfe09c297fe
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
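For reference, a minimal sketch of the corrected direction using Python's `difflib.unified_diff` (the metric lines here are made up for illustration, not from the repo):

```python
import difflib

# Hypothetical metric snapshots; with baseline passed first, '-' marks a
# metric present only in the baseline (a regression) and '+' marks a metric
# newly added in the current run.
baseline = ["metric_a 1\n", "metric_b 2\n"]
current = ["metric_a 1\n", "metric_c 3\n"]
print("".join(difflib.unified_diff(baseline, current,
                                   fromfile="baseline", tofile="current")))
```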
- Rename A/B/1/2 terminology to baseline/current throughout:
- compare_metrics.py: rename function args, vars, CLI args (--current/
--baseline), and exclusion count labels (current:/baseline:)
- metrics_summary.py: update exclusion label parsing
- compare_metrics_test.py: update assertions
- action.yaml: update --file1/--file2 to --current/--baseline
- Add metrics_summary_test.py with 25 tests covering get_raw_diff_sample,
parse_diff_file, extract_metric_name, generate_diff_summary,
generate_structured_json
- Add per-snapshot diff artifact download links in PR comment:
- metrics_summary.sh: query GitHub API for artifact IDs; include in
metrics_snapshots.json via merge Python snippet
- ci-summary-report.yml: add GH_TOKEN to compare-metrics step
- ci-summary-report-publish.yml: pass sourceRunId to handler
- ci-summary-report-publish.js: sanitize artifact_id; construct per-snapshot
  download URLs from trusted pieces (see the sketch after this commit
  message); add artifactUrlPrefix parameter to formatMetricsDetail and
  buildCommentBody
- ci-summary-report-publish.test.js: add tests for artifact_id
sanitization and download link rendering
Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/a021a645-ff42-4ce9-89f1-6a2918708a39
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
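As a rough illustration of "trusted pieces", a download link can be composed entirely from workflow-provided values plus the sanitized ID. This is a hedged sketch, not the PR's exact code, and the function name is hypothetical:

```js
// Sketch only: repo ("owner/name") and runId come from the workflow context,
// and artifactId has already been validated as a positive integer upstream.
function artifactDownloadUrl(repo, runId, artifactId) {
  // GitHub's web UI serves run artifacts at this path.
  return `https://github.com/${repo}/actions/runs/${runId}/artifacts/${artifactId}`;
}

console.log(artifactDownloadUrl('jaegertracing/jaeger', 123456789, 987654321));
```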
- compare_metrics.py: rename generate_diff signature to (baseline_content, current_content), matching diff command conventions; parse baseline before current; emit '# Metrics excluded from baseline:' before '# Metrics excluded from current:' in output; fix main() call order
- compare_metrics_test.py: swap all generate_diff() call arguments to (baseline, current) order; update comments to say baseline first
- metrics_summary.py: check '# Metrics excluded from baseline:' before '# Metrics excluded from current:' in the condition
- metrics_summary_test.py: update _DIFF_WITH_ALL_CATEGORIES fixture to list the baseline exclusion count before current
- metrics_summary.sh: expand the artifact ID query section with detailed troubleshooting logs (show the API URL, capture stderr, log jq errors, print the artifact count and full map on success)

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/04585a94-2aa5-4d3e-8859-8a15d5ae435a
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Pull request overview
Improves the CI “metrics comparison” signal by making diffs follow standard unified-diff direction, enhancing summary readability (including truncated samples), and enriching PR comments with links to step logs and per-snapshot diff artifacts.
Changes:
- Fix unified diff direction in `compare_metrics.py` and update exclusion metadata labels to `baseline`/`current`.
- Improve diff sampling in `metrics_summary.py`, reduce CI log noise, and emit artifact IDs for per-snapshot download links.
- Extend the publish workflow/script to render CI-run and artifact links in the PR comment, with added tests.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| scripts/e2e/metrics_summary.sh | Groups log output, queries artifact IDs via gh api, and emits per-snapshot JSON including optional artifact_id. |
| scripts/e2e/metrics_summary.py | Parses new exclusion labels and interleaves -/+ diff samples for modified metrics. |
| scripts/e2e/metrics_summary_test.py | Adds unit tests for diff parsing/sampling/summary/structured JSON generation. |
| scripts/e2e/compare_metrics.py | Renames diff inputs to baseline/current, swaps unified diff direction, updates CLI flags and exclusion labels. |
| scripts/e2e/compare_metrics_test.py | Updates expectations to match new diff direction and exclusion label text. |
| .github/workflows/ci-summary-report.yml | Exposes GH_TOKEN to the metrics summary step so gh api can query run artifacts. |
| .github/workflows/ci-summary-report-publish.yml | Passes sourceRunId into the publish script to construct artifact download URLs. |
| .github/scripts/ci-summary-report-publish.js | Sanitizes artifact_id and renders CI-run / per-snapshot diff download links in the PR comment details. |
| .github/scripts/ci-summary-report-publish.test.js | Adds tests for artifact_id sanitization and link rendering behavior. |
| .github/actions/verify-metrics-snapshot/action.yaml | Updates action invocation to use new compare_metrics.py CLI flags (--current/--baseline). |
| echo "::group::Querying diff artifact IDs" | ||
| if [ -z "${GITHUB_RUN_ID:-}" ] || [ -z "${GITHUB_REPOSITORY:-}" ]; then | ||
| echo "GITHUB_RUN_ID or GITHUB_REPOSITORY not set; skipping artifact ID query" | ||
| echo '{}' > "$METRICS_DIR/artifact_ids.json" | ||
| else | ||
| echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}" | ||
| api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100" | ||
| echo "API URL: https://api.github.com/${api_url}" | ||
|
|
||
| api_output=$(gh api "$api_url" 2>&1) | ||
| api_exit=$? | ||
| if [ $api_exit -ne 0 ]; then | ||
| echo "::warning::gh api call failed (exit $api_exit) — no artifact download links will be rendered" | ||
| echo "gh api error output:" | ||
| echo "$api_output" | ||
| echo '{}' > "$METRICS_DIR/artifact_ids.json" | ||
| else | ||
| echo "API call succeeded; filtering diff_metrics_snapshot_* artifacts" | ||
| jq_output=$(echo "$api_output" | \ | ||
| jq '[.artifacts[] | select(.name | startswith("diff_metrics_snapshot_")) | {key: .name, value: .id}] | from_entries' 2>&1) |
Because the script runs with `set -e`, failures inside command substitutions like `api_output=$(gh api ...)` and `jq_output=$(...)` will typically abort the script before you can inspect `$?`. That contradicts the intended "optional / degrades gracefully" behavior and can cause CI to fail if `gh api` is rate-limited, the token is missing, or `jq` errors. Wrap these calls in a non-errexit context (e.g. temporarily `set +e` and restore, or `api_output=$(...) || api_exit=$?`, as sketched below), and similarly guard the `jq` processing so the script continues and writes an empty `{}` map.
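A minimal sketch of that pattern, assuming the same variable names as the snippet above and that `METRICS_DIR` is already set:

```sh
# Command substitution failure no longer trips errexit: the `|| api_exit=$?`
# branch absorbs the non-zero status so the script can degrade gracefully.
api_exit=0
api_output=$(gh api "$api_url" 2>&1) || api_exit=$?
if [ "$api_exit" -ne 0 ]; then
  echo "::warning::gh api call failed (exit $api_exit)"
  echo '{}' > "$METRICS_DIR/artifact_ids.json"
fi
```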
```diff
-# Enable debug tracing and exit on error
-set -exo pipefail
+# Exit on error, treat unset variables as errors, fail on pipe errors.
```
The header comment says "treat unset variables as errors", but the script does not enable nounset (`set -u`). Either update the comment to match the actual shell options, or enable `-u` and ensure variables like `GITHUB_OUTPUT`, `GITHUB_RUN_ID`, etc. are always accessed via `${VAR:-}` or other guarded paths (see the sketch after the suggestion below).
Suggested change:

```diff
-# Exit on error, treat unset variables as errors, fail on pipe errors.
+# Exit on error and fail on pipe errors.
```
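If the comment is kept and `-u` is enabled instead, here is a sketch of the guarded-access style the review suggests (variable names taken from the script above):

```sh
# With nounset active, unset variables abort the script unless expanded
# with a default, so optional CI variables are read via ${VAR:-}.
set -euo pipefail
run_id="${GITHUB_RUN_ID:-}"
if [ -z "$run_id" ]; then
  echo "GITHUB_RUN_ID not set; skipping artifact ID query"
fi
```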
| echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}" | ||
| api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100" | ||
| echo "API URL: https://api.github.com/${api_url}" |
The artifacts API call is hard-coded to `per_page=100` without pagination. If a run ever produces more than 100 artifacts, some `diff_metrics_snapshot_*` artifact IDs may be missing, causing download links to be omitted for some snapshots. Consider using `gh api --paginate` (and then `jq` over the combined stream) or explicitly following pagination links, as sketched below.
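A hedged sketch of the paginated variant (assumes `gh` and `jq` are available, as they already are in the script; `--paginate` follows the API's Link headers and `--jq` filters each page):

```sh
# Emit one {name: id} object per matching artifact across all pages, then
# merge them into a single map ('add // {}' yields {} when nothing matches).
gh api --paginate \
  "repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100" \
  --jq '.artifacts[] | select(.name | startswith("diff_metrics_snapshot_")) | {(.name): .id}' \
  | jq -s 'add // {}' > "$METRICS_DIR/artifact_ids.json"
```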
```python
# Interleave pairs of (removed, added) lines so both sides are always visible.
max_pairs = max(1, max_lines // 2)
interleaved = []
for i in range(min(max_pairs, len(removed_lines), len(added_lines))):
    interleaved.append(removed_lines[i])
    interleaved.append(added_lines[i])

if len(removed_lines) > max_pairs or len(added_lines) > max_pairs:
    interleaved.append("...")
```
`get_raw_diff_sample()` can exceed the requested `max_lines` when `max_lines` is 1 (it forces `max_pairs` to at least 1 and returns 2 lines for a removed/added pair). Either validate `max_lines >= 2` for interleaving mode or clamp the interleaved output length to `max_lines` to respect the function contract, as in the sketch below.
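One possible clamped variant (a hypothetical helper, not the PR's code, reusing the names from the snippet above):

```python
def interleave_sample(removed_lines, added_lines, max_lines):
    """Interleave (removed, added) pairs, returning at most max_lines items."""
    max_pairs = max_lines // 2  # 0 pairs when max_lines == 1
    interleaved = []
    for i in range(min(max_pairs, len(removed_lines), len(added_lines))):
        interleaved.append(removed_lines[i])
        interleaved.append(added_lines[i])
    if len(removed_lines) > max_pairs or len(added_lines) > max_pairs:
        # Reserve the final slot for the truncation marker.
        interleaved = interleaved[:max_lines - 1]
        interleaved.append("...")
    return interleaved[:max_lines]
```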
```diff
 parser = argparse.ArgumentParser(description='Generate diff between two Jaeger metric files')
-parser.add_argument('--file1', help='Path to first metric file')
-parser.add_argument('--file2', help='Path to second metric file')
+parser.add_argument('--current', help='Path to the current metric file (e.g. from the PR)')
+parser.add_argument('--baseline', help='Path to the baseline metric file (e.g. from main branch)')
 parser.add_argument('--output', '-o', default='metrics_diff.txt',
                     help='Output diff file path (default: metrics_diff.txt)')

 args = parser.parse_args()

 # Read input files
-file1_lines = read_metric_file(args.file1)
-file2_lines = read_metric_file(args.file2)
+baseline_lines = read_metric_file(args.baseline)
+current_lines = read_metric_file(args.current)
```
The new CLI flags `--baseline`/`--current` are not marked required and there's no explicit validation before calling `read_metric_file(args.baseline/current)`. If a caller forgets one flag, this raises a less-clear exception (attempting to open `None`). Consider setting `required=True` on both arguments (or adding a friendly check + `parser.error`) to produce a clear usage error, as sketched below.
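A sketch of the `required=True` variant (help text copied from the diff above; the rest of `main()` is omitted):

```python
import argparse

parser = argparse.ArgumentParser(description='Generate diff between two Jaeger metric files')
# With required=True, a missing flag produces a clear usage error from
# argparse instead of a confusing failure when opening None.
parser.add_argument('--current', required=True,
                    help='Path to the current metric file (e.g. from the PR)')
parser.add_argument('--baseline', required=True,
                    help='Path to the baseline metric file (e.g. from main branch)')
args = parser.parse_args()
```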
```js
// Validate artifact_id (optional, non-negative integer for the diff artifact download link)
const artifactId = safeNum(entry.artifact_id);
// artifact_id must be a positive integer (GitHub artifact IDs are always > 0)
const sanitizedArtifactId = (artifactId !== null && Number.isInteger(artifactId) && artifactId > 0)
  ? artifactId : null;
result.push({ snapshot, added, removed, modified, metric_names: names, artifact_id: sanitizedArtifactId });
```
The first inline comment says `artifact_id` is a "non-negative integer", but the code requires `artifact_id > 0` (and the following comment notes GitHub IDs are always > 0). Update the first comment to match the actual validation (positive integer) to avoid confusion during future security reviews.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #8341      +/-   ##
==========================================
- Coverage   95.63%   95.62%   -0.02%
==========================================
  Files         314      314
  Lines       16507    16507
==========================================
- Hits        15786    15784       -2
- Misses        568      570       +2
  Partials      153      153
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Resolves #8340
- Fix backwards diff direction in `compare_metrics.py`
- Update `compare_metrics_test.py` to reflect correct diff semantics (baseline-first ordering throughout)
- Fix `get_raw_diff_sample` in `metrics_summary.py` to interleave `-` and `+` lines
- Remove `set -x` and add GitHub group markers in `metrics_summary.sh`
- Add `ciRunUrl` link to `formatMetricsDetail` in `ci-summary-report-publish.js`
- Add unit tests for `metrics_summary.py` functions (25 new tests)
- Add troubleshooting logs to the `metrics_summary.sh` artifact ID query