fix: improve metrics comparison CI readability by Copilot · Pull Request #8341 · jaegertracing/jaeger

Copilot · 2026-04-10T14:13:27Z

Resolves #8340

Fix backwards diff direction in compare_metrics.py
Update tests in compare_metrics_test.py to reflect correct diff semantics (baseline first ordering throughout)
Fix get_raw_diff_sample in metrics_summary.py to interleave - and + lines
Remove set -x and add GitHub group markers in metrics_summary.sh
Add ciRunUrl link to formatMetricsDetail in ci-summary-report-publish.js
Rename A/B/1/2 terminology → baseline/current throughout, with baseline-first ordering in all signatures, labels, comments, and tests
Add tests for metrics_summary.py functions (25 new tests)
Add per-snapshot diff artifact download links in PR comments
Add detailed troubleshooting logs in metrics_summary.sh artifact ID query

- Fix backwards diff direction in compare_metrics.py: swap unified_diff args so - = in baseline but not current (regression), + = newly added
- Update compare_metrics_test.py assertions to reflect correct semantics
- Fix get_raw_diff_sample in metrics_summary.py to interleave - and + lines so both sides are visible when truncating modified metrics
- Remove set -x from metrics_summary.sh and add GitHub group markers to fold verbose output sections in CI logs
- Add ciRunUrl parameter to formatMetricsDetail in ci-summary-report-publish.js to link directly to CI run logs for detailed diff output
- Update ci-summary-report-publish.test.js with new ciRunUrl tests

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/9a76075a-0fc6-4e8a-a951-4dfe09c297fe
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>

…ifact links

- Rename A/B/1/2 terminology to baseline/current throughout:
  - compare_metrics.py: rename function args, vars, CLI args (--current/--baseline), and exclusion count labels (current:/baseline:)
  - metrics_summary.py: update exclusion label parsing
  - compare_metrics_test.py: update assertions
  - action.yaml: update --file1/--file2 to --current/--baseline
- Add metrics_summary_test.py with 25 tests covering get_raw_diff_sample, parse_diff_file, extract_metric_name, generate_diff_summary, generate_structured_json
- Add per-snapshot diff artifact download links in PR comment:
  - metrics_summary.sh: query GitHub API for artifact IDs; include in metrics_snapshots.json via merge Python snippet
  - ci-summary-report.yml: add GH_TOKEN to compare-metrics step
  - ci-summary-report-publish.yml: pass sourceRunId to handler
  - ci-summary-report-publish.js: sanitize artifact_id; construct per-snapshot download URLs from trusted pieces; add artifactUrlPrefix parameter to formatMetricsDetail and buildCommentBody
  - ci-summary-report-publish.test.js: add tests for artifact_id sanitization and download link rendering

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/a021a645-ff42-4ce9-89f1-6a2918708a39
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>

- compare_metrics.py: rename generate_diff signature to (baseline_content, current_content) matching diff command conventions; parse baseline before current; emit '# Metrics excluded from baseline:' before '# Metrics excluded from current:' in output; fix main() call order
- compare_metrics_test.py: swap all generate_diff() call arguments to (baseline, current) order; update comments to say baseline first
- metrics_summary.py: check '# Metrics excluded from baseline:' before '# Metrics excluded from current:' in condition
- metrics_summary_test.py: update _DIFF_WITH_ALL_CATEGORIES fixture to list baseline exclusion count before current
- metrics_summary.sh: expand artifact ID query section with detailed troubleshooting logs (show API URL, capture stderr, log jq errors, print artifact count and full map on success)

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/04585a94-2aa5-4d3e-8859-8a15d5ae435a
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>

Copilot

Pull request overview

Improves the CI “metrics comparison” signal by making diffs follow standard unified-diff direction, enhancing summary readability (including truncated samples), and enriching PR comments with links to step logs and per-snapshot diff artifacts.

Changes:

Fix unified diff direction in compare_metrics.py and update exclusion metadata labels to baseline/current.
Improve diff sampling in metrics_summary.py, reduce CI log noise, and emit artifact IDs for per-snapshot download links.
Extend the publish workflow/script to render CI-run and artifact links in the PR comment, with added tests.
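
To illustrate the diff direction described above, here is a minimal sketch using Python's standard `difflib.unified_diff` with the baseline passed first. The helper name `sketch_diff` and the sample metric lines are hypothetical and are not taken from compare_metrics.py; the point is only that with baseline as the "from" side, `-` marks lines present in baseline but missing from current, and `+` marks newly added lines.

```python
import difflib


def sketch_diff(baseline_lines, current_lines):
    """Hypothetical helper: unified diff with baseline as the 'from' side."""
    return list(
        difflib.unified_diff(
            baseline_lines,
            current_lines,
            fromfile="baseline",
            tofile="current",
            lineterm="",  # inputs carry no trailing newlines
            n=0,          # no context lines, only the changes
        )
    )


# Made-up metric lines for illustration only.
baseline = ["metric_a 1", "metric_b 1"]
current = ["metric_a 1", "metric_c 1"]

diff = sketch_diff(baseline, current)
print("\n".join(diff))
```

Swapping the two positional arguments would flip every `-` and `+`, which is the kind of backwards output the PR description refers to.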

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
scripts/e2e/metrics_summary.sh	Groups log output, queries artifact IDs via `gh api`, and emits per-snapshot JSON including optional `artifact_id`.
scripts/e2e/metrics_summary.py	Parses new exclusion labels and interleaves `-`/`+` diff samples for modified metrics.
scripts/e2e/metrics_summary_test.py	Adds unit tests for diff parsing/sampling/summary/structured JSON generation.
scripts/e2e/compare_metrics.py	Renames diff inputs to `baseline/current`, swaps unified diff direction, updates CLI flags and exclusion labels.
scripts/e2e/compare_metrics_test.py	Updates expectations to match new diff direction and exclusion label text.
.github/workflows/ci-summary-report.yml	Exposes `GH_TOKEN` to the metrics summary step so `gh api` can query run artifacts.
.github/workflows/ci-summary-report-publish.yml	Passes `sourceRunId` into the publish script to construct artifact download URLs.
.github/scripts/ci-summary-report-publish.js	Sanitizes `artifact_id` and renders CI-run / per-snapshot diff download links in the PR comment details.
.github/scripts/ci-summary-report-publish.test.js	Adds tests for `artifact_id` sanitization and link rendering behavior.
.github/actions/verify-metrics-snapshot/action.yaml	Updates action invocation to use new `compare_metrics.py` CLI flags (`--current/--baseline`).

Copilot · 2026-04-10T17:57:41Z

+echo "::group::Querying diff artifact IDs"
+if [ -z "${GITHUB_RUN_ID:-}" ] || [ -z "${GITHUB_REPOSITORY:-}" ]; then
+    echo "GITHUB_RUN_ID or GITHUB_REPOSITORY not set; skipping artifact ID query"
+    echo '{}' > "$METRICS_DIR/artifact_ids.json"
+else
+    echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}"
+    api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100"
+    echo "API URL: https://api.github.com/${api_url}"
+
+    api_output=$(gh api "$api_url" 2>&1)
+    api_exit=$?
+    if [ $api_exit -ne 0 ]; then
+        echo "::warning::gh api call failed (exit $api_exit) — no artifact download links will be rendered"
+        echo "gh api error output:"
+        echo "$api_output"
+        echo '{}' > "$METRICS_DIR/artifact_ids.json"
+    else
+        echo "API call succeeded; filtering diff_metrics_snapshot_* artifacts"
+        jq_output=$(echo "$api_output" | \
+            jq '[.artifacts[] | select(.name | startswith("diff_metrics_snapshot_")) | {key: .name, value: .id}] | from_entries' 2>&1)


Because the script runs with set -e, failures inside command substitutions like api_output=$(gh api ...) and jq_output=$(...) will typically abort the script before you can inspect $?. That contradicts the intended “optional / degrades gracefully” behavior and can cause CI to fail if gh api is rate-limited, the token is missing, or jq errors. Wrap these calls in a non-errexit context (e.g., temporarily set +e / restore, or api_output=$(...) || api_exit=$?), and similarly guard jq processing so the script continues and writes an empty {} map.

Copilot · 2026-04-10T17:57:42Z


-# Enable debug tracing and exit on error
-set -exo pipefail
+# Exit on error, treat unset variables as errors, fail on pipe errors.


The header comment says “treat unset variables as errors”, but the script does not enable nounset (set -u). Either update the comment to match the actual shell options, or enable -u and ensure variables like GITHUB_OUTPUT, GITHUB_RUN_ID, etc. are always accessed via ${VAR:-} / guarded paths.

Suggested change

-# Exit on error, treat unset variables as errors, fail on pipe errors.
+# Exit on error and fail on pipe errors.

Copilot · 2026-04-10T17:57:42Z

+    echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}"
+    api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100"
+    echo "API URL: https://api.github.com/${api_url}"


The artifacts API call is hard-coded to per_page=100 without pagination. If a run ever produces >100 artifacts, some diff_metrics_snapshot_* artifact IDs may be missing, causing download links to be omitted for some snapshots. Consider using gh api --paginate (and then jq over the combined stream) or explicitly following pagination links.
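
One way to handle multiple pages is sketched below in Python rather than jq. It assumes the paginated output arrives as consecutive JSON objects, one per page, each with an `artifacts` array; that shape is an assumption and should be checked against the real `gh api --paginate` output. The function name `artifact_ids_from_pages` is hypothetical.

```python
import json


def artifact_ids_from_pages(raw, prefix="diff_metrics_snapshot_"):
    """Hypothetical helper: merge a name -> id map from concatenated JSON pages."""
    decoder = json.JSONDecoder()
    ids = {}
    pos = 0
    while pos < len(raw):
        # Skip whitespace between consecutive JSON documents.
        while pos < len(raw) and raw[pos].isspace():
            pos += 1
        if pos >= len(raw):
            break
        # raw_decode parses one document and returns where it ended.
        page, pos = decoder.raw_decode(raw, pos)
        for artifact in page.get("artifacts", []):
            name = artifact.get("name", "")
            if name.startswith(prefix):
                ids[name] = artifact["id"]
    return ids


# Two made-up pages, concatenated the way the sketch assumes.
sample = (
    '{"artifacts": [{"name": "diff_metrics_snapshot_a", "id": 1}]}\n'
    '{"artifacts": [{"name": "other", "id": 2},'
    ' {"name": "diff_metrics_snapshot_b", "id": 3}]}'
)
print(artifact_ids_from_pages(sample))
```

The same merge could be done in jq over the combined stream, as the comment suggests; the Python form is only meant to show the per-page accumulation.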

Copilot · 2026-04-10T17:57:42Z

+    # Interleave pairs of (removed, added) lines so both sides are always visible.
+    max_pairs = max(1, max_lines // 2)
+    interleaved = []
+    for i in range(min(max_pairs, len(removed_lines), len(added_lines))):
+        interleaved.append(removed_lines[i])
+        interleaved.append(added_lines[i])
+
+    if len(removed_lines) > max_pairs or len(added_lines) > max_pairs:
+        interleaved.append("...")


get_raw_diff_sample() can exceed the requested max_lines when max_lines is 1 (it forces max_pairs to at least 1 and returns 2 lines for a removed/added pair). Either validate max_lines >= 2 for interleaving mode or clamp the interleaved output length to max_lines to respect the function contract.
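
A minimal sketch of the clamping option follows. It is not the code from metrics_summary.py: the function name `interleave_sample` is hypothetical, and it assumes the trailing "..." marker does not count toward `max_lines`. Pairs are limited to `max_lines // 2`, and an odd leftover slot (including the `max_lines == 1` case) is filled with a single line instead of a full pair.

```python
def interleave_sample(removed_lines, added_lines, max_lines):
    """Hypothetical variant: at most max_lines diff lines, plus an optional '...' marker."""
    pairs = min(max_lines // 2, len(removed_lines), len(added_lines))
    sample = []
    for i in range(pairs):
        sample.append(removed_lines[i])
        sample.append(added_lines[i])

    # Spend an odd leftover slot on one more line rather than a whole pair.
    if len(sample) < max_lines:
        leftover = removed_lines[pairs:pairs + 1] or added_lines[pairs:pairs + 1]
        sample.extend(leftover)

    # Signal truncation when some diff lines were not shown.
    if len(sample) < len(removed_lines) + len(added_lines):
        sample.append("...")
    return sample


print(interleave_sample(["-a 1"], ["+a 2"], 1))
print(interleave_sample(["-a 1"], ["+a 2"], 2))
```

With a budget of 1 only the removed line is shown before the marker; with a budget of 2 the full pair fits and no marker is added.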

Copilot · 2026-04-10T17:57:43Z

    parser = argparse.ArgumentParser(description='Generate diff between two Jaeger metric files')
-    parser.add_argument('--file1', help='Path to first metric file')
-    parser.add_argument('--file2', help='Path to second metric file')
+    parser.add_argument('--current', help='Path to the current metric file (e.g. from the PR)')
+    parser.add_argument('--baseline', help='Path to the baseline metric file (e.g. from main branch)')
    parser.add_argument('--output', '-o', default='metrics_diff.txt',
                       help='Output diff file path (default: metrics_diff.txt)')
-    
+
    args = parser.parse_args()
-    
+
    # Read input files
-    file1_lines = read_metric_file(args.file1)
-    file2_lines = read_metric_file(args.file2)
-    
+    baseline_lines = read_metric_file(args.baseline)
+    current_lines = read_metric_file(args.current)
+


The new CLI flags --baseline/--current are not marked required and there’s no explicit validation before calling read_metric_file(args.baseline/current). If a caller forgets one flag, this will raise a less-clear exception (attempting to open None). Consider setting required=True on both arguments (or adding a friendly check + parser.error) to produce a clear usage error.
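
The `required=True` option can be sketched as follows. The flag names and help strings mirror the diff above, while the `build_parser` wrapper is a hypothetical name added so the example is self-contained. With both flags required, argparse itself reports a usage error and exits with status 2 when one is missing, before any file is opened.

```python
import argparse


def build_parser():
    """Hypothetical wrapper around the parser shown in the diff, with required flags."""
    parser = argparse.ArgumentParser(
        description="Generate diff between two Jaeger metric files"
    )
    parser.add_argument("--baseline", required=True,
                        help="Path to the baseline metric file (e.g. from main branch)")
    parser.add_argument("--current", required=True,
                        help="Path to the current metric file (e.g. from the PR)")
    parser.add_argument("--output", "-o", default="metrics_diff.txt",
                        help="Output diff file path (default: metrics_diff.txt)")
    return parser


# Both flags present: parsing succeeds.
args = build_parser().parse_args(["--baseline", "base.txt", "--current", "pr.txt"])
print(args.baseline, args.current, args.output)
```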

Copilot · 2026-04-10T17:57:43Z

+    // Validate artifact_id (optional, non-negative integer for the diff artifact download link)
+    const artifactId = safeNum(entry.artifact_id);
+    // artifact_id must be a positive integer (GitHub artifact IDs are always > 0)
+    const sanitizedArtifactId = (artifactId !== null && Number.isInteger(artifactId) && artifactId > 0)
+      ? artifactId : null;
+    result.push({ snapshot, added, removed, modified, metric_names: names, artifact_id: sanitizedArtifactId });


The inline comment says artifact_id is a “non-negative integer”, but the code requires artifact_id > 0 (and the following comment notes GitHub IDs are always > 0). Update the comment to match the actual validation (positive integer) to avoid confusion during future security reviews.

codecov · 2026-04-10T18:15:58Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.62%. Comparing base (ea88ef2) to head (8baaccc).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8341      +/-   ##
==========================================
- Coverage   95.63%   95.62%   -0.02%     
==========================================
  Files         314      314              
  Lines       16507    16507              
==========================================
- Hits        15786    15784       -2     
- Misses        568      570       +2     
  Partials      153      153

Flag	Coverage Δ
badger_direct	`9.30% <ø> (ø)`
badger_e2e	`1.07% <ø> (ø)`
cassandra-4.x-direct-manual	`13.63% <ø> (ø)`
cassandra-4.x-e2e-auto	`1.06% <ø> (ø)`
cassandra-4.x-e2e-manual	`1.06% <ø> (ø)`
cassandra-5.x-direct-manual	`13.63% <ø> (ø)`
cassandra-5.x-e2e-auto	`1.06% <ø> (ø)`
cassandra-5.x-e2e-manual	`1.06% <ø> (ø)`
clickhouse	`1.20% <ø> (ø)`
elasticsearch-6.x-direct	`17.49% <ø> (ø)`
elasticsearch-7.x-direct	`17.52% <ø> (ø)`
elasticsearch-8.x-direct	`17.68% <ø> (ø)`
elasticsearch-8.x-e2e	`1.07% <ø> (ø)`
elasticsearch-9.x-e2e	`1.07% <ø> (ø)`
grpc_direct	`8.09% <ø> (ø)`
grpc_e2e	`1.07% <ø> (ø)`
kafka-3.x-v2	`1.07% <ø> (ø)`
memory_v2	`1.07% <ø> (ø)`
opensearch-1.x-direct	`17.57% <ø> (ø)`
opensearch-2.x-direct	`17.57% <ø> (ø)`
opensearch-2.x-e2e	`1.07% <ø> (ø)`
opensearch-3.x-e2e	`1.07% <ø> (ø)`
query	`1.07% <ø> (ø)`
tailsampling-processor	`0.54% <ø> (ø)`
unittests	`94.26% <ø> (-0.02%)`	⬇️

Initial plan

4ed916d

Copilot AI assigned Copilot and yurishkuro Apr 10, 2026

Copilot started work on behalf of yurishkuro April 10, 2026 14:13 View session

Copilot AI linked an issue Apr 10, 2026 that may be closed by this pull request

[chore]: Metrics comparison CI check is difficult to read #8340

Closed

Copilot AI changed the title ~~[WIP] Fix metrics comparison CI check readability~~ fix: improve metrics comparison CI readability Apr 10, 2026

Copilot finished work on behalf of yurishkuro April 10, 2026 14:27

Copilot AI requested a review from yurishkuro April 10, 2026 14:27

yurishkuro requested changes Apr 10, 2026

Comment thread scripts/e2e/compare_metrics.py Outdated

Comment thread scripts/e2e/compare_metrics_test.py Outdated

Comment thread scripts/e2e/metrics_summary.py

Comment thread .github/scripts/ci-summary-report-publish.js

Copilot started work on behalf of yurishkuro April 10, 2026 15:19 View session

Copilot finished work on behalf of yurishkuro April 10, 2026 15:33

Copilot AI requested a review from yurishkuro April 10, 2026 15:33

yurishkuro requested changes Apr 10, 2026

Comment thread scripts/e2e/compare_metrics.py Outdated

Comment thread scripts/e2e/compare_metrics.py Outdated

Comment thread scripts/e2e/compare_metrics_test.py Outdated

Comment thread scripts/e2e/metrics_summary.py Outdated

Comment thread scripts/e2e/metrics_summary.sh

Copilot started work on behalf of yurishkuro April 10, 2026 17:30 View session

Copilot finished work on behalf of yurishkuro April 10, 2026 17:36

Copilot AI requested a review from yurishkuro April 10, 2026 17:36

yurishkuro added the changelog:ci Change related to continuous integration / testing label Apr 10, 2026

Merge branch 'main' into copilot/chore-fix-metrics-comparison-ci

8baaccc

yurishkuro marked this pull request as ready for review April 10, 2026 17:53

yurishkuro requested a review from a team as a code owner April 10, 2026 17:53

Copilot AI review requested due to automatic review settings April 10, 2026 17:53

Copilot started reviewing on behalf of yurishkuro April 10, 2026 17:54 View session

Copilot AI reviewed Apr 10, 2026

yurishkuro approved these changes Apr 10, 2026

jaegertracingbot approved these changes Apr 10, 2026

Copilot started work on behalf of yurishkuro April 10, 2026 18:04 View session

Copilot finished work on behalf of yurishkuro April 10, 2026 18:06

Copilot AI requested a review from yurishkuro April 10, 2026 18:06

yurishkuro merged commit 22442ce into main Apr 10, 2026
81 checks passed

yurishkuro deleted the copilot/chore-fix-metrics-comparison-ci branch April 10, 2026 18:28
