
fix: improve metrics comparison CI readability#8341

Merged
yurishkuro merged 5 commits into main from
copilot/chore-fix-metrics-comparison-ci
Apr 10, 2026

Contributor

Copilot AI commented Apr 10, 2026

Resolves #8340

  • Fix backwards diff direction in compare_metrics.py
  • Update tests in compare_metrics_test.py to reflect correct diff semantics (baseline first ordering throughout)
  • Fix get_raw_diff_sample in metrics_summary.py to interleave - and + lines
  • Remove set -x and add GitHub group markers in metrics_summary.sh
  • Add ciRunUrl link to formatMetricsDetail in ci-summary-report-publish.js
  • Rename A/B/1/2 terminology → baseline/current throughout, with baseline-first ordering in all signatures, labels, comments, and tests
  • Add tests for metrics_summary.py functions (25 new tests)
  • Add per-snapshot diff artifact download links in PR comments
  • Add detailed troubleshooting logs in metrics_summary.sh artifact ID query

- Fix backwards diff direction in compare_metrics.py: swap unified_diff
  args so - = in baseline but not current (regression), + = newly added
- Update compare_metrics_test.py assertions to reflect correct semantics
- Fix get_raw_diff_sample in metrics_summary.py to interleave - and +
  lines so both sides are visible when truncating modified metrics
- Remove set -x from metrics_summary.sh and add GitHub group markers
  to fold verbose output sections in CI logs
- Add ciRunUrl parameter to formatMetricsDetail in
  ci-summary-report-publish.js to link directly to CI run logs for
  detailed diff output
- Update ci-summary-report-publish.test.js with new ciRunUrl tests
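
The diff direction described in this commit can be illustrated with Python's standard `difflib.unified_diff`, which treats its first argument as the "from" side. This is a minimal sketch, not the code in compare_metrics.py: the `diff_metrics` helper and the metric names are hypothetical and exist only to show which side each marker refers to.

```python
import difflib


def diff_metrics(baseline_lines, current_lines):
    """Return unified-diff lines with baseline as the 'from' side.

    Hypothetical helper for illustration; not the PR's generate_diff().
    """
    return list(
        difflib.unified_diff(
            baseline_lines,
            current_lines,
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )


# Made-up metric names, used only to show the direction of the markers.
baseline = ["http_requests_total", "queue_depth"]
current = ["http_requests_total", "cache_hits_total"]

diff = diff_metrics(baseline, current)
# Skip the "---" / "+++" file headers and keep only content lines.
removed = [ln for ln in diff if ln.startswith("-") and not ln.startswith("---")]
added = [ln for ln in diff if ln.startswith("+") and not ln.startswith("+++")]

print(removed)  # lines present in baseline but missing from current
print(added)    # lines newly present in current
```

With baseline passed first, a `-` line marks a metric that disappeared (a potential regression) and a `+` line marks a metric that is new in the PR.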

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/9a76075a-0fc6-4e8a-a951-4dfe09c297fe

Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix metrics comparison CI check readability" to "fix: improve metrics comparison CI readability" Apr 10, 2026
Copilot AI requested a review from yurishkuro April 10, 2026 14:27
Comment thread scripts/e2e/compare_metrics.py Outdated
Comment thread scripts/e2e/compare_metrics_test.py Outdated
Comment thread scripts/e2e/metrics_summary.py
Comment thread .github/scripts/ci-summary-report-publish.js
…ifact links

- Rename A/B/1/2 terminology to baseline/current throughout:
  - compare_metrics.py: rename function args, vars, CLI args (--current/
    --baseline), and exclusion count labels (current:/baseline:)
  - metrics_summary.py: update exclusion label parsing
  - compare_metrics_test.py: update assertions
  - action.yaml: update --file1/--file2 to --current/--baseline
- Add metrics_summary_test.py with 25 tests covering get_raw_diff_sample,
  parse_diff_file, extract_metric_name, generate_diff_summary,
  generate_structured_json
- Add per-snapshot diff artifact download links in PR comment:
  - metrics_summary.sh: query GitHub API for artifact IDs; include in
    metrics_snapshots.json via merge Python snippet
  - ci-summary-report.yml: add GH_TOKEN to compare-metrics step
  - ci-summary-report-publish.yml: pass sourceRunId to handler
  - ci-summary-report-publish.js: sanitize artifact_id; construct
    per-snapshot download URLs from trusted pieces; add artifactUrlPrefix
    parameter to formatMetricsDetail and buildCommentBody
  - ci-summary-report-publish.test.js: add tests for artifact_id
    sanitization and download link rendering

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/a021a645-ff42-4ce9-89f1-6a2918708a39

Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Copilot AI requested a review from yurishkuro April 10, 2026 15:33
Comment thread scripts/e2e/compare_metrics.py Outdated
Comment thread scripts/e2e/compare_metrics.py Outdated
Comment thread scripts/e2e/compare_metrics_test.py Outdated
Comment thread scripts/e2e/metrics_summary.py Outdated
Comment thread scripts/e2e/metrics_summary.sh
- compare_metrics.py: rename generate_diff signature to (baseline_content,
  current_content) matching diff command conventions; parse baseline before
  current; emit '# Metrics excluded from baseline:' before '# Metrics
  excluded from current:' in output; fix main() call order
- compare_metrics_test.py: swap all generate_diff() call arguments to
  (baseline, current) order; update comments to say baseline first
- metrics_summary.py: check '# Metrics excluded from baseline:' before
  '# Metrics excluded from current:' in condition
- metrics_summary_test.py: update _DIFF_WITH_ALL_CATEGORIES fixture to
  list baseline exclusion count before current
- metrics_summary.sh: expand artifact ID query section with detailed
  troubleshooting logs (show API URL, capture stderr, log jq errors,
  print artifact count and full map on success)

Agent-Logs-Url: https://github.com/jaegertracing/jaeger/sessions/04585a94-2aa5-4d3e-8859-8a15d5ae435a

Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Copilot AI requested a review from yurishkuro April 10, 2026 17:36
@yurishkuro yurishkuro added the changelog:ci Change related to continuous integration / testing label Apr 10, 2026
@yurishkuro yurishkuro marked this pull request as ready for review April 10, 2026 17:53
@yurishkuro yurishkuro requested a review from a team as a code owner April 10, 2026 17:53
Copilot AI review requested due to automatic review settings April 10, 2026 17:53
Contributor

Copilot AI left a comment

Pull request overview

Improves the CI “metrics comparison” signal by making diffs follow standard unified-diff direction, enhancing summary readability (including truncated samples), and enriching PR comments with links to step logs and per-snapshot diff artifacts.

Changes:

  • Fix unified diff direction in compare_metrics.py and update exclusion metadata labels to baseline/current.
  • Improve diff sampling in metrics_summary.py, reduce CI log noise, and emit artifact IDs for per-snapshot download links.
  • Extend the publish workflow/script to render CI-run and artifact links in the PR comment, with added tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| scripts/e2e/metrics_summary.sh | Groups log output, queries artifact IDs via gh api, and emits per-snapshot JSON including optional artifact_id. |
| scripts/e2e/metrics_summary.py | Parses new exclusion labels and interleaves -/+ diff samples for modified metrics. |
| scripts/e2e/metrics_summary_test.py | Adds unit tests for diff parsing/sampling/summary/structured JSON generation. |
| scripts/e2e/compare_metrics.py | Renames diff inputs to baseline/current, swaps unified diff direction, updates CLI flags and exclusion labels. |
| scripts/e2e/compare_metrics_test.py | Updates expectations to match new diff direction and exclusion label text. |
| .github/workflows/ci-summary-report.yml | Exposes GH_TOKEN to the metrics summary step so gh api can query run artifacts. |
| .github/workflows/ci-summary-report-publish.yml | Passes sourceRunId into the publish script to construct artifact download URLs. |
| .github/scripts/ci-summary-report-publish.js | Sanitizes artifact_id and renders CI-run / per-snapshot diff download links in the PR comment details. |
| .github/scripts/ci-summary-report-publish.test.js | Adds tests for artifact_id sanitization and link rendering behavior. |
| .github/actions/verify-metrics-snapshot/action.yaml | Updates action invocation to use new compare_metrics.py CLI flags (--current/--baseline). |

Comment on lines +60 to +79
echo "::group::Querying diff artifact IDs"
if [ -z "${GITHUB_RUN_ID:-}" ] || [ -z "${GITHUB_REPOSITORY:-}" ]; then
echo "GITHUB_RUN_ID or GITHUB_REPOSITORY not set; skipping artifact ID query"
echo '{}' > "$METRICS_DIR/artifact_ids.json"
else
echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}"
api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100"
echo "API URL: https://api.github.com/${api_url}"

api_output=$(gh api "$api_url" 2>&1)
api_exit=$?
if [ $api_exit -ne 0 ]; then
echo "::warning::gh api call failed (exit $api_exit) — no artifact download links will be rendered"
echo "gh api error output:"
echo "$api_output"
echo '{}' > "$METRICS_DIR/artifact_ids.json"
else
echo "API call succeeded; filtering diff_metrics_snapshot_* artifacts"
jq_output=$(echo "$api_output" | \
jq '[.artifacts[] | select(.name | startswith("diff_metrics_snapshot_")) | {key: .name, value: .id}] | from_entries' 2>&1)

Copilot AI Apr 10, 2026

Because the script runs with set -e, failures inside command substitutions like api_output=$(gh api ...) and jq_output=$(...) will typically abort the script before you can inspect $?. That contradicts the intended “optional / degrades gracefully” behavior and can cause CI to fail if gh api is rate-limited, the token is missing, or jq errors. Wrap these calls in a non-errexit context (e.g., temporarily set +e / restore, or api_output=$(...) || api_exit=$?), and similarly guard jq processing so the script continues and writes an empty {} map.
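
The "degrade gracefully" behavior asked for here can be sketched in Python, the language of the neighboring scripts. This is an illustrative analogue of the quoted jq filter, not code from the PR: `artifact_ids_from_response` is a hypothetical helper, and the artifact names in the sample are made up. Any malformed or unexpected payload yields an empty map instead of an exception.

```python
import json

PREFIX = "diff_metrics_snapshot_"


def artifact_ids_from_response(raw_response):
    """Map diff artifact names to IDs; return {} if the payload is unusable.

    Hypothetical helper mirroring the jq filter quoted above.
    """
    try:
        payload = json.loads(raw_response)
        return {
            artifact["name"]: artifact["id"]
            for artifact in payload["artifacts"]
            if artifact["name"].startswith(PREFIX)
        }
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON or an unexpected shape: fall back to an empty map
        # so the caller can continue without download links.
        return {}


# Made-up sample payload in the shape of the "list run artifacts" response.
sample = json.dumps(
    {
        "artifacts": [
            {"name": "diff_metrics_snapshot_cassandra", "id": 101},
            {"name": "coverage-report", "id": 102},
        ]
    }
)

print(artifact_ids_from_response(sample))
print(artifact_ids_from_response("rate limit exceeded"))
```

The same principle applies to the shell version: the failure path has to be reachable, which requires that the failing command does not trigger errexit first.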

- # Enable debug tracing and exit on error
- set -exo pipefail
+ # Exit on error, treat unset variables as errors, fail on pipe errors.

Copilot AI Apr 10, 2026

The header comment says “treat unset variables as errors”, but the script does not enable nounset (set -u). Either update the comment to match the actual shell options, or enable -u and ensure variables like GITHUB_OUTPUT, GITHUB_RUN_ID, etc. are always accessed via ${VAR:-} / guarded paths.

Suggested change
- # Exit on error, treat unset variables as errors, fail on pipe errors.
+ # Exit on error and fail on pipe errors.

Comment on lines +65 to +67
echo "Querying artifacts for run ${GITHUB_RUN_ID} in ${GITHUB_REPOSITORY}"
api_url="repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/artifacts?per_page=100"
echo "API URL: https://api.github.com/${api_url}"

Copilot AI Apr 10, 2026

The artifacts API call is hard-coded to per_page=100 without pagination. If a run ever produces >100 artifacts, some diff_metrics_snapshot_* artifact IDs may be missing, causing download links to be omitted for some snapshots. Consider using gh api --paginate (and then jq over the combined stream) or explicitly following pagination links.
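
One way to consume paginated output is sketched below in Python. It assumes the pages arrive as several JSON objects written one after another, which is how `gh api --paginate` is understood to behave without `--slurp`; that assumption should be checked against the installed gh version. `iter_json_documents` and `merge_artifact_pages` are hypothetical helpers, and the sample artifact names are made up.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        document, pos = decoder.raw_decode(text, pos)
        yield document


def merge_artifact_pages(text, prefix="diff_metrics_snapshot_"):
    """Combine the matching artifacts from every page into one name -> id map."""
    merged = {}
    for page in iter_json_documents(text):
        for artifact in page.get("artifacts", []):
            if artifact["name"].startswith(prefix):
                merged[artifact["name"]] = artifact["id"]
    return merged


# Two made-up pages, concatenated the way paginated output is assumed to be.
pages = (
    json.dumps({"artifacts": [{"name": "diff_metrics_snapshot_a", "id": 1},
                              {"name": "logs", "id": 2}]})
    + "\n"
    + json.dumps({"artifacts": [{"name": "diff_metrics_snapshot_b", "id": 3}]})
)

print(merge_artifact_pages(pages))
```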

Comment on lines +97 to +105
# Interleave pairs of (removed, added) lines so both sides are always visible.
max_pairs = max(1, max_lines // 2)
interleaved = []
for i in range(min(max_pairs, len(removed_lines), len(added_lines))):
    interleaved.append(removed_lines[i])
    interleaved.append(added_lines[i])

if len(removed_lines) > max_pairs or len(added_lines) > max_pairs:
    interleaved.append("...")

Copilot AI Apr 10, 2026

get_raw_diff_sample() can exceed the requested max_lines when max_lines is 1 (it forces max_pairs to at least 1 and returns 2 lines for a removed/added pair). Either validate max_lines >= 2 for interleaving mode or clamp the interleaved output length to max_lines to respect the function contract.
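
The clamping option can be sketched as follows. This is a hypothetical variant, not the merged get_raw_diff_sample(): `interleave_sample` is an illustrative name, and it assumes the "..." truncation marker does not count toward max_lines, which the quoted snippet does not state.

```python
def interleave_sample(removed_lines, added_lines, max_lines):
    """Interleave (removed, added) pairs, keeping at most max_lines diff lines.

    Assumption: the "..." marker is appended on top of the limit.
    """
    if max_lines < 1:
        return []

    interleaved = []
    for removed, added in zip(removed_lines, added_lines):
        interleaved.append(removed)
        interleaved.append(added)

    # Clamp after interleaving so an odd or very small limit is respected.
    sample = interleaved[:max_lines]
    if len(removed_lines) + len(added_lines) > len(sample):
        sample.append("...")
    return sample


# Made-up diff lines for a single modified metric.
print(interleave_sample(["-a 1"], ["+a 2"], 1))
print(interleave_sample(["-a 1", "-b 1"], ["+a 2", "+b 2"], 4))
```

With a limit of 1 this returns one diff line plus the marker, rather than a full pair.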

Comment on lines 179 to +190
  parser = argparse.ArgumentParser(description='Generate diff between two Jaeger metric files')
- parser.add_argument('--file1', help='Path to first metric file')
- parser.add_argument('--file2', help='Path to second metric file')
+ parser.add_argument('--current', help='Path to the current metric file (e.g. from the PR)')
+ parser.add_argument('--baseline', help='Path to the baseline metric file (e.g. from main branch)')
  parser.add_argument('--output', '-o', default='metrics_diff.txt',
                      help='Output diff file path (default: metrics_diff.txt)')

  args = parser.parse_args()

  # Read input files
- file1_lines = read_metric_file(args.file1)
- file2_lines = read_metric_file(args.file2)
+ baseline_lines = read_metric_file(args.baseline)
+ current_lines = read_metric_file(args.current)


Copilot AI Apr 10, 2026

The new CLI flags --baseline/--current are not marked required and there’s no explicit validation before calling read_metric_file(args.baseline/current). If a caller forgets one flag, this will raise a less-clear exception (attempting to open None). Consider setting required=True on both arguments (or adding a friendly check + parser.error) to produce a clear usage error.
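
The `required=True` option looks like this in a minimal, self-contained form. The flag names and help strings come from the quoted diff; `build_parser` and the file names passed to `parse_args` are hypothetical, and no files are opened.

```python
import argparse


def build_parser():
    """Build a parser in which both input files are mandatory."""
    parser = argparse.ArgumentParser(
        description="Generate diff between two Jaeger metric files"
    )
    parser.add_argument(
        "--baseline",
        required=True,
        help="Path to the baseline metric file (e.g. from main branch)",
    )
    parser.add_argument(
        "--current",
        required=True,
        help="Path to the current metric file (e.g. from the PR)",
    )
    parser.add_argument(
        "--output",
        "-o",
        default="metrics_diff.txt",
        help="Output diff file path (default: metrics_diff.txt)",
    )
    return parser


# Placeholder file names; they are parsed but never read here.
args = build_parser().parse_args(["--baseline", "main.txt", "--current", "pr.txt"])
print(args.baseline, args.current, args.output)
```

If either flag is omitted, argparse prints a usage message naming the missing argument and exits with status 2, instead of failing later when the script tries to open `None`.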

Comment on lines +100 to +105
// Validate artifact_id (optional, non-negative integer for the diff artifact download link)
const artifactId = safeNum(entry.artifact_id);
// artifact_id must be a positive integer (GitHub artifact IDs are always > 0)
const sanitizedArtifactId = (artifactId !== null && Number.isInteger(artifactId) && artifactId > 0)
? artifactId : null;
result.push({ snapshot, added, removed, modified, metric_names: names, artifact_id: sanitizedArtifactId });

Copilot AI Apr 10, 2026

The inline comment says artifact_id is a “non-negative integer”, but the code requires artifact_id > 0 (and the following comment notes GitHub IDs are always > 0). Update the comment to match the actual validation (positive integer) to avoid confusion during future security reviews.
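
The "positive integer" rule can be stated precisely with a short Python analogue of the quoted JavaScript. This is a sketch only: `sanitize_artifact_id`, `artifact_link`, and the URL prefix are hypothetical, and the real validation lives in ci-summary-report-publish.js.

```python
def sanitize_artifact_id(value):
    """Return value if it is a positive integer, otherwise None."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return None
    return value if value > 0 else None


def artifact_link(url_prefix, artifact_id):
    """Build a download link only when the ID passes validation."""
    clean_id = sanitize_artifact_id(artifact_id)
    if clean_id is None:
        return None
    return f"{url_prefix}/{clean_id}"


# Placeholder prefix; the real one would be assembled from trusted values.
prefix = "https://example.invalid/artifacts"
print(artifact_link(prefix, 7))
print(artifact_link(prefix, 0))
print(artifact_link(prefix, "7; rm -rf /"))
```

Zero is rejected along with negatives, strings, and floats, which is why "positive" rather than "non-negative" is the accurate description.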

Copilot AI requested a review from yurishkuro April 10, 2026 18:06
@codecov
codecov Bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.62%. Comparing base (ea88ef2) to head (8baaccc).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8341      +/-   ##
==========================================
- Coverage   95.63%   95.62%   -0.02%     
==========================================
  Files         314      314              
  Lines       16507    16507              
==========================================
- Hits        15786    15784       -2     
- Misses        568      570       +2     
  Partials      153      153              
| Flag | Coverage Δ |
| --- | --- |
| badger_direct | 9.30% <ø> (ø) |
| badger_e2e | 1.07% <ø> (ø) |
| cassandra-4.x-direct-manual | 13.63% <ø> (ø) |
| cassandra-4.x-e2e-auto | 1.06% <ø> (ø) |
| cassandra-4.x-e2e-manual | 1.06% <ø> (ø) |
| cassandra-5.x-direct-manual | 13.63% <ø> (ø) |
| cassandra-5.x-e2e-auto | 1.06% <ø> (ø) |
| cassandra-5.x-e2e-manual | 1.06% <ø> (ø) |
| clickhouse | 1.20% <ø> (ø) |
| elasticsearch-6.x-direct | 17.49% <ø> (ø) |
| elasticsearch-7.x-direct | 17.52% <ø> (ø) |
| elasticsearch-8.x-direct | 17.68% <ø> (ø) |
| elasticsearch-8.x-e2e | 1.07% <ø> (ø) |
| elasticsearch-9.x-e2e | 1.07% <ø> (ø) |
| grpc_direct | 8.09% <ø> (ø) |
| grpc_e2e | 1.07% <ø> (ø) |
| kafka-3.x-v2 | 1.07% <ø> (ø) |
| memory_v2 | 1.07% <ø> (ø) |
| opensearch-1.x-direct | 17.57% <ø> (ø) |
| opensearch-2.x-direct | 17.57% <ø> (ø) |
| opensearch-2.x-e2e | 1.07% <ø> (ø) |
| opensearch-3.x-e2e | 1.07% <ø> (ø) |
| query | 1.07% <ø> (ø) |
| tailsampling-processor | 0.54% <ø> (ø) |
| unittests | 94.26% <ø> (-0.02%) ⬇️ |

@yurishkuro yurishkuro merged commit 22442ce into main Apr 10, 2026
81 checks passed
@yurishkuro yurishkuro deleted the copilot/chore-fix-metrics-comparison-ci branch April 10, 2026 18:28

Labels

changelog:ci Change related to continuous integration / testing

Development

Successfully merging this pull request may close these issues.

[chore]: Metrics comparison CI check is difficult to read

4 participants