Enhance e2e debug information with artifact collection #1356

Closed
rafaelvzago wants to merge 6 commits into istio-ecosystem:main from rafaelvzago:enhance-e2e-debug-artifacts-892
Conversation

@rafaelvzago
Contributor

This PR enhances e2e test debugging by implementing a comprehensive debug artifact collection system that automatically captures cluster state when tests fail.

Key Features

  1. New DebugCollector Package (tests/e2e/util/debugcollector/)

    • Records initial cluster state to focus on test-created resources
    • Collects comprehensive debug information on test failure
    • Saves artifacts in an organized directory structure under $ARTIFACTS

  2. Integrated into All Test Suites

    • Ambient, Control Plane, Dual Stack, Multi-Control Plane, Multicluster, Operator tests
    • For multicluster tests, creates separate artifact collections per cluster
    • Only activates on test failure (no overhead on passing tests)

  3. Configurable Collection Depth

    • Environment variable DEBUG_COLLECTOR_DEPTH supports: full, minimal, logs-only

  4. Improved Console Output

    • Simplified LogDebugInfo to show summary and artifact location
    • Directs users to detailed artifacts instead of verbose console output
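
As an illustration of the "record initial state, then focus on test-created resources" idea, here is a minimal sketch of a namespace diff. The function name `newNamespaces` is hypothetical and is not taken from the debugcollector package; the real implementation may track state differently.

```go
package main

import "sort"

// newNamespaces returns the namespaces that exist in current but were not
// present in initial, i.e. the ones presumably created by the test run.
// Hypothetical helper for illustration only.
func newNamespaces(initial, current []string) []string {
	// Index the initial snapshot for constant-time lookups.
	seen := make(map[string]struct{}, len(initial))
	for _, ns := range initial {
		seen[ns] = struct{}{}
	}

	var created []string
	for _, ns := range current {
		if _, ok := seen[ns]; !ok {
			created = append(created, ns)
		}
	}

	// Sort so the artifact layout is stable between runs.
	sort.Strings(created)
	return created
}
```

With a snapshot taken before the suite starts and another taken on failure, only the difference would need to be dumped, which keeps artifacts small on shared clusters.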

Why We Need It

Currently, when e2e tests fail, the console output comes from a fixed set of predefined commands, which is not always enough to diagnose the failure. This enhancement captures comprehensive cluster state (similar to must-gather), organizes the debug information for easier analysis, and saves artifacts for CI/CD systems.

Implementation Approach

Similar to the existing Cleaner pattern from PR #889, but focused on debug collection rather than cleanup.

Artifact Directory Structure

$ARTIFACTS/
  debug-<suite-name>-<timestamp>/
    cluster-scoped/
    namespaces/<namespace>/
      resources/
      logs/
      events.yaml
    istioctl/

Environment Variables

  • ARTIFACTS: Directory where debug artifacts are saved (default: /tmp)
  • DEBUG_COLLECTOR_DEPTH: Collection depth (full, minimal, logs-only)

Security

  • Secrets handled safely: only names/types collected, NOT actual data
  • No hardcoded credentials or sensitive information
  • Uses environment variables for configuration
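
The "names/types only" handling of Secrets can be sketched as follows. The `Secret` struct here is a simplified stand-in defined for this example, not the Kubernetes API type, and `summarizeSecrets` is a hypothetical helper rather than the function used in the PR.

```go
package main

// Secret is a simplified stand-in for a Kubernetes Secret (illustration only).
type Secret struct {
	Name      string
	Namespace string
	Type      string
	Data      map[string][]byte
}

// SecretSummary carries only non-sensitive metadata. It deliberately has no
// field that could hold secret data.
type SecretSummary struct {
	Name      string
	Namespace string
	Type      string
}

// summarizeSecrets copies name, namespace, and type, and drops Data entirely.
func summarizeSecrets(secrets []Secret) []SecretSummary {
	out := make([]SecretSummary, 0, len(secrets))
	for _, s := range secrets {
		out = append(out, SecretSummary{Name: s.Name, Namespace: s.Namespace, Type: s.Type})
	}
	return out
}
```

Writing a dedicated summary type to disk, instead of clearing fields on the original object, means sensitive values cannot leak through a field that was overlooked.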

Testing

  • All code compiles successfully with e2e build tags
  • All linter checks pass
  • Follows existing project patterns and code style
  • Documentation updated in tests/e2e/README.md

Red Hat AI Compliance

This implementation was developed with AI code assistant support. All AI-generated code has been thoroughly reviewed, tested, and validated according to Red Hat AI guidelines. No sensitive information was used in development.

Assisted-by: AI Code Assistant

@istio-testing
Collaborator

Hi @rafaelvzago. Thanks for your PR.

I'm waiting for an istio-ecosystem or istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.55%. Comparing base (fd7e684) to head (ccf9044).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1356      +/-   ##
==========================================
- Coverage   80.64%   80.55%   -0.09%     
==========================================
  Files          44       44              
  Lines        2299     2299              
==========================================
- Hits         1854     1852       -2     
- Misses        327      328       +1     
- Partials      118      119       +1     

…ilure

Implement a new DebugCollector type that captures comprehensive cluster
state information when e2e tests fail and saves it as artifacts.

The debug collector:
- Records initial cluster state (namespaces) to focus on test-created resources
- Collects resources (Deployments, DaemonSets, Services, Pods, ConfigMaps)
- Captures pod logs from all containers
- Gathers events from all namespaces
- Collects custom resources (Istio, IstioCNI, ZTunnel, IstioRevision, IstioRevisionTag)
- Saves istioctl proxy-status output
- Organizes artifacts in timestamped directories under $ARTIFACTS

Supports configurable collection depth via DEBUG_COLLECTOR_DEPTH env var
(full, minimal, logs-only) to control the amount of debug data collected.

Fixes istio-ecosystem#892

Assisted-by: AI Code Assistant
Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
Integrate the DebugCollector into all e2e test suites to automatically
collect comprehensive debug information when tests fail:

- Ambient test suite
- Control plane and control plane update test suites
- Dual stack test suite
- Multi-control plane test suite
- Multicluster test suites (primary-remote, multi-primary, external control plane)
- Operator installation test suite

Each test suite now:
1. Creates a debug collector with a descriptive context name
2. Records initial cluster state in BeforeAll
3. Collects and saves debug artifacts in AfterAll on test failure
4. For multicluster tests, creates separate collectors for each cluster

The debug artifacts are saved before the existing LogDebugInfo call,
ensuring debug information is preserved even if cleanup fails.

Related to istio-ecosystem#892

Assisted-by: AI Code Assistant
Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
Update LogDebugInfo to provide a concise summary that directs users to
the debug artifacts directory for detailed information, rather than
printing extensive debug output to the console.

Changes:
- Print TEST FAILURE DETECTED banner with artifacts directory location
- Display quick status summary with high-level pod information
- Remove verbose debug output (now saved to artifacts by DebugCollector)
- Keep console output clean while preserving full debug info in artifacts

This improves test output readability while ensuring comprehensive debug
information is still available in the artifacts directory for
troubleshooting.

Related to istio-ecosystem#892

Assisted-by: AI Code Assistant
Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
Add comprehensive documentation for the debug collector feature to the
e2e test README, including:

- How to use the debug collector in test suites
- Explanation of artifact directory structure
- DEBUG_COLLECTOR_DEPTH environment variable options
- Best practices for multicluster test debugging
- Examples of debug collector usage patterns

The documentation follows the same pattern as the existing cleaner
documentation and provides clear guidance for test authors on how to
leverage the debug collection feature.

Related to istio-ecosystem#892

Assisted-by: AI Code Assistant
Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
- Use new octal literal style (0o755, 0o644)
- Remove unused parameter in collectCustomResources
- Remove unused imports (istioctl, ptr) from common utils
- Remove obsolete debug functions that were replaced by artifact collection

These changes ensure code quality and align with project linting standards.

Related to istio-ecosystem#892

Signed-off-by: Rafael Zago <rafaelvzago@gmail.com>
@rafaelvzago force-pushed the enhance-e2e-debug-artifacts-892 branch from 44e18d8 to ccf9044 on November 17, 2025 17:17
rafaelvzago added a commit to rafaelvzago/sail-operator that referenced this pull request Nov 17, 2025
Update the codecov.yml ignore pattern from 'tests' to 'tests/**' to properly
exclude all files under the tests directory from coverage calculation.

This resolves the codecov/project failures for PRs that add e2e test
infrastructure code, which is not meant to be unit tested.

Related to: istio-ecosystem#1356
rafaelvzago added a commit to rafaelvzago/sail-operator that referenced this pull request Nov 17, 2025
istio-testing pushed a commit that referenced this pull request Nov 17, 2025
istio-testing pushed a commit to istio-testing/sail-operator that referenced this pull request Nov 18, 2025
istio-testing added a commit that referenced this pull request Nov 18, 2025
rafaelvzago added a commit to rafaelvzago/sail-operator that referenced this pull request Nov 21, 2025
…em#1357)

rafaelvzago added a commit to rafaelvzago/sail-operator that referenced this pull request Nov 24, 2025
…em#1357)

rafaelvzago added a commit to rafaelvzago/sail-operator that referenced this pull request Jan 22, 2026
…em#1363)

dgn pushed a commit to dgn/sail-operator that referenced this pull request Mar 17, 2026
…em#1357)

Signed-off-by: Daniel Grimm <dgrimm@redhat.com>