cicd / testing: Add xfails tracker script #2227
Conversation
Walkthrough

A new Python script for scanning test suites to detect and aggregate pytest xfail markers. It uses AST-based analysis to identify decorator-based xfails, parameterized xfails, and runtime xfail calls, then generates a formatted report grouped by failure reason with metadata for each occurrence.
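The three xfail shapes the walkthrough mentions can be sketched with a stdlib-only scan. Both the sample test source and the counting heuristic below are hypothetical illustrations, not the PR's actual implementation:

```python
import ast

# Hypothetical sample showing the three xfail forms the script looks for.
SAMPLE = '''
import pytest

@pytest.mark.xfail(reason="flaky on CI")
def test_decorated():
    assert False

@pytest.mark.parametrize("x", [1, pytest.param(2, marks=pytest.mark.xfail(reason="known bug"))])
def test_parametrized(x):
    assert x < 3

def test_runtime():
    pytest.xfail("unsupported backend")
'''


def count_xfails(source):
    """Rough per-kind count of xfail occurrences, using only the stdlib ast module."""
    tree = ast.parse(source)
    counts = {"decorator": 0, "param": 0, "runtime": 0}

    def is_mark_xfail(call):
        f = call.func
        return (isinstance(f, ast.Attribute) and f.attr == "xfail"
                and isinstance(f.value, ast.Attribute) and f.value.attr == "mark")

    # First pass: remember pytest.mark.xfail calls nested inside pytest.param(...)
    param_marks = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "param"):
            for kw in node.keywords:
                for inner in ast.walk(kw.value):
                    if isinstance(inner, ast.Call) and is_mark_xfail(inner):
                        param_marks.add(id(inner))

    # Second pass: classify every call
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        if is_mark_xfail(node):
            counts["param" if id(node) in param_marks else "decorator"] += 1
        elif (isinstance(node.func, ast.Attribute) and node.func.attr == "xfail"
              and isinstance(node.func.value, ast.Name)
              and node.func.value.id == "pytest"):
            counts["runtime"] += 1
    return counts
```

Running `count_xfails(SAMPLE)` classifies one occurrence of each kind, which matches the three detection branches described above.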
Sequence Diagram

```mermaid
sequenceDiagram
    participant Main as main()
    participant Discovery as find_test_files()
    participant Parser as AST Parser
    participant Collector as XFailCollector
    participant Aggregator as collect_xfails()
    participant Grouper as group_xfails_by_reason()
    participant Formatter as format_table()
    participant Output as Output Report
    Main->>Discovery: Scan tests directory
    Discovery-->>Main: Return test file paths
    Main->>Aggregator: Process all test files
    loop For each test file
        Aggregator->>Parser: Parse Python file
        Parser->>Collector: Visit AST nodes
        alt Detect Decorator Xfail
            Collector->>Collector: Extract @pytest.mark.xfail
        else Detect Parameter Xfail
            Collector->>Collector: Extract pytest.param marks
        else Detect Runtime Xfail
            Collector->>Collector: Extract pytest.xfail() calls
        end
        Collector-->>Aggregator: XFailInfo instances
    end
    Aggregator-->>Main: List of all XFailInfo
    Main->>Grouper: Group by reason
    Grouper-->>Main: Grouped xfails dict
    Main->>Formatter: Format for output
    Formatter->>Formatter: Build summary & tables
    Formatter-->>Main: Formatted report string
    Main->>Output: Print report
```
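The grouping step in the diagram can be sketched as follows. This uses plain dicts as a simplified stand-in for the script's `XFailInfo` data model:

```python
from collections import defaultdict


def group_xfails_by_reason(xfails):
    """Group xfail records by reason, most frequent reason first.

    `xfails` is a list of dicts with a "reason" key; records with no
    reason are bucketed under a placeholder so they still show up.
    """
    groups = defaultdict(list)
    for info in xfails:
        groups[info.get("reason") or "<no reason given>"].append(info)
    # Sort groups by occurrence count, descending, for the report
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))
```

A caller can then iterate the returned dict to emit one table section per reason, largest group first.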
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Summary of Changes

Hello @kahyunnam, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new utility designed to improve the tracking and management of technical debt within the test suite by adding an automated script to report xfail markers.
Code Review
This pull request introduces a very useful script for tracking pytest.mark.xfail markers, which will be a great tool for managing technical debt in tests. The implementation using Python's ast module is solid. I've provided a few suggestions to improve robustness, maintainability, and readability of the script. The most important one is to handle aliased imports of pytest to make the script more resilient to different coding styles. Other suggestions focus on refactoring for clarity, adding type hints, and improving compatibility messages for older Python versions. Overall, great work on this utility!
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/xfails_tracker.py (1)
1-10: Document Python version requirement. The script uses `ast.unparse()` (line 209), which requires Python 3.9+. Consider adding this requirement to the docstring to help users understand compatibility. Apply this diff to document the requirement:
```diff
 #!/usr/bin/env python3
 """
 XFails Tracker - Report Generator for pytest.mark.xfail markers

 This script scans the test suite for xfail markers and generates a report
 showing the total number of xfails and their reasons.

+Requires Python 3.9+ for full functionality (falls back to ast.dump on older versions).
+
 Usage:
     python scripts/xfails_tracker.py
 """
```
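A possible shape for the fallback mentioned in the suggested docstring (the helper name is illustrative; the actual script may structure this differently):

```python
import ast


def safe_unparse(node):
    """Render an AST node as source text where possible."""
    # ast.unparse exists only on Python 3.9+; fall back to a structural dump
    # so the report still contains something useful on older interpreters.
    if hasattr(ast, "unparse"):
        return ast.unparse(node)
    return ast.dump(node)
```

On 3.9+ this returns readable source like `x + 1`; on 3.8 it degrades to the `ast.dump` representation rather than crashing.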
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/xfails_tracker.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
scripts/xfails_tracker.py
272-272: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
🔇 Additional comments (10)
scripts/xfails_tracker.py (10)
20-34: LGTM! The dataclass design is clean and captures all necessary metadata for tracking xfails. The type hint for `xfail_type` with inline documentation is helpful.
39-66: LGTM! The state management for nested functions is well-implemented with proper save/restore patterns. The test function detection logic correctly handles both naming conventions and decorator-based tests.
68-111: LGTM! The AST pattern matching for pytest calls and decorators is implemented correctly. The comment on line 78 helpfully explains why runtime xfails are checked outside test functions.
113-143: LGTM! The decorator handling correctly distinguishes between parameterized and bare decorators. The marks argument processing properly handles both single markers and collections.
157-188: LGTM! The xfail information extraction correctly handles pytest.mark.xfail semantics, where the first positional argument is the condition. The keyword argument processing properly handles reason, strict, and **kwargs cases.
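The keyword-argument handling described here might look roughly like this simplified sketch (not the script's actual code):

```python
import ast


def extract_xfail_kwargs(call):
    """Pull constant reason/strict values out of an ast.Call such as
    pytest.mark.xfail(reason="...", strict=True).

    Non-constant values and **kwargs (where kw.arg is None) are skipped.
    """
    info = {"reason": None, "strict": False}
    for kw in call.keywords:
        if kw.arg in info and isinstance(kw.value, ast.Constant):
            info[kw.arg] = kw.value.value
    return info
```

Parsing a decorator expression with `mode="eval"` yields the `ast.Call` directly, which makes the helper easy to exercise in isolation.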
214-241: LGTM! The runtime xfail extraction correctly handles the different semantics, where the first positional argument is the reason (not the condition, as in decorator xfails). The comment explaining why condition is None is helpful.
244-253: LGTM! The test file discovery logic correctly identifies test files by pattern and includes common helper files that may contain xfail calls. The use of `set()` to remove duplicates is appropriate.
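A hedged sketch of such discovery logic follows; the glob patterns here are illustrative guesses, not necessarily the PR's exact ones:

```python
from pathlib import Path


def find_test_files(root):
    """Collect test modules plus conftest helpers under `root`.

    Overlapping glob patterns can yield the same path twice, so a set()
    removes duplicates before the sorted list is returned.
    """
    root = Path(root)
    files = set(root.rglob("test_*.py"))
    files |= set(root.rglob("*_test.py"))
    files |= set(root.rglob("conftest.py"))  # helpers may call pytest.xfail()
    return sorted(files)
```

Returning a sorted list keeps the report order deterministic across runs.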
256-275: Broad exception catching is acceptable here. The broad `Exception` catch on line 272 (flagged by static analysis) is appropriate for a reporting tool that should continue processing remaining files even if one fails. Error messages are properly logged to stderr.
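The pattern being defended could be sketched like this (the function name and message format are hypothetical):

```python
import ast
import sys
from pathlib import Path


def parse_file(path):
    """Parse one file; log failures to stderr instead of aborting the scan.

    A report generator should survive one unreadable or syntactically
    broken file and keep processing the rest, hence the broad catch.
    """
    try:
        return ast.parse(Path(path).read_text(), filename=str(path))
    except Exception as exc:  # noqa: BLE001 - intentional: keep scanning
        print(f"warning: skipping {path}: {exc}", file=sys.stderr)
        return None
```

Callers simply skip `None` results, so a single bad file costs one warning line rather than the whole report.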
286-351: LGTM! The report formatting is well-structured, with both summary and detailed sections. The sorting by count, path relativization, and reason truncation all enhance readability.
354-377: LGTM! The main function correctly determines paths, handles missing directories, and separates progress messages (stderr) from the actual report (stdout), which is good practice for scripts that may be piped.
/bot run
yzh119 left a comment:
I tried the script and it looks cool. The next step could be creating a weekly job that runs this script and creates a GitHub issue.
A reference can be found at https://github.com/flashinfer-ai/flashinfer/blob/main/.github/workflows/update-codeowners.yml
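A hypothetical weekly workflow along those lines could look like the sketch below. The schedule, job names, and issue title are placeholders, not a proposal vetted against the referenced `update-codeowners.yml`:

```yaml
name: Weekly xfails report
on:
  schedule:
    - cron: "0 6 * * 1"   # Mondays, 06:00 UTC
  workflow_dispatch: {}

jobs:
  report:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Generate report
        run: python scripts/xfails_tracker.py > xfails.md
      - name: Open tracking issue
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh issue create --title "Weekly xfails report" --body-file xfails.md
```

One open design question is whether to open a fresh issue each week or update a single pinned issue; the referenced workflow could guide that choice.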
📌 Description
Testing item from the open November roadmap todos. Adds an xfails reporting/tracking script to keep track of any technical debt noted in testing.
🔍 Related Issues
Related to this copilot PR which was closed earlier: #1733
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed pre-commit by running `pip install pre-commit` (or used my preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- All tests pass (unittest, etc.).

Reviewer Notes