FIX integration tests by escaping aalcr and adding judge args #1062
gwarmstrong merged 5 commits into main
Signed-off-by: George Armstrong <georgea@nvidia.com>
📝 Walkthrough
Four files are modified to introduce a new constant
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/gpu-tests/test_eval.py (1)
225-244: Expanded excluded_datasets set looks reasonable
Excluding `bfcl_v4` and `aalcr` from the “prepare and eval all datasets” sweep is consistent with the existing pattern of carving out heavy or problematic datasets and should help keep this integration test stable. You might optionally add a short comment for `bfcl_v4` similar to `aalcr` for future maintainability.
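The exclusion pattern discussed in this nitpick can be sketched as follows. This is a minimal illustration, not the code in `tests/gpu-tests/test_eval.py`: only the names `excluded_datasets`, `bfcl_v4`, and `aalcr` come from the review, while `all_datasets`, `select_datasets`, and the `dataset_a`/`dataset_b` entries are hypothetical placeholders.

```python
# Hypothetical list of discovered datasets; dataset_a and dataset_b are placeholders.
all_datasets = ["dataset_a", "aalcr", "dataset_b", "bfcl_v4"]

# Datasets carved out of the "prepare and eval all datasets" sweep.
# Keeping a short reason next to each entry is what the nitpick suggests.
excluded_datasets = {
    "bfcl_v4",  # reason not stated in the review; document it here
    "aalcr",  # excluded by this PR to keep the integration test stable
}


def select_datasets(names, excluded):
    """Return the names to sweep over, preserving order and skipping exclusions."""
    return [name for name in names if name not in excluded]


datasets_to_run = select_datasets(all_datasets, excluded_datasets)
print(datasets_to_run)
```

Using a set for the exclusions keeps membership checks cheap and makes duplicate entries harmless, while the list comprehension keeps the original dataset order.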
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
nemo_skills/dataset/open-proof-corpus-judge/__init__.py (1 hunks)
nemo_skills/dataset/proof-arena-judge/__init__.py (1 hunks)
nemo_skills/dataset/proof-bench-judge/__init__.py (1 hunks)
tests/gpu-tests/test_eval.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/dataset/open-proof-corpus-judge/__init__.py (1)
19-20: Judge-only marker constant is appropriate
Defining `JUDGE_PIPELINE_ARGS = {}` with a clear comment is consistent with the new `hasattr`-based detection and doesn’t alter existing behavior; it cleanly marks this dataset as judge-only.
nemo_skills/dataset/proof-bench-judge/__init__.py (1)
19-20: Consistent judge-only marker for proof-bench-judge
Adding `JUDGE_PIPELINE_ARGS = {}` with the judge-only comment aligns this module with the other judge datasets and works as intended with the updated detection logic.
tests/gpu-tests/test_eval.py (1)
257-259: hasattr-based judge dataset detection is correct
Switching from a truthiness check to `hasattr(dataset_module, "JUDGE_PIPELINE_ARGS")` correctly treats an empty dict marker as “judge-enabled” and matches the new constants added in the judge datasets, without changing behavior for modules that don’t define the attribute.
nemo_skills/dataset/proof-arena-judge/__init__.py (1)
19-20: Judge-only marker added consistently
`JUDGE_PIPELINE_ARGS = {}` plus the clarifying comment brings this dataset in line with the other judge-only modules and integrates cleanly with the updated test detection logic.
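To see why the `hasattr` check matters for an empty-dict marker, here is a small sketch. It assumes the earlier truthiness check looked roughly like `getattr(module, "JUDGE_PIPELINE_ARGS", None)` evaluated as a boolean; the exact original expression is not shown in this review. The `SimpleNamespace` objects stand in for imported dataset modules, and both helper function names are hypothetical.

```python
from types import SimpleNamespace

# Stand-ins for dataset modules (hypothetical; real modules are imported packages).
judge_only_module = SimpleNamespace(JUDGE_PIPELINE_ARGS={})  # empty marker, as in this PR
configured_module = SimpleNamespace(JUDGE_PIPELINE_ARGS={"some_arg": "value"})
plain_module = SimpleNamespace()  # defines no judge attribute at all


def is_judge_by_truthiness(module):
    """Assumed shape of the old check: an empty dict is falsy, so it is missed."""
    return bool(getattr(module, "JUDGE_PIPELINE_ARGS", None))


def is_judge_by_hasattr(module):
    """New-style check: the attribute's presence alone marks a judge dataset."""
    return hasattr(module, "JUDGE_PIPELINE_ARGS")


for name, module in [
    ("judge_only", judge_only_module),
    ("configured", configured_module),
    ("plain", plain_module),
]:
    print(name, is_judge_by_truthiness(module), is_judge_by_hasattr(module))
```

The two checks agree for modules with non-empty arguments and for modules without the attribute; they differ only for the empty-dict marker, which is the case the three judge datasets in this PR rely on.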
@coderabbitai title
@gwarmstrong, based on the changes in this PR, here are some more descriptive title suggestions:
Recommended:
Alternatives:
The current title "Fix integration tests" is quite generic. The PR specifically adds the
Would you like me to use one of these titles, or would you prefer a different phrasing?
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit
Tests
Chores