fix: IFBench error handling and build improvements#1073
Conversation
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
📝 WalkthroughWalkthroughThe changes introduce parametrized, commit-specific IFBench fetching in the Dockerfile and enhance error handling in the IFBench evaluation library by wrapping instruction checks in try/except blocks with logging, while disabling the en_core_web_sm download. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
dockerfiles/Dockerfile.nemo-skills (1)
40-45: UseIFBENCH_DIRconsistently or drop it as a configurable ARGThe commit-pinned checkout flow looks fine, but
IFBENCH_DIRis only used in these twoRUNsteps. Later instructions (COPYandRUN cd /opt/benchmarks/IFBench && git apply ifbench.patch) still hard-code the path, so overriding--build-arg IFBENCH_DIR=...would break the build.Either (a) replace those hard-coded
/opt/benchmarks/IFBenchusages with${IFBENCH_DIR}, or (b) removeIFBENCH_DIRas a build-time override and keep the path fixed to avoid misleading configurability.dockerfiles/ifbench.patch (1)
72-77: Prefer logging overCommenting out the
download('en_core_web_sm')call makes sense given the image now pre-downloads the model, but a bareConsider using the module logger instead, and optionally gating the message behind a debug/info level:
-# assumed to be predownloaded -print("skipping download of en_core_web_sm") -# download('en_core_web_sm') +# assumed to be predownloaded +logger = logging.getLogger(__name__) +logger.info("Skipping download of en_core_web_sm (assumed preinstalled)") +# download('en_core_web_sm')(Adjust if a
loggeris already defined earlier in the file.)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dockerfiles/Dockerfile.nemo-skills(1 hunks)dockerfiles/ifbench.patch(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (2)
dockerfiles/ifbench.patch (2)
39-57: Exception handling in the loose-following test looks solidWrapping
instruction.check_following(r)in a try/except, skipping empty responses explicitly, and logging vialogging.exceptionprevents a single bad response or buggy checker from aborting the entire evaluation run while still surfacing the stack trace.The behavior for non-empty, non-exceptional responses remains the same (first response that passes
check_followingmarks the instruction as followed), so this change is safe.
61-65: Trailing newline inprint_reportis harmlessAdding a blank line after the report loop doesn’t affect correctness and only tweaks formatting of the printed report. No changes needed.
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary