Clarify HF offline behavior with --offline flag #2847
Clarify HF offline behavior with --offline flag #2847
Conversation
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: Alex Filby <afilby@nvidia.com>
- Remove non-required args from CLI parser test (account, partition) - Clarify docstring on executor offline test - Add test pinning container-writable as unconditional and HF_HUB_OFFLINE=0 as the default Signed-off-by: Alex Filby <afilby@nvidia.com>
…se and improve Qwen3 message Use argparse mutually_exclusive_group so conflicting flags are rejected at parse time. Clarify the Qwen3 assertion to mention huggingface-cli download and HF_HOME. Reformat test file per ruff. Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: Alex Filby <afilby@nvidia.com>
Signed-off-by: Alex Filby <afilby@nvidia.com>
c0946df to
f6c23d0
Compare
📝 WalkthroughWalkthroughThis pull request introduces offline mode support for HuggingFace Hub connectivity in the performance experiment CLI. An --offline flag is added to argument parsing with mutual exclusion from --hf_token, propagated through setup and executor functions to configure environment variables, and documented with comprehensive guidance and unit tests. Changes
Sequence DiagramsequenceDiagram
participant User as User / CLI
participant Parser as Argument Parser
participant Setup as setup_experiment.main()
participant Executor as slurm_executor()
participant Slurm as Slurm Batch System
User->>Parser: Call with --offline flag
Parser->>Parser: Parse args, validate --offline<br/>and --hf_token mutual exclusion
Parser->>Setup: Call main(..., offline=True)
Setup->>Setup: Validate Qwen3 requirement<br/>(hf_token or offline)
Setup->>Executor: Invoke slurm_executor(..., offline=True)
Executor->>Executor: Set HF_HUB_OFFLINE="1"<br/>in PERF_ENV_VARS
Executor->>Executor: Add --container-writable<br/>to srun_args
Executor->>Slurm: Submit sbatch with offline<br/>environment & writable container
Slurm->>Slurm: Execute task without<br/>HF Hub network calls
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can approve the review once all CodeRabbit's comments are resolved.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/performance/README.md (1)
244-244:⚠️ Potential issue | 🟡 MinorTypographical error: double hyphen prefix.
Line 244 has
- -which appears to be a typo.📝 Proposed fix
-- - `--save_config_filepath`: Path to save the task configuration file. +- `--save_config_filepath`: Path to save the task configuration file.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/README.md` at line 244, Fix the typographical double-hyphen bullet at the README entry by removing the extra hyphen so the line reads a single list bullet with the option name; specifically update the text containing "`--save_config_filepath`" to remove the leading duplicate hyphen (change "- - `--save_config_filepath`..." to "- `--save_config_filepath`...").scripts/performance/setup_experiment.py (1)
340-355:⚠️ Potential issue | 🟠 Major
dgxc_executordoes not accept anofflineparameter, creating inconsistent behavior withslurm_executor.The
offlineflag from CLI is silently ignored when a DGXCloud cluster is specified becausedgxc_executor(lines 167-182 inscripts/performance/utils/executors.py) does not accept anofflineparameter, unlikeslurm_executorwhich conditionally setsHF_HUB_OFFLINE=1when offline mode is enabled (lines 114-116). This means users running with--offline --dgxc_cluster=...cannot dynamically enable offline mode for DGXCloud.The
dgxc_executorhardcodesTRANSFORMERS_OFFLINE=1by default but lacks the conditional logic to respect user intent for offline mode. Consider either:
- Adding
offlineparameter support todgxc_executorwith conditionalHF_HUB_OFFLINEhandling, or- Raising an error when
--offlineis used with DGXCloud (if offline mode is not supported there)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/performance/setup_experiment.py` around lines 340 - 355, The dgxc_executor call ignores the CLI --offline flag because dgxc_executor (in scripts/performance/utils/executors.py) does not accept an offline parameter while slurm_executor does; to fix, add an offline parameter to dgxc_executor signature and propagation in its call sites (including scripts/performance/setup_experiment.py) and implement the same conditional environment handling as slurm_executor: when offline is True set HF_HUB_OFFLINE=1 (and only set TRANSFORMERS_OFFLINE/other offline-related env vars when appropriate) or, if DGXCloud truly cannot support offline mode, explicitly detect offline in dgxc_executor and raise a clear error indicating --offline is unsupported for DGXCloud. Ensure you update the dgxc_executor parameter list, its custom_env_vars construction, and the call site in setup_experiment.py to pass the offline flag.
🧹 Nitpick comments (5)
tests/unit_tests/scripts/test_performance_offline_mode.py (5)
137-152: Add@pytest.mark.unitmarker.📝 Proposed fix
+@pytest.mark.unit def test_slurm_executor_default_has_container_writable_and_hub_online(import_performance_module): """By default, --container-writable is always set and HF Hub access stays online."""As per coding guidelines: "Use pytest markers to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/scripts/test_performance_offline_mode.py` around lines 137 - 152, Add the pytest unit marker to the test function test_slurm_executor_default_has_container_writable_and_hub_online by decorating it with `@pytest.mark.unit`; ensure pytest is imported in the file if not already, place the decorator immediately above the function definition and run tests to confirm the marker is recognized.
73-92: Add@pytest.mark.unitmarker to categorize this test.Per coding guidelines, tests should use pytest markers to categorize them.
📝 Proposed fix
+@pytest.mark.unit def test_parse_cli_args_accepts_offline_flag(import_performance_module): """The performance CLI should keep exposing the offline switch."""As per coding guidelines: "Use pytest markers to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/scripts/test_performance_offline_mode.py` around lines 73 - 92, Add the pytest unit marker to the test function test_parse_cli_args_accepts_offline_flag by decorating it with `@pytest.mark.unit`; ensure pytest is imported at the top of the file if not already present so the decorator resolves. Locate the function definition in tests/unit_tests/scripts/test_performance_offline_mode.py and add the `@pytest.mark.unit` decorator immediately above def test_parse_cli_args_accepts_offline_flag(...).
95-115: Add@pytest.mark.unitmarker.📝 Proposed fix
+@pytest.mark.unit def test_argparse_rejects_hf_token_with_offline(import_performance_module): """argparse should reject --hf_token and --offline together at parse time."""As per coding guidelines: "Use pytest markers to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/scripts/test_performance_offline_mode.py` around lines 95 - 115, Add the pytest unit marker to the test function test_argparse_rejects_hf_token_with_offline so it is categorized as a unit test; locate the function definition for test_argparse_rejects_hf_token_with_offline (which creates argument_parser and calls parser.parse_args) and add `@pytest.mark.unit` immediately above the def line to mark it accordingly.
118-134: Add@pytest.mark.unitmarker and note test isolation concern.The test correctly verifies offline mode behavior. However, due to
PERF_ENV_VARSbeing a mutable module-level dict (as noted in executors.py review), this test may be affected by or affect other tests in the same process. The test order independence could be compromised.📝 Proposed fix for marker
+@pytest.mark.unit def test_slurm_executor_sets_offline_env_and_container_writable(import_performance_module): """Offline mode should set HF_HUB_OFFLINE and preserve the offline Transformers default."""As per coding guidelines: "Use pytest markers to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/scripts/test_performance_offline_mode.py` around lines 118 - 134, Add the pytest marker and guard mutable global state: annotate test_slurm_executor_sets_offline_env_and_container_writable with `@pytest.mark.unit` and ensure it does not leak PERf_ENV_VARS changes by snapshotting and restoring the module-level PERF_ENV_VARS around the call to executors.slurm_executor (or use monkeypatch to set a fresh copy) so other tests remain isolated; reference the executors.slurm_executor invocation and the PERF_ENV_VARS mutable dict when implementing the snapshot/restore or monkeypatch.
155-171: Add@pytest.mark.unitmarker.📝 Proposed fix
+@pytest.mark.unit def test_slurm_executor_hf_token_enables_online_transformers(import_performance_module): """Providing an HF token should enable the online Transformers path."""As per coding guidelines: "Use pytest markers to categorize tests (unit, integration, system)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit_tests/scripts/test_performance_offline_mode.py` around lines 155 - 171, Add the pytest unit marker to the test function by decorating test_slurm_executor_hf_token_enables_online_transformers with `@pytest.mark.unit` and ensure pytest is imported in the module if not already; this means adding "import pytest" at the top (or ensuring it's present) and placing "@pytest.mark.unit" immediately above the def of test_slurm_executor_hf_token_enables_online_transformers so the test is categorized as a unit test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/performance/utils/executors.py`:
- Around line 39-48: The module-level dict PERF_ENV_VARS is being mutated
in-place by slurm_executor causing cross-call state leakage; to fix, make a
fresh local copy inside slurm_executor (e.g., env = PERF_ENV_VARS.copy() or
dict(PERF_ENV_VARS)) and perform all modifications (adding WANDB_API_KEY,
HF_TOKEN, HF_HUB_OFFLINE, TRANSFORMERS_OFFLINE overrides, etc.) on that local
copy before using it, leaving the module-level PERF_ENV_VARS immutable for
subsequent calls.
---
Outside diff comments:
In `@scripts/performance/README.md`:
- Line 244: Fix the typographical double-hyphen bullet at the README entry by
removing the extra hyphen so the line reads a single list bullet with the option
name; specifically update the text containing "`--save_config_filepath`" to
remove the leading duplicate hyphen (change "- - `--save_config_filepath`..." to
"- `--save_config_filepath`...").
In `@scripts/performance/setup_experiment.py`:
- Around line 340-355: The dgxc_executor call ignores the CLI --offline flag
because dgxc_executor (in scripts/performance/utils/executors.py) does not
accept an offline parameter while slurm_executor does; to fix, add an offline
parameter to dgxc_executor signature and propagation in its call sites
(including scripts/performance/setup_experiment.py) and implement the same
conditional environment handling as slurm_executor: when offline is True set
HF_HUB_OFFLINE=1 (and only set TRANSFORMERS_OFFLINE/other offline-related env
vars when appropriate) or, if DGXCloud truly cannot support offline mode,
explicitly detect offline in dgxc_executor and raise a clear error indicating
--offline is unsupported for DGXCloud. Ensure you update the dgxc_executor
parameter list, its custom_env_vars construction, and the call site in
setup_experiment.py to pass the offline flag.
---
Nitpick comments:
In `@tests/unit_tests/scripts/test_performance_offline_mode.py`:
- Around line 137-152: Add the pytest unit marker to the test function
test_slurm_executor_default_has_container_writable_and_hub_online by decorating
it with `@pytest.mark.unit`; ensure pytest is imported in the file if not already,
place the decorator immediately above the function definition and run tests to
confirm the marker is recognized.
- Around line 73-92: Add the pytest unit marker to the test function
test_parse_cli_args_accepts_offline_flag by decorating it with
`@pytest.mark.unit`; ensure pytest is imported at the top of the file if not
already present so the decorator resolves. Locate the function definition in
tests/unit_tests/scripts/test_performance_offline_mode.py and add the
`@pytest.mark.unit` decorator immediately above def
test_parse_cli_args_accepts_offline_flag(...).
- Around line 95-115: Add the pytest unit marker to the test function
test_argparse_rejects_hf_token_with_offline so it is categorized as a unit test;
locate the function definition for test_argparse_rejects_hf_token_with_offline
(which creates argument_parser and calls parser.parse_args) and add
`@pytest.mark.unit` immediately above the def line to mark it accordingly.
- Around line 118-134: Add the pytest marker and guard mutable global state:
annotate test_slurm_executor_sets_offline_env_and_container_writable with
`@pytest.mark.unit` and ensure it does not leak PERf_ENV_VARS changes by
snapshotting and restoring the module-level PERF_ENV_VARS around the call to
executors.slurm_executor (or use monkeypatch to set a fresh copy) so other tests
remain isolated; reference the executors.slurm_executor invocation and the
PERF_ENV_VARS mutable dict when implementing the snapshot/restore or
monkeypatch.
- Around line 155-171: Add the pytest unit marker to the test function by
decorating test_slurm_executor_hf_token_enables_online_transformers with
`@pytest.mark.unit` and ensure pytest is imported in the module if not already;
this means adding "import pytest" at the top (or ensuring it's present) and
placing "@pytest.mark.unit" immediately above the def of
test_slurm_executor_hf_token_enables_online_transformers so the test is
categorized as a unit test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ef5ac3be-73a9-41a0-b953-351960a53e46
📒 Files selected for processing (5)
scripts/performance/README.mdscripts/performance/argument_parser.pyscripts/performance/setup_experiment.pyscripts/performance/utils/executors.pytests/unit_tests/scripts/test_performance_offline_mode.py
|
/claude review |
Added a test case to check for regression. Signed-off-by: Alex Filby <afilby@nvidia.com>
4ba3718 to
ec09583
Compare
Update of #2672 for main branch:
Removed in: #2112 , Originally added in #2086 #2084
Summary by CodeRabbit
New Features
--offlineflag for offline HuggingFace Hub access with local caching support.--offlineand--hf_tokenare now mutually exclusive CLI options.Documentation
Tests