diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 3e3e4fb3fe..b83ec70073 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -10,15 +10,15 @@ List issues that this PR closes ([syntax](https://docs.github.com/en/issues/trac * **You can potentially add a usage example below** ```python -# Add a code snippet demonstrating how to use this +# Add a code snippet demonstrating how to use this ``` # Before your PR is "Ready for review" **Pre checks**: -- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md) +- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md) - [ ] Did you write any new necessary tests? -- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/NeMo-RL/blob/main/docs/testing.md) for how to run tests -- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/NeMo-RL/blob/main/docs/documentation.md) for how to write, build and test the docs. +- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA-NeMo/RL/blob/main/docs/testing.md) for how to run tests +- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA-NeMo/RL/blob/main/docs/documentation.md) for how to write, build and test the docs. # Additional Information * ... diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2cc6a3051b..3dc065655a 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -31,7 +31,7 @@ We follow a direct clone and branch workflow for now: 1. 
Clone the repository directly: ```bash - git clone https://github.com/NVIDIA/NeMo-RL + git clone https://github.com/NVIDIA-NeMo/RL nemo-rl cd nemo-rl ``` diff --git a/README.md b/README.md index 9f605b1b5c..cdf9404834 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ cd nemo-rl # by running (This is not necessary if you are using the pure Pytorch/DTensor path): git submodule update --init --recursive -# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure +# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure # submodules are automatically updated after switching branches or pulling updates by configuring git with: # git config submodule.recurse true @@ -226,7 +226,7 @@ sbatch \ We also support multi-turn generation and training (tool use, games, etc.). Reference example for training to play a Sliding Puzzle Game: ```sh -uv run python examples/run_grpo_sliding_puzzle.py +uv run python examples/run_grpo_sliding_puzzle.py ``` ## Supervised Fine-Tuning (SFT) @@ -409,7 +409,7 @@ If you use NeMo RL in your research, please cite it using the following BibTeX e ```bibtex @misc{nemo-rl, title = {NeMo RL: A Scalable and Efficient Post-Training Library}, -howpublished = {\url{https://github.com/NVIDIA/NeMo-RL}}, +howpublished = {\url{https://github.com/NVIDIA-NeMo/RL}}, year = {2025}, note = {GitHub repository}, } @@ -417,8 +417,8 @@ note = {GitHub repository}, ## Contributing -We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md) for more information on how to get involved. +We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md) for more information on how to get involved. ## Licenses -NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA/NeMo-RL/blob/main/LICENSE).
+NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA-NeMo/RL/blob/main/LICENSE). diff --git a/docs/adding-new-models.md b/docs/adding-new-models.md index 155a012f47..e0de97ae40 100644 --- a/docs/adding-new-models.md +++ b/docs/adding-new-models.md @@ -12,7 +12,7 @@ $$\text{KL} = E_{x \sim \pi}[\pi(x) - \pi_{\text{ref}}(x)]$$ When summed/integrated, replacing the $x \sim \pi$ with $x \sim \pi_{\text{wrong}}$ leads to an error of: -$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$ +$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$ So, to verify correctness, we calculate: @@ -65,28 +65,28 @@ When investigating discrepancies beyond the acceptable threshold, focus on these When validating Hugging Face-based models, perform the following checks: -- **Compare log probabilities** +- **Compare log probabilities** Ensure the generation log probabilities from inference backends like **vLLM** match those computed by Hugging Face. This comparison helps diagnose potential mismatches. -- **Test parallelism** +- **Test parallelism** Verify consistency with other parallelism settings. -- **Variance** +- **Variance** Repeat tests multiple times (e.g., 10 runs) to confirm that behavior is deterministic or within acceptable variance. -- **Check sequence lengths** - Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens. +- **Check sequence lengths** + Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens. Ensure the model behaves consistently at each length. -- **Use real and dummy data** - - **Real data:** Tokenize and generate from actual text samples. +- **Use real and dummy data** + - **Real data:** Tokenize and generate from actual text samples. - **Dummy data:** Simple numeric sequences to test basic generation. -- **Vary sampling parameters** - Test both greedy and sampling generation modes. 
+- **Vary sampling parameters** + Test both greedy and sampling generation modes. Adjust temperature and top-p to confirm output consistency across backends. -- **Test different batch sizes** +- **Test different batch sizes** Try with batch sizes of 1, 8, and 32 to ensure consistent behavior across different batch configurations. --- @@ -95,11 +95,11 @@ When validating Hugging Face-based models, perform the following checks: ### Additional Validation -- **Compare Megatron outputs** +- **Compare Megatron outputs** Ensure the Megatron forward pass aligns with Hugging Face and the generation log probabilities from inference backends like **vLLM**. -- **Parallel settings** - Match the same parallelism configurations used for the HuggingFace-based tests. +- **Parallel settings** + Match the same parallelism configurations used for the HuggingFace-based tests. Confirm outputs remain consistent across repeated runs. --- @@ -128,7 +128,7 @@ By following these validation steps and ensuring your model's outputs remain con We also maintain a set of standalone scripts that can be used to diagnose issues related to correctness that we have encountered before. -## [1.max_model_len_respected.py](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py) +## [1.max_model_len_respected.py](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py) Test if a new model respects the `max_model_len` passed to vllm: @@ -142,7 +142,7 @@ uv run --extra vllm tools/model_diagnostics/1.max_model_len_respected.py Qwen/Qw # [Qwen/Qwen2.5-1.5B] ALL GOOD! 
``` -## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py) +## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py) Test that vLLM yields near-identical token log-probabilities when comparing decoding with a single prefill pass across multiple prompts. diff --git a/docs/model-quirks.md b/docs/model-quirks.md index fa2b181c7e..ca08b2741b 100644 --- a/docs/model-quirks.md +++ b/docs/model-quirks.md @@ -6,7 +6,7 @@ This document outlines special cases and model-specific behaviors that require c ### Tied Weights -Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA/NeMo-RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1. +Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA-NeMo/RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. 
For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1. **Special Handling:** - We skip the tied weights check for all Gemma-3 models when using the DTensor policy, allowing training using TP > 1. diff --git a/examples/configs/grpo-deepscaler-1.5b-8K.yaml b/examples/configs/grpo-deepscaler-1.5b-8K.yaml index 96bc7f2e76..1013f3d4c2 100644 --- a/examples/configs/grpo-deepscaler-1.5b-8K.yaml +++ b/examples/configs/grpo-deepscaler-1.5b-8K.yaml @@ -30,7 +30,7 @@ checkpointing: save_period: 10 policy: - # Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227) + # Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227) model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" tokenizer: name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default diff --git a/examples/configs/grpo_math_1B.yaml b/examples/configs/grpo_math_1B.yaml index 85cc620b62..1842b01497 100644 --- a/examples/configs/grpo_math_1B.yaml +++ b/examples/configs/grpo_math_1B.yaml @@ -30,7 +30,7 @@ checkpointing: save_period: 10 policy: - # Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227) + # Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227) model_name: "Qwen/Qwen2.5-1.5B" tokenizer: name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the 
model's default diff --git a/examples/configs/grpo_sliding_puzzle.yaml b/examples/configs/grpo_sliding_puzzle.yaml index 0b99e750e8..8493bfc40e 100644 --- a/examples/configs/grpo_sliding_puzzle.yaml +++ b/examples/configs/grpo_sliding_puzzle.yaml @@ -24,7 +24,7 @@ policy: max_new_tokens: ${policy.max_total_sequence_length} temperature: 1.0 # Setting top_p/top_k to 0.999/10000 to strip out Qwen's special/illegal tokens - # https://github.com/NVIDIA/NeMo-RL/issues/237 + # https://github.com/NVIDIA-NeMo/RL/issues/237 top_p: 0.999 top_k: 10000 stop_token_ids: null @@ -38,7 +38,7 @@ policy: data: add_system_prompt: false - + env: sliding_puzzle_game: cfg: diff --git a/examples/configs/recipes/llm/grpo-gemma3-27b-it-16n8g-fsdp2tp8sp-actckpt-long.yaml b/examples/configs/recipes/llm/grpo-gemma3-27b-it-16n8g-fsdp2tp8sp-actckpt-long.yaml index 0ec0ef477a..2458739e2e 100644 --- a/examples/configs/recipes/llm/grpo-gemma3-27b-it-16n8g-fsdp2tp8sp-actckpt-long.yaml +++ b/examples/configs/recipes/llm/grpo-gemma3-27b-it-16n8g-fsdp2tp8sp-actckpt-long.yaml @@ -45,7 +45,7 @@ policy: context_parallel_size: 1 custom_parallel_plan: null dynamic_batching: - # TODO: OOMs if enabled https://github.com/NVIDIA/NeMo-RL/issues/383 + # TODO: OOMs if enabled https://github.com/NVIDIA-NeMo/RL/issues/383 enabled: False train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}} logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}} diff --git a/examples/converters/convert_dcp_to_hf.py b/examples/converters/convert_dcp_to_hf.py index fc53418696..d87d97a64e 100644 --- a/examples/converters/convert_dcp_to_hf.py +++ b/examples/converters/convert_dcp_to_hf.py @@ -51,7 +51,7 @@ def main(): model_name_or_path = config["policy"]["model_name"] # TODO: After the following PR gets merged: - # https://github.com/NVIDIA/NeMo-RL/pull/148/files + # https://github.com/NVIDIA-NeMo/RL/pull/148/files # tokenizer should be copied from 
policy/tokenizer/* instead of relying on the model name # We can expose a arg at the top level --tokenizer_path to plumb that through. # This is more stable than relying on the current NeMo-RL get_tokenizer() which can diff --git a/nemo_rl/algorithms/dpo.py b/nemo_rl/algorithms/dpo.py index c7b3de9f5f..3883328216 100644 --- a/nemo_rl/algorithms/dpo.py +++ b/nemo_rl/algorithms/dpo.py @@ -71,7 +71,7 @@ class DPOConfig(TypedDict): preference_average_log_probs: bool sft_average_log_probs: bool ## TODO(@ashors) support other loss functions - ## https://github.com/NVIDIA/NeMo-RL/issues/193 + ## https://github.com/NVIDIA-NeMo/RL/issues/193 # preference_loss: str # gt_reward_scale: float preference_loss_weight: float diff --git a/nemo_rl/distributed/ray_actor_environment_registry.py b/nemo_rl/distributed/ray_actor_environment_registry.py index 4c7eebee13..1f1937729d 100644 --- a/nemo_rl/distributed/ray_actor_environment_registry.py +++ b/nemo_rl/distributed/ray_actor_environment_registry.py @@ -17,7 +17,7 @@ ACTOR_ENVIRONMENT_REGISTRY: dict[str, str] = { "nemo_rl.models.generation.vllm.VllmGenerationWorker": PY_EXECUTABLES.VLLM, # Temporary workaround for the coupled implementation of DTensorPolicyWorker and vLLM. - # This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA/NeMo-RL/issues/501 is resolved. + # This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA-NeMo/RL/issues/501 is resolved. 
"nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker": PY_EXECUTABLES.VLLM, "nemo_rl.models.policy.fsdp1_policy_worker.FSDP1PolicyWorker": PY_EXECUTABLES.BASE, "nemo_rl.models.policy.megatron_policy_worker.MegatronPolicyWorker": PY_EXECUTABLES.MCORE, diff --git a/nemo_rl/models/generation/vllm.py b/nemo_rl/models/generation/vllm.py index f0cd5eb50b..966ad0a205 100644 --- a/nemo_rl/models/generation/vllm.py +++ b/nemo_rl/models/generation/vllm.py @@ -330,7 +330,7 @@ def _patch_vllm_init_workers_ray(): enable_prefix_caching=torch.cuda.get_device_capability()[0] >= 8, dtype=self.cfg["vllm_cfg"]["precision"], seed=seed, - # Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA/NeMo-RL/issues/186) + # Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA-NeMo/RL/issues/186) enforce_eager=True, max_model_len=self.cfg["vllm_cfg"]["max_model_len"], trust_remote_code=True, diff --git a/nemo_rl/models/policy/dtensor_policy_worker.py b/nemo_rl/models/policy/dtensor_policy_worker.py index 61dcd9a127..a5e1d9259d 100644 --- a/nemo_rl/models/policy/dtensor_policy_worker.py +++ b/nemo_rl/models/policy/dtensor_policy_worker.py @@ -162,7 +162,7 @@ def __init__( device_map="cpu", # load weights onto CPU initially # Always load the model in float32 to keep master weights in float32. # Keeping the master weights in lower precision has shown to cause issues with convergence. - # https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models. + # https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models. 
torch_dtype=torch.float32, trust_remote_code=True, **sliding_window_overwrite( @@ -381,7 +381,7 @@ def train( and not self.skip_tie_check ): raise ValueError( - f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead." + f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead." ) if gbs is None: gbs = self.cfg["train_global_batch_size"] diff --git a/nemo_rl/models/policy/fsdp1_policy_worker.py b/nemo_rl/models/policy/fsdp1_policy_worker.py index ef0eb98720..f4ec53daa0 100644 --- a/nemo_rl/models/policy/fsdp1_policy_worker.py +++ b/nemo_rl/models/policy/fsdp1_policy_worker.py @@ -96,7 +96,7 @@ def __init__( device_map="cpu", # load weights onto CPU initially # Always load the model in float32 to keep master weights in float32. # Keeping the master weights in lower precision has shown to cause issues with convergence. - # https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models. + # https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models. 
torch_dtype=torch.float32, trust_remote_code=True, **sliding_window_overwrite( @@ -110,7 +110,7 @@ def __init__( self.reference_model = AutoModelForCausalLM.from_pretrained( model_name, device_map="cpu", # load weights onto CPU initially - torch_dtype=torch.float32, # use full precision in sft until https://github.com/NVIDIA/nemo-rl/issues/13 is fixed + torch_dtype=torch.float32, # use full precision in sft until https://github.com/NVIDIA-NeMo/RL/issues/13 is fixed trust_remote_code=True, **sliding_window_overwrite( model_name @@ -249,7 +249,7 @@ def train( skip_tie_check = os.environ.get("NRL_SKIP_TIED_WEIGHT_CHECK") if self.num_tied_weights != 0 and not skip_tie_check: raise ValueError( - f"Using FSP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead." + f"Using FSDP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
) if gbs is None: diff --git a/nemo_rl/package_info.py b/nemo_rl/package_info.py index 3fcefc1375..29883366db 100644 --- a/nemo_rl/package_info.py +++ b/nemo_rl/package_info.py @@ -28,8 +28,8 @@ __contact_names__ = "NVIDIA" __contact_emails__ = "nemo-tookit@nvidia.com" __homepage__ = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/" -__repository_url__ = "https://github.com/NVIDIA/NeMo-RL" -__download_url__ = "https://github.com/NVIDIA/NeMo-RL/releases" +__repository_url__ = "https://github.com/NVIDIA-NeMo/RL" +__download_url__ = "https://github.com/NVIDIA-NeMo/RL/releases" __description__ = "NeMo-RL - a toolkit for model alignment" __license__ = "Apache2" __keywords__ = "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, language, reinforcement learning, RLHF, preference modeling, SteerLM, DPO" diff --git a/nemo_rl/utils/native_checkpoint.py b/nemo_rl/utils/native_checkpoint.py index b857264d31..43d511bd74 100644 --- a/nemo_rl/utils/native_checkpoint.py +++ b/nemo_rl/utils/native_checkpoint.py @@ -248,7 +248,7 @@ def convert_dcp_to_hf( config.save_pretrained(hf_ckpt_path) # TODO: After the following PR gets merged: - # https://github.com/NVIDIA/NeMo-RL/pull/148/files + # https://github.com/NVIDIA-NeMo/RL/pull/148/files # tokenizer should be copied from policy/tokenizer/* instead of relying on the model name # We can expose a arg at the top level --tokenizer_path to plumb that through. 
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can diff --git a/pyproject.toml b/pyproject.toml index 62095ae9fb..6b1371de83 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -101,7 +101,7 @@ test = [ megatron-core = { workspace = true } nemo-tron = { workspace = true } # The NeMo Run source to be used by nemo-tron -nemo_run = { git = "https://github.com/NVIDIA/NeMo-Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" } +nemo_run = { git = "https://github.com/NVIDIA-NeMo/Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" } # torch/torchvision/triton all come from the torch index in order to pick up aarch64 wheels torch = [ { index = "pytorch-cu128" }, diff --git a/tests/functional/dpo.sh b/tests/functional/dpo.sh index 562f62a0b8..b03b611b25 100755 --- a/tests/functional/dpo.sh +++ b/tests/functional/dpo.sh @@ -36,7 +36,7 @@ uv run $PROJECT_ROOT/examples/run_dpo.py \ uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS # TODO: threshold set higher since test is flaky -# https://github.com/NVIDIA/NeMo-RL/issues/370 +# https://github.com/NVIDIA-NeMo/RL/issues/370 uv run tests/check_metrics.py $JSON_METRICS \ 'data["train/loss"]["3"] < 0.8' diff --git a/tests/functional/test_converter_roundtrip.py b/tests/functional/test_converter_roundtrip.py index e551d0e6b5..8767e564f5 100644 --- a/tests/functional/test_converter_roundtrip.py +++ b/tests/functional/test_converter_roundtrip.py @@ -1,3 +1,16 @@ +# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. #!/usr/bin/env python3 """ Functional test for converter roundtrip functionality. diff --git a/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long.v2.sh b/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long.v2.sh index 9e3a004460..b22c00dec0 100755 --- a/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long.v2.sh +++ b/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long.v2.sh @@ -32,7 +32,7 @@ uv run examples/run_sft.py \ # Convert tensorboard logs to json uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS -# TODO: the memory check is known to OOM. see https://github.com/NVIDIA/NeMo-RL/issues/263 +# TODO: the memory check is known to OOM. see https://github.com/NVIDIA-NeMo/RL/issues/263 # Only run metrics if the target step is reached if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then # TODO: FIGURE OUT CORRECT METRICS diff --git a/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp2sp.v2.sh b/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp2sp.v2.sh index 26c78649c8..abed80e5ed 100755 --- a/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp2sp.v2.sh +++ b/tests/test_suites/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp2sp.v2.sh @@ -31,7 +31,7 @@ uv run examples/run_sft.py \ # Convert tensorboard logs to json uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS -# TODO: memory check will fail due to OOM tracked here https://github.com/NVIDIA/NeMo-RL/issues/263 +# TODO: memory check will fail due to OOM tracked here https://github.com/NVIDIA-NeMo/RL/issues/263 # Only run metrics if the target step is reached if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then diff --git 
a/tests/test_suites/llm/sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v2.sh b/tests/test_suites/llm/sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v2.sh index eeaa9c8025..257add6fc5 100755 --- a/tests/test_suites/llm/sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v2.sh +++ b/tests/test_suites/llm/sft-qwen2.5-32b-4n8g-fsdp2tp8sp-actckpt.v2.sh @@ -3,7 +3,7 @@ SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd) source $SCRIPT_DIR/common.env # TODO: this config can crash on OOM -# https://github.com/NVIDIA/NeMo-RL/issues/263 +# https://github.com/NVIDIA-NeMo/RL/issues/263 # ===== BEGIN CONFIG ===== NUM_NODES=4 diff --git a/tests/unit/models/generation/test_vllm_generation.py b/tests/unit/models/generation/test_vllm_generation.py index dc1de1b123..1404b02337 100644 --- a/tests/unit/models/generation/test_vllm_generation.py +++ b/tests/unit/models/generation/test_vllm_generation.py @@ -475,7 +475,7 @@ async def test_vllm_policy_generation_async( @pytest.mark.skip( - reason="Skipping for now, will be fixed in https://github.com/NVIDIA/NeMo-RL/issues/408" + reason="Skipping for now, will be fixed in https://github.com/NVIDIA-NeMo/RL/issues/408" ) def test_vllm_worker_seed_behavior(cluster, tokenizer): """ diff --git a/tests/unit/utils/test_native_checkpoint.py b/tests/unit/utils/test_native_checkpoint.py index 88356d2dba..7df7f8543b 100755 --- a/tests/unit/utils/test_native_checkpoint.py +++ b/tests/unit/utils/test_native_checkpoint.py @@ -330,7 +330,7 @@ def test_convert_dcp_to_hf(policy, num_gpus): os.path.join(tmp_dir, "test_hf_and_dcp-hf-offline"), simple_policy_config["model_name"], # TODO: After the following PR gets merged: - # https://github.com/NVIDIA/NeMo-RL/pull/148/files + # https://github.com/NVIDIA-NeMo/RL/pull/148/files # tokenizer should be copied from policy/tokenizer/* instead of relying on the model name # We can expose a arg at the top level --tokenizer_path to plumb that through. 
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can diff --git a/uv.lock b/uv.lock index 9b50767fac..e2a02bdd23 100644 --- a/uv.lock +++ b/uv.lock @@ -2427,7 +2427,7 @@ test = [ [[package]] name = "nemo-run" version = "0.5.0rc0.dev0" -source = { git = "https://github.com/NVIDIA/NeMo-Run?rev=414f0077c648fde2c71bb1186e97ccbf96d6844c#414f0077c648fde2c71bb1186e97ccbf96d6844c" } +source = { git = "https://github.com/NVIDIA-NeMo/Run?rev=414f0077c648fde2c71bb1186e97ccbf96d6844c#414f0077c648fde2c71bb1186e97ccbf96d6844c" } dependencies = [ { name = "catalogue" }, { name = "cryptography" }, @@ -2473,7 +2473,7 @@ requires-dist = [ { name = "ijson" }, { name = "lightning" }, { name = "matplotlib" }, - { name = "nemo-run", git = "https://github.com/NVIDIA/NeMo-Run?rev=414f0077c648fde2c71bb1186e97ccbf96d6844c" }, + { name = "nemo-run", git = "https://github.com/NVIDIA-NeMo/Run?rev=414f0077c648fde2c71bb1186e97ccbf96d6844c" }, { name = "onnx" }, { name = "scikit-learn" }, { name = "webdataset" },