feat: start nemo gym and other environments with cached venvs#1927
feat: start nemo gym and other environments with cached venvs#1927
Conversation
This avoids issues like seeing (raylet) Installing ... which can lead to failures in high node count settings Signed-off-by: Terry Kong <terryk@nvidia.com>
📝 WalkthroughWalkthroughModified environment creation logic in utils.py to conditionally create local virtual environments on actor nodes when using the "uv" Python executable, with proper environment variable propagation to the runtime environment. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@nemo_rl/environments/utils.py`:
- Around line 116-119: The environment variables VIRTUAL_ENV and
UV_PROJECT_ENVIRONMENT are being set to actor_py_exec (the Python executable
path returned by create_local_venv_on_each_node) instead of the venv root;
compute the venv root from actor_py_exec (e.g., venv_root =
os.path.dirname(os.path.dirname(actor_py_exec))) and set both VIRTUAL_ENV and
UV_PROJECT_ENVIRONMENT to venv_root in the extra_env_vars mapping, or
alternatively change create_local_venv_on_each_node to return both
(executable_path, venv_root) and use the returned venv_root when populating
extra_env_vars.
🧹 Nitpick comments (2)
nemo_rl/environments/utils.py (2)
111-115: Minor: dotted FQN used asvenv_namecreates unusually-named directories.
actor_class_fqn(e.g."nemo_rl.environments.nemo_gym.NemoGym") is passed asvenv_name. This will create directories with dots in the name. Not a bug, but consider sanitizing (e.g. replacing.with_) or using just the class name for readability.
122-123: Propagating the entire driveros.environto workers may leak sensitive variables.
{**dict(os.environ), **extra_env_vars}copies every environment variable from the driver process to worker runtime environments. This could unintentionally propagate secrets, tokens, or driver-specific config. Consider allowlisting only the variables that workers actually need, or at minimum filtering out known sensitive prefixes.
…-NeMo#1927) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
…-NeMo#1927) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
This avoids issues like seeing (raylet) Installing ... which can lead to failures in high node count settings
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
closes #1925
Usage
# Add a code snippet demonstrating how to use thisBefore your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit