add stem dependencies in main python sandbox #1099

Merged
gwarmstrong merged 3 commits into main from sandbox-add-stem on Dec 11, 2025

@jiacheng-xu (Collaborator) commented Dec 11, 2025

I dropped DDGS (the DuckDuckGo search package) but kept requests (a general-purpose HTTP library). @ekmb, let me know if you want to drop requests as well.

I built the image locally (log below). Please check whether it also works in CI/CD.

(base) aiapps-jcxu at ~/code/NeMo-Skills ±(sandbox-add-stem) ✗ ❯ docker build -t nemo-skills-sandbox:0.0 -f dockerfiles/Dockerfile.sandbox .
[+] Building 121.7s (25/25) FINISHED                                                                                                                                docker:default
 => [internal] load build definition from Dockerfile.sandbox                                                                                                                  0.0s
 => => transferring dockerfile: 6.55kB                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/tiangolo/uwsgi-nginx-flask:python3.10                                                                                              0.3s
 => [internal] load .dockerignore                                                                                                                                             0.0s
 => => transferring context: 2B                                                                                                                                               0.0s
 => [internal] load build context                                                                                                                                             0.0s
 => => transferring context: 591B                                                                                                                                             0.0s
 => [ 1/20] FROM docker.io/tiangolo/uwsgi-nginx-flask:python3.10@sha256:7237e68a5a6023ae8b7e046bc8114f29a2bb3b6906c40ad3c7e553021f5ed52e                                      0.0s
 => CACHED [ 2/20] RUN apt-get update &&     apt-get install -y curl git net-tools bzip2 build-essential libseccomp-dev &&     ARCH="amd64" &&     case "$ARCH" in         a  0.0s
 => CACHED [ 3/20] RUN curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh -s -- -y &&     /root/.elan/bin/elan toolchain install leanprove  0.0s
 => CACHED [ 4/20] RUN mkdir -p /lean4 && cd /lean4 &&     /root/.elan/bin/lake new my_project &&     cd my_project &&     echo 'leanprover/lean4:v4.12.0' > lean-toolchain   0.0s
 => CACHED [ 5/20] RUN cd /lean4/my_project &&     /root/.elan/bin/lake exe cache get &&     /root/.elan/bin/lake build                                                       0.0s
 => CACHED [ 6/20] COPY requirements/code_execution.txt /app/requirements.txt                                                                                                 0.0s
 => CACHED [ 7/20] RUN pip install --no-cache-dir -r /app/requirements.txt                                                                                                    0.0s
 => CACHED [ 8/20] COPY requirements/stem.txt /app/stem_requirements.txt                                                                                                      0.0s
 => [ 9/20] RUN curl -LsSf https://astral.sh/uv/install.sh | sh                                                                                                               0.9s
 => [10/20] RUN uv pip install --upgrade pip                                                                                                                                  0.9s
 => [11/20] RUN uv pip install -r /app/stem_requirements.txt                                                                                                                 72.0s 
 => [12/20] RUN mkdir -p /data && pip install gdown &&     if [ "0" != "1" ]; then         python -c "import gdown; url = 'https://drive.google.com/uc?id=17G_k65N_6yFFZ2O-  18.2s 
 => [13/20] COPY nemo_skills/code_execution/local_sandbox/local_sandbox_server.py /app/main.py                                                                                0.1s 
 => [14/20] COPY dockerfiles/sandbox/nginx.conf.template /etc/nginx/nginx.conf.template                                                                                       0.1s 
 => [15/20] COPY dockerfiles/sandbox/start-with-nginx.sh /start-with-nginx.sh                                                                                                 0.1s 
 => [16/20] COPY dockerfiles/sandbox/block_network.c /tmp/block_network.c                                                                                                     0.1s 
 => [17/20] RUN gcc -shared -fPIC -o /usr/lib/libblock_network.so /tmp/block_network.c -ldl &&     rm /tmp/block_network.c &&     echo "Built libblock_network.so for networ  0.3s 
 => [18/20] RUN chmod +x /start-with-nginx.sh                                                                                                                                 0.2s 
 => [19/20] WORKDIR /app                                                                                                                                                      0.1s
 => [20/20] RUN echo "uwsgi_read_timeout 14400s;" > /etc/nginx/conf.d/custom_timeout.conf                                                                                     0.2s
 => exporting to image                                                                                                                                                       28.2s
 => => exporting layers                                                                                                                                                      28.1s
 => => writing image sha256:83408427259164613ea42b3dda1d003c87d0b3f0d961b36be17b0af97b2bd92b                                                                                  0.0s
 => => naming to docker.io/library/nemo-skills-sandbox:0.0                                                                                                                    0.0s

 1 warning found (use docker --debug to expand):
 - JSONArgsRecommended: JSON arguments recommended for CMD to prevent unintended behavior related to OS signals (line 153)

Summary by CodeRabbit

Release Notes

  • New Features
    • Added STEM-related Python libraries to the sandbox environment for scientific and technical computing support.

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
coderabbitai bot (Contributor) commented Dec 11, 2025

📝 Walkthrough


This pull request introduces STEM library support to the sandbox environment by adding a new requirements file and updating the Dockerfile to install these dependencies during the build process. The changes include copying the requirements file, setting environment variables, installing the uv package manager, and installing all STEM-related dependencies.

Changes

  • Docker build configuration (dockerfiles/Dockerfile.sandbox): adds a workflow to install STEM libraries. It copies requirements/stem.txt to /app/stem_requirements.txt, sets pip and Python environment variables, installs the uv package manager, upgrades pip via uv, and installs the STEM requirements. The existing data download and execution logic is preserved.
  • Dependencies (requirements/stem.txt): new file containing an extensive list of Python packages for STEM-related functionality, one package per line.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Dockerfile.sandbox - Verify the installation sequence, environment variable configurations, and compatibility of uv package manager integration with existing build steps
  • stem.txt - Confirm package list completeness and absence of conflicting dependencies with existing requirements
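For the second point, a rough first pass can be done without building the image. The following is a minimal sketch that assumes both files are simple one-requirement-per-line lists: it skips option lines such as -r, does not handle URL or path requirements, and does not resolve transitive dependencies. It normalizes project names as PEP 503 describes and reports names that appear in both files. The helper names are hypothetical and not part of NeMo-Skills.

```python
import re

# Leading project name: starts alphanumeric, then letters, digits, ".", "_" or "-".
NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(text: str) -> set[str]:
    """Return PEP 503-normalized project names from a simple requirements file body."""
    names = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and options such as -r or --index-url
            continue
        match = NAME.match(line)
        if match:
            # PEP 503: runs of "-", "_" and "." are equivalent; comparison is case-insensitive.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def overlapping_requirements(base: str, extra: str) -> set[str]:
    """Names listed in both files: candidates to inspect for conflicting constraints."""
    return requirement_names(base) & requirement_names(extra)
```

Pass each file's contents, e.g. Path("requirements/code_execution.txt").read_text() and Path("requirements/stem.txt").read_text(). A shared name is only a candidate. Whether its constraints actually conflict still has to be checked, for example by running pip check inside the built image.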

Pre-merge checks

✅ Passed checks (3 passed)

  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately and concisely describes the main change: adding STEM-related dependencies to the Python sandbox environment.
  • Docstring coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
requirements/stem.txt (2)

1-201: Consider pinning package versions for reproducibility and build stability.

All packages are specified without version constraints. This can lead to:

  • Non-reproducible Docker builds (different versions may be installed on different build dates)
  • Incompatible transitive dependencies
  • Difficult troubleshooting if a package release introduces breaking changes

For production images, version pinning is a best practice.

Example approach:

- arxiv
+ arxiv==2.1.0
- beautifulsoup4
+ beautifulsoup4==4.12.2

Alternatively, use a lock file approach (e.g., pip-compile, poetry.lock, or uv.lock) to generate pinned versions from a higher-level specification.
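A lock file is the fuller fix. As a lighter-weight guard in the meantime, a lint like the following could flag entries that are not pinned to one exact version. This is a sketch under the assumption that "pinned" means a single == (or ===) specifier with no wildcard. Environment markers after ; are ignored, and unpinned_requirements is a hypothetical helper.

```python
import re

# An exact pin: "==" or "===" followed by a version with no wildcard, at the end of the spec.
EXACT_PIN = re.compile(r"===?\s*[A-Za-z0-9][A-Za-z0-9.+!_-]*\s*$")


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and options such as -r
            continue
        spec = line.split(";", 1)[0]  # ignore environment markers like ; python_version >= '3.8'
        if not EXACT_PIN.search(spec):
            unpinned.append(line)
    return unpinned
```

Run against the current requirements/stem.txt, this would report every entry, since none carry a version constraint.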


1-201: Document the rationale for the STEM dependency list.

The file contains 200+ packages spanning many domains (astronomy, chemistry, biology, mathematics, etc.). Consider adding a comment or README explaining:

  • What STEM functionality is expected to be available in the sandbox
  • Whether this full list is required for all use cases, or if it should be modularized
  • How to maintain and update this list over time

This will help future developers understand the scope and avoid accumulating unused dependencies.

dockerfiles/Dockerfile.sandbox (1)

72-80: Minor: Consider consistency in package manager usage.

The Dockerfile uses uv for STEM requirements (lines 78–80) but pip directly for gdown (line 85). While both work, using uv pip consistently would be slightly cleaner:

  RUN mkdir -p /data && uv pip install gdown && \

This is optional and does not affect functionality.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bbba89 and b084f76.

📒 Files selected for processing (2)
  • dockerfiles/Dockerfile.sandbox (1 hunks)
  • requirements/stem.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (2)
dockerfiles/Dockerfile.sandbox (1)

68-80: STEM installation flow is well-structured, but will fail due to invalid packages in requirements.

The Dockerfile changes correctly:

  • Separate STEM requirements from base Python dependencies
  • Set environment variables for reproducible and efficient pip operations
  • Use uv for fast, reliable package installation
  • Order operations correctly (install uv → upgrade pip → install requirements)

However, the build will fail because requirements/stem.txt contains invalid packages (e.g., HTMLParser at line 62), so this Dockerfile change is dependent on fixing the requirements file first.

Ensure that all invalid packages are removed from requirements/stem.txt before merging; otherwise the Docker build will fail with a pip error like:

ERROR: Could not find a version that satisfies the requirement HTMLParser

requirements/stem.txt (1)

31-31: All specified packages exist on PyPI and are legitimate.

The packages cited as potentially invalid are confirmed to be real PyPI packages:

  • data (v0.4) — handles text/bytes/file-like inputs uniformly
  • diff (v2023.12.6) — defines a difference protocol for objects
  • ocl (v0.12) — compiles Python code to C99/OpenCL/JavaScript
  • quote (v3.0) — Goodreads Quote API wrapper
  • utils (v1.0.2) — utility functions library

While these names are common, each corresponds to a specific, functional package on PyPI. No changes are needed.

grep
gutenbergpy
html5lib
HTMLParser

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the requirements/stem.txt file to see the context
fd -type f -name "stem.txt" | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232


🌐 Web query:

HTMLParser PyPI package

💡 Result:

The PyPI package "HTMLParser" (package name: HTMLParser) is a small backport of Python 2.7’s HTMLParser for older Python 2.x (last release 0.0.2 in 2013). It is not a modern or maintained parser for current Python 3 — for Python 3 use the stdlib html.parser.HTMLParser or, for robust HTML handling, libraries like BeautifulSoup (bs4) or lxml. Sources: PyPI package page and Python stdlib docs. [1] [2]

Sources

  • HTMLParser (PyPI): backport of HTMLParser from Python 2.7. [1]
  • html.parser — Python standard library (HTMLParser class). [2]

🏁 Script executed:

# Check if requirements directory exists and show its contents
find . -type f -name "*.txt" -path "*/requirements/*" 2>/dev/null | sort

Repository: NVIDIA-NeMo/Skills

Length of output: 229


🏁 Script executed:

# Read the requirements/stem.txt file with line numbers
cat -n requirements/stem.txt | head -70

Repository: NVIDIA-NeMo/Skills

Length of output: 1175


🏁 Script executed:

# Also check for Python version requirements in setup files
fd -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232


🏁 Script executed:

# Check for Python version specifications
grep -r "python_requires" . --include="*.py" --include="*.toml" --include="*.cfg" 2>/dev/null | head -10

Repository: NVIDIA-NeMo/Skills

Length of output: 44


🏁 Script executed:

# Check for .python-version or similar files
cat .python-version 2>/dev/null || cat runtime.txt 2>/dev/null || echo "No version file found"

Repository: NVIDIA-NeMo/Skills

Length of output: 83


🏁 Script executed:

# Look for Dockerfile or similar to understand build context
find . -type f -name "Dockerfile*" | head -3

Repository: NVIDIA-NeMo/Skills

Length of output: 165


🏁 Script executed:

# Check the Dockerfile to understand the build context
cat dockerfiles/Dockerfile.nemo-skills | head -100

Repository: NVIDIA-NeMo/Skills

Length of output: 2978


🏁 Script executed:

# Try to determine if HTMLParser package can be installed (simulate pip check)
# We can use pip to query package info
python3 << 'EOF'
import urllib.request
import json

# Check PyPI for HTMLParser package details
try:
    url = "https://pypi.org/pypi/HTMLParser/json"
    with urllib.request.urlopen(url, timeout=5) as response:
        data = json.loads(response.read().decode())
        print(f"Package: {data['info']['name']}")
        print(f"Version: {data['info']['version']}")
        print(f"Summary: {data['info']['summary']}")
        print(f"Home Page: {data['info']['home_page']}")
        print(f"Requires Python: {data['info']['requires_python']}")
        print(f"Last updated: {data['info']['last_updated']}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: NVIDIA-NeMo/Skills

Length of output: 148
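The script above reads data['info']['last_updated']. As far as I can tell, the info object in PyPI's JSON API has no such key, so that lookup would likely raise KeyError, which the except clause then prints as an error. Below is a minimal sketch that derives the most recent upload time from the per-file upload_time fields under releases instead. It assumes that response shape, and summarize_pypi_json is a hypothetical helper.

```python
from datetime import datetime


def summarize_pypi_json(data: dict) -> dict:
    """Summarize a decoded response from https://pypi.org/pypi/<project>/json."""
    info = data["info"]
    # "releases" maps each version string to a list of uploaded files;
    # each file dict carries an "upload_time" in YYYY-MM-DDTHH:MM:SS form.
    uploads = [
        datetime.fromisoformat(dist["upload_time"])
        for files in data.get("releases", {}).values()
        for dist in files
    ]
    return {
        "name": info["name"],
        "version": info["version"],
        "summary": info.get("summary"),
        "requires_python": info.get("requires_python") or None,  # "" means unspecified
        "last_upload": max(uploads).isoformat() if uploads else None,
    }
```

It can be given the same json.loads(...) result that the script builds from urllib.request.urlopen.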


Remove HTMLParser—it's an unmaintained Python 2 backport unsuitable for Python 3.10.

HTMLParser (line 62) is available on PyPI, but only as version 0.0.2 (last released in 2013), a backport of Python 2.7's HTMLParser. The Dockerfile uses Python 3.10, which makes this package redundant and inappropriate. For HTML parsing, use the standard library's html.parser module, or rely on the alternatives already in the list: beautifulsoup4 (line 8) or html5lib (line 61).

- HTMLParser
🤖 Prompt for AI Agents
requirements/stem.txt line 62: remove the third-party "HTMLParser" entry because
it's an unmaintained Python 2 backport and redundant for Python 3.10; delete
that line from the requirements file and, if any project code imports HTMLParser
from a package, replace it to use the standard-library html.parser or the
existing beautifulsoup4/html5lib dependencies instead.
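If any project code did depend on the backport, the Python 3 replacement is the standard library's html.parser.HTMLParser, which is subclassed and driven through feed(). Here is a minimal sketch. The LinkTextParser class and the HTML snippet are illustrative only and not taken from the project.

```python
from html.parser import HTMLParser  # Python 3 standard library; no PyPI package needed


class LinkTextParser(HTMLParser):
    """Collect href targets and non-blank text from an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep only text runs that contain something other than whitespace.
        if data.strip():
            self.text.append(data.strip())


parser = LinkTextParser()
parser.feed('<p>See <a href="https://docs.python.org/3/library/html.parser.html">the docs</a>.</p>')
parser.close()
print(parser.links)  # ['https://docs.python.org/3/library/html.parser.html']
print(parser.text)   # ['See', 'the docs', '.']
```

For messier real-world HTML, the beautifulsoup4 or html5lib entries already in the list are the more forgiving choice.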

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
@gwarmstrong gwarmstrong merged commit 28f622e into main Dec 11, 2025
5 checks passed
@gwarmstrong gwarmstrong deleted the sandbox-add-stem branch December 11, 2025 18:34
gwarmstrong added a commit that referenced this pull request Dec 11, 2025
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 12, 2025
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>