add stem dependencies in main python sandbox by jiacheng-xu · Pull Request #1099 · NVIDIA-NeMo/Skills

jiacheng-xu · 2025-12-11T07:39:08Z

I dropped DDGS (the DuckDuckGo search package) but kept requests (a general-purpose HTTP library). @ekmb, let me know if you want requests dropped as well.

I built the image locally (log below). Please check whether it also works in CI/CD.

(base) aiapps-jcxu at ~/code/NeMo-Skills ±(sandbox-add-stem) ✗ ❯ docker build -t nemo-skills-sandbox:0.0 -f dockerfiles/Dockerfile.sandbox .
[+] Building 121.7s (25/25) FINISHED                                                                                                                                docker:default
 => [internal] load build definition from Dockerfile.sandbox                                                                                                                  0.0s
 => => transferring dockerfile: 6.55kB                                                                                                                                        0.0s
 => [internal] load metadata for docker.io/tiangolo/uwsgi-nginx-flask:python3.10                                                                                              0.3s
 => [internal] load .dockerignore                                                                                                                                             0.0s
 => => transferring context: 2B                                                                                                                                               0.0s
 => [internal] load build context                                                                                                                                             0.0s
 => => transferring context: 591B                                                                                                                                             0.0s
 => [ 1/20] FROM docker.io/tiangolo/uwsgi-nginx-flask:python3.10@sha256:7237e68a5a6023ae8b7e046bc8114f29a2bb3b6906c40ad3c7e553021f5ed52e                                      0.0s
 => CACHED [ 2/20] RUN apt-get update &&     apt-get install -y curl git net-tools bzip2 build-essential libseccomp-dev &&     ARCH="amd64" &&     case "$ARCH" in         a  0.0s
 => CACHED [ 3/20] RUN curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh -s -- -y &&     /root/.elan/bin/elan toolchain install leanprove  0.0s
 => CACHED [ 4/20] RUN mkdir -p /lean4 && cd /lean4 &&     /root/.elan/bin/lake new my_project &&     cd my_project &&     echo 'leanprover/lean4:v4.12.0' > lean-toolchain   0.0s
 => CACHED [ 5/20] RUN cd /lean4/my_project &&     /root/.elan/bin/lake exe cache get &&     /root/.elan/bin/lake build                                                       0.0s
 => CACHED [ 6/20] COPY requirements/code_execution.txt /app/requirements.txt                                                                                                 0.0s
 => CACHED [ 7/20] RUN pip install --no-cache-dir -r /app/requirements.txt                                                                                                    0.0s
 => CACHED [ 8/20] COPY requirements/stem.txt /app/stem_requirements.txt                                                                                                      0.0s
 => [ 9/20] RUN curl -LsSf https://astral.sh/uv/install.sh | sh                                                                                                               0.9s
 => [10/20] RUN uv pip install --upgrade pip                                                                                                                                  0.9s
 => [11/20] RUN uv pip install -r /app/stem_requirements.txt                                                                                                                 72.0s 
 => [12/20] RUN mkdir -p /data && pip install gdown &&     if [ "0" != "1" ]; then         python -c "import gdown; url = 'https://drive.google.com/uc?id=17G_k65N_6yFFZ2O-  18.2s 
 => [13/20] COPY nemo_skills/code_execution/local_sandbox/local_sandbox_server.py /app/main.py                                                                                0.1s 
 => [14/20] COPY dockerfiles/sandbox/nginx.conf.template /etc/nginx/nginx.conf.template                                                                                       0.1s 
 => [15/20] COPY dockerfiles/sandbox/start-with-nginx.sh /start-with-nginx.sh                                                                                                 0.1s 
 => [16/20] COPY dockerfiles/sandbox/block_network.c /tmp/block_network.c                                                                                                     0.1s 
 => [17/20] RUN gcc -shared -fPIC -o /usr/lib/libblock_network.so /tmp/block_network.c -ldl &&     rm /tmp/block_network.c &&     echo "Built libblock_network.so for networ  0.3s 
 => [18/20] RUN chmod +x /start-with-nginx.sh                                                                                                                                 0.2s 
 => [19/20] WORKDIR /app                                                                                                                                                      0.1s
 => [20/20] RUN echo "uwsgi_read_timeout 14400s;" > /etc/nginx/conf.d/custom_timeout.conf                                                                                     0.2s
 => exporting to image                                                                                                                                                       28.2s
 => => exporting layers                                                                                                                                                      28.1s
 => => writing image sha256:83408427259164613ea42b3dda1d003c87d0b3f0d961b36be17b0af97b2bd92b                                                                                  0.0s
 => => naming to docker.io/library/nemo-skills-sandbox:0.0                                                                                                                    0.0s

 1 warning found (use docker --debug to expand):
 - JSONArgsRecommended: JSON arguments recommended for CMD to prevent unintended behavior related to OS signals (line 153)

Summary by CodeRabbit

Release Notes

New Features
- Added STEM-related Python libraries to the sandbox environment for scientific and technical computing support.

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>

coderabbitai · 2025-12-11T07:44:40Z

Walkthrough

This pull request introduces STEM library support to the sandbox environment by adding a new requirements file and updating the Dockerfile to install these dependencies during the build process. The changes include copying the requirements file, setting environment variables, installing the uv package manager, and installing all STEM-related dependencies.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docker build configuration: `dockerfiles/Dockerfile.sandbox` | Adds workflow to install STEM libraries: copies `requirements/stem.txt` to `/app/stem_requirements.txt`, sets pip and Python environment variables, installs `uv` package manager, upgrades pip via uv, and installs STEM requirements. Preserves existing data download and execution logic. |
| Dependencies: `requirements/stem.txt` | New file containing extensive list of Python packages for STEM-related functionality, one package per line. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

- Dockerfile.sandbox: Verify the installation sequence, environment variable configurations, and compatibility of uv package manager integration with existing build steps.
- stem.txt: Confirm package list completeness and absence of conflicting dependencies with existing requirements.
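
The stem.txt check (no overlap with existing requirements) could be sketched roughly as follows. This is a minimal sketch and not part of the PR: the function names and the sample file contents are hypothetical, and it only compares project names (normalized as PEP 503 describes), not version constraints.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_names(requirements_text: str) -> set[str]:
    """Collect normalized project names from the text of a requirements file."""
    names = set()
    for raw in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # The project name is the leading run of name characters,
        # before any extras, version specifier, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def overlap(base_text: str, stem_text: str) -> list[str]:
    """Return project names that appear in both requirements files."""
    return sorted(package_names(base_text) & package_names(stem_text))


# Hypothetical contents, for illustration only.
base = "numpy>=1.24\nsympy\n"
stem = "arxiv\nNumPy\nbeautifulsoup4\n"
print(overlap(base, stem))  # ['numpy']
```

In practice the two inputs would be the contents of `requirements/code_execution.txt` and `requirements/stem.txt`; any name reported by `overlap` is a candidate for a conflicting constraint and deserves a manual look.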

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding STEM-related dependencies to the Python sandbox environment. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

requirements/stem.txt (2)
1-201: Consider pinning package versions for reproducibility and build stability.

All packages are specified without version constraints. This can lead to:

- Non-reproducible Docker builds (different versions may be installed on different build dates)
- Incompatible transitive dependencies
- Difficult troubleshooting if a package release introduces breaking changes

For production images, version pinning is a best practice.

Example approach:
- arxiv
+ arxiv==2.1.0
- beautifulsoup4
+ beautifulsoup4==4.12.2
Alternatively, use a lock file approach (e.g., pip-compile, poetry.lock, or uv.lock) to generate pinned versions from a higher-level specification.
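
One way to keep an eye on the pinning suggestion is a small check that lists every requirement without an exact `==` pin. The following is only a sketch under stated assumptions: the helper name `unpinned` and the sample text are hypothetical, and the regular expression handles plain `name==version` lines (optionally with extras), not every form a requirements file allows.

```python
import re

# A line counts as pinned when it looks like "name==version",
# optionally with extras such as "name[extra]==version".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not carry an exact '==' pin."""
    result = []
    for raw in requirements_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            result.append(line)
    return result


# Hypothetical contents, for illustration only.
sample = "arxiv==2.1.0\nbeautifulsoup4\nhtml5lib>=1.1\n# a comment\n"
print(unpinned(sample))  # ['beautifulsoup4', 'html5lib>=1.1']
```

A check like this could run in pre-commit against `requirements/stem.txt`; a lock file generated by a tool would make it unnecessary.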

1-201: Document the rationale for the STEM dependency list.

The file contains 200+ packages spanning many domains (astronomy, chemistry, biology, mathematics, etc.). Consider adding a comment or README explaining:

- What STEM functionality is expected to be available in the sandbox
- Whether this full list is required for all use cases, or if it should be modularized
- How to maintain and update this list over time

This will help future developers understand the scope and avoid accumulating unused dependencies.
dockerfiles/Dockerfile.sandbox (1)
72-80: Minor: Consider consistency in package manager usage.

The Dockerfile uses uv for STEM requirements (lines 78–80) but pip directly for gdown (line 85). While both work, using uv pip consistently would be slightly cleaner:
  RUN mkdir -p /data && uv pip install gdown && \
This is optional and does not affect functionality.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0bbba89 and b084f76.

📒 Files selected for processing (2)

dockerfiles/Dockerfile.sandbox (1 hunks)
requirements/stem.txt (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: unit-tests
GitHub Check: pre-commit

🔇 Additional comments (2)

dockerfiles/Dockerfile.sandbox (1)
68-80: STEM installation flow is well-structured, but will fail due to invalid packages in requirements.

The Dockerfile changes correctly:

- Separate STEM requirements from base Python dependencies
- Set environment variables for reproducible and efficient pip operations
- Use uv for fast, reliable package installation
- Order operations correctly (install uv → upgrade pip → install requirements)

However, the build will fail because requirements/stem.txt contains invalid packages (e.g., HTMLParser at line 62), so this Dockerfile change is dependent on fixing the requirements file first.

Ensure that all invalid packages are removed from requirements/stem.txt before merging, otherwise the Docker build will fail with a pip error like:
ERROR: Could not find a version that satisfies the requirement HTMLParser
requirements/stem.txt (1)

31-31: All specified packages exist on PyPI and are legitimate.

The packages cited as potentially invalid are confirmed to be real PyPI packages:

- data (v0.4): handles text/bytes/file-like inputs uniformly
- diff (v2023.12.6): defines a difference protocol for objects
- ocl (v0.12): compiles Python code to C99/OpenCL/JavaScript
- quote (v3.0): Goodreads Quote API wrapper
- utils (v1.0.2): utility functions library

While these names are common, each corresponds to a specific, functional package on PyPI. No changes are needed.

coderabbitai · 2025-12-11T07:44:43Z

requirements/stem.txt

+grep
+gutenbergpy
+html5lib
+HTMLParser


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the requirements/stem.txt file to see the context
fd -type f -name "stem.txt" | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232

🌐 Web query:

HTMLParser PyPI package

💡 Result:

The PyPI package "HTMLParser" (package name: HTMLParser) is a small backport of Python 2.7’s HTMLParser for older Python 2.x (last release 0.0.2 in 2013). It is not a modern or maintained parser for current Python 3 — for Python 3 use the stdlib html.parser.HTMLParser or, for robust HTML handling, libraries like BeautifulSoup (bs4) or lxml. Sources: PyPI package page and Python stdlib docs. [1] [2]

Sources

HTMLParser (PyPI): backport of HTMLParser from Python 2.7. [1]

html.parser — Python standard library (HTMLParser class). [2]

🏁 Script executed:

# Check if requirements directory exists and show its contents
find . -type f -name "*.txt" -path "*/requirements/*" 2>/dev/null | sort

Repository: NVIDIA-NeMo/Skills

Length of output: 229

🏁 Script executed:

# Read the requirements/stem.txt file with line numbers
cat -n requirements/stem.txt | head -70

Repository: NVIDIA-NeMo/Skills

Length of output: 1175

🏁 Script executed:

# Also check for Python version requirements in setup files
fd -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232

🏁 Script executed:

# Check for Python version specifications
grep -r "python_requires" . --include="*.py" --include="*.toml" --include="*.cfg" 2>/dev/null | head -10

Repository: NVIDIA-NeMo/Skills

Length of output: 44

🏁 Script executed:

# Check for .python-version or similar files
cat .python-version 2>/dev/null || cat runtime.txt 2>/dev/null || echo "No version file found"

Repository: NVIDIA-NeMo/Skills

Length of output: 83

🏁 Script executed:

# Look for Dockerfile or similar to understand build context
find . -type f -name "Dockerfile*" | head -3

Repository: NVIDIA-NeMo/Skills

Length of output: 165

🏁 Script executed:

# Check the Dockerfile to understand the build context
cat dockerfiles/Dockerfile.nemo-skills | head -100

Repository: NVIDIA-NeMo/Skills

Length of output: 2978

🏁 Script executed:

# Try to determine if HTMLParser package can be installed (simulate pip check)
# We can use pip to query package info
python3 << 'EOF'
import urllib.request
import json

# Check PyPI for HTMLParser package details
try:
    url = "https://pypi.org/pypi/HTMLParser/json"
    with urllib.request.urlopen(url, timeout=5) as response:
        data = json.loads(response.read().decode())
        print(f"Package: {data['info']['name']}")
        print(f"Version: {data['info']['version']}")
        print(f"Summary: {data['info']['summary']}")
        print(f"Home Page: {data['info']['home_page']}")
        print(f"Requires Python: {data['info']['requires_python']}")
        print(f"Last updated: {data['info']['last_updated']}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: NVIDIA-NeMo/Skills

Length of output: 148
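
The metadata script above reads a last-updated value from the `info` section; I am not certain that section carries such a key. A more defensive variant, sketched below, derives the date from the per-file `upload_time_iso_8601` entries under `releases`, which is where the PyPI JSON API reports upload times to my knowledge. The function names, the five-year threshold, and the sample payload are all hypothetical, and the sample timestamps are made up rather than taken from the real HTMLParser project.

```python
from datetime import datetime, timezone


def last_upload(pypi_json: dict):
    """Return the most recent file upload time across all releases, or None."""
    stamps = [
        # Replace a trailing "Z" so older Python versions can parse the offset.
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in pypi_json.get("releases", {}).values()
        for f in files
    ]
    return max(stamps, default=None)


def is_stale(pypi_json: dict, now: datetime, max_age_days: int = 5 * 365) -> bool:
    """Treat a project as stale if it has no uploads or none within max_age_days."""
    latest = last_upload(pypi_json)
    return latest is None or (now - latest).days > max_age_days


# Made-up payload in the shape of a PyPI JSON response, for illustration only.
fake = {
    "releases": {
        "0.0.1": [{"upload_time_iso_8601": "2012-01-01T00:00:00.000000Z"}],
        "0.0.2": [{"upload_time_iso_8601": "2013-06-01T12:00:00.000000Z"}],
        "0.0.3": [],  # a release with no uploaded files
    }
}
now = datetime(2025, 12, 11, tzinfo=timezone.utc)
print(last_upload(fake).year, is_stale(fake, now))
```

Keeping the network call separate from the parsing makes the logic testable offline; the dictionary would come from `json.loads` on the response body as in the script above.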

Remove HTMLParser—it's an unmaintained Python 2 backport unsuitable for Python 3.10.

HTMLParser (line 62) is available on PyPI but as version 0.0.2 (last released 2013), a backport of Python 2.7's HTMLParser. The Dockerfile uses Python 3.10, making this package redundant and inappropriate. For HTML parsing, use the built-in html.parser module from the standard library, or rely on the already-present alternatives: beautifulsoup4 (line 8) or html5lib (line 61).

- HTMLParser
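
For sandbox code that needs this functionality on Python 3, the standard-library `html.parser.HTMLParser` class covers it without any third-party install. A minimal sketch follows; the `LinkCollector` class, the `extract_links` helper, and the sample markup are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser  # standard library, no PyPI package needed


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Parse an HTML string and return the hrefs of its anchor tags."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(extract_links('<p><a href="https://pypi.org">PyPI</a> and <a href="/docs">docs</a></p>'))
# ['https://pypi.org', '/docs']
```

For malformed or complex documents, the beautifulsoup4 and html5lib entries already in the requirements file are likely the better fit.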

🤖 Prompt for AI Agents

requirements/stem.txt line 62: remove the third-party "HTMLParser" entry because it's an unmaintained Python 2 backport and redundant for Python 3.10; delete that line from the requirements file and, if any project code imports HTMLParser from a package, replace it to use the standard-library html.parser or the existing beautifulsoup4/html5lib dependencies instead.

Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

add stem dependencies in main python sandbox

b084f76

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>

jiacheng-xu requested review from Kipok, ekmb and gwarmstrong December 11, 2025 07:39

coderabbitai bot reviewed Dec 11, 2025

gwarmstrong added 2 commits December 11, 2025 09:50

ENH maint sandbox only use cpu wheels

881ab1a

Signed-off-by: George Armstrong <georgea@nvidia.com>

MAINT do not install stem requirements on github ci

60c5dd7

Signed-off-by: George Armstrong <georgea@nvidia.com>

gwarmstrong approved these changes Dec 11, 2025

gwarmstrong merged commit 28f622e into main Dec 11, 2025
5 checks passed

gwarmstrong deleted the sandbox-add-stem branch December 11, 2025 18:34

gwarmstrong merged 3 commits into main from sandbox-add-stem
