Merged
18 changes: 18 additions & 0 deletions dockerfiles/Dockerfile.sandbox
@@ -63,6 +63,24 @@ ENV PATH="/lean4/my_project:$PATH"
COPY requirements/code_execution.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt


# Install STEM related libraries
COPY requirements/stem.txt /app/stem_requirements.txt


# Speed/size/env hygiene
ENV PIP_DISABLE_PIP_VERSION_CHECK=1 \
    UV_SYSTEM_PYTHON=1 \
    PATH="/root/.local/bin:${PATH}"


# Install uv (adds to ~/.local/bin), then install deps
RUN if [ "$GITHUB_CI" != "1" ]; then \
        curl -LsSf https://astral.sh/uv/install.sh | sh && \
        uv pip install --upgrade pip && \
        uv pip install -r /app/stem_requirements.txt --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu; \
    fi

# For scicode eval - create data directory and download test data
# Set GITHUB_CI=1 build arg to skip download (useful for CI when download fails)
# If skipped, scicode evaluations will fail unless the file is manually mounted
201 changes: 201 additions & 0 deletions requirements/stem.txt
@@ -0,0 +1,201 @@
arxiv
ascii_magic
astronomy
astroquery
atc
basc
bbn
beautifulsoup4
bible
biopython
bioservices
bson
cactus
cantera
certifi
chardet
chemicals
chemics
chemlib
chempy
chemspipy
chess
cloudscraper
convertdate
conway
coxeter
Crypto
cssselect
cvxopt
cvxpy
data
datasets
ddc
depmap
diff
dill
dipy
dp_accounting
duckduckgo_search
easyocr
ecdsa
enchant
ephem
ete3
feedparser
filetype
fiona
fishpy
fontTools
galois
gemmi
geocoder
geonamescache
geopandas
geopy
gita
gmpy2
googletrans
grep
gutenbergpy
html5lib
HTMLParser

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the requirements/stem.txt file to see the context
fd -type f -name "stem.txt" | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232


🌐 Web query:

HTMLParser PyPI package

💡 Result:

The PyPI package "HTMLParser" (package name: HTMLParser) is a small backport of Python 2.7’s HTMLParser for older Python 2.x (last release 0.0.2 in 2013). It is not a modern or maintained parser for current Python 3 — for Python 3 use the stdlib html.parser.HTMLParser or, for robust HTML handling, libraries like BeautifulSoup (bs4) or lxml. Sources: PyPI package page and Python stdlib docs. [1] [2]

Sources

  • HTMLParser (PyPI): backport of HTMLParser from Python 2.7. [1]
  • html.parser — Python standard library (HTMLParser class). [2]

🏁 Script executed:

# Check if requirements directory exists and show its contents
find . -type f -name "*.txt" -path "*/requirements/*" 2>/dev/null | sort

Repository: NVIDIA-NeMo/Skills

Length of output: 229


🏁 Script executed:

# Read the requirements/stem.txt file with line numbers
cat -n requirements/stem.txt | head -70

Repository: NVIDIA-NeMo/Skills

Length of output: 1175


🏁 Script executed:

# Also check for Python version requirements in setup files
fd -type f \( -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" \) | head -5

Repository: NVIDIA-NeMo/Skills

Length of output: 232


🏁 Script executed:

# Check for Python version specifications
grep -r "python_requires" . --include="*.py" --include="*.toml" --include="*.cfg" 2>/dev/null | head -10

Repository: NVIDIA-NeMo/Skills

Length of output: 44


🏁 Script executed:

# Check for .python-version or similar files
cat .python-version 2>/dev/null || cat runtime.txt 2>/dev/null || echo "No version file found"

Repository: NVIDIA-NeMo/Skills

Length of output: 83


🏁 Script executed:

# Look for Dockerfile or similar to understand build context
find . -type f -name "Dockerfile*" | head -3

Repository: NVIDIA-NeMo/Skills

Length of output: 165


🏁 Script executed:

# Check the Dockerfile to understand the build context
cat dockerfiles/Dockerfile.nemo-skills | head -100

Repository: NVIDIA-NeMo/Skills

Length of output: 2978


🏁 Script executed:

# Try to determine if HTMLParser package can be installed (simulate pip check)
# We can use pip to query package info
python3 << 'EOF'
import urllib.request
import json

# Check PyPI for HTMLParser package details
try:
    url = "https://pypi.org/pypi/HTMLParser/json"
    with urllib.request.urlopen(url, timeout=5) as response:
        data = json.loads(response.read().decode())
        print(f"Package: {data['info']['name']}")
        print(f"Version: {data['info']['version']}")
        print(f"Summary: {data['info']['summary']}")
        print(f"Home Page: {data['info']['home_page']}")
        print(f"Requires Python: {data['info']['requires_python']}")
        print(f"Last updated: {data['info']['last_updated']}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: NVIDIA-NeMo/Skills

Length of output: 148
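
A note on the last script: as far as I know, the `info` object in the PyPI JSON response has no `last_updated` key, so that lookup probably raised a `KeyError` that the `except` branch swallowed. Below is a minimal sketch of a more tolerant summary. It assumes upload timestamps are exposed per file under `urls[*].upload_time_iso_8601`; `summarize_pypi_payload` is a hypothetical helper name, and it works on an already decoded payload so it needs no network access.

```python
def summarize_pypi_payload(data):
    """Pull a few fields from a decoded PyPI JSON payload (hypothetical helper)."""
    info = data.get("info", {})
    # Assumption: each file entry under "urls" carries an ISO 8601 upload timestamp.
    uploads = [
        entry["upload_time_iso_8601"]
        for entry in data.get("urls", [])
        if entry.get("upload_time_iso_8601")
    ]
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "requires_python": info.get("requires_python"),
        # Uniformly formatted UTC ISO 8601 strings sort chronologically as text.
        "last_upload": max(uploads) if uploads else None,
    }
```

Using `.get` throughout means a missing field shows up as `None` in the summary instead of aborting the whole report.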


Remove HTMLParser: it is an unmaintained Python 2 backport and unsuitable for Python 3.10.

HTMLParser (line 62) exists on PyPI only as version 0.0.2, last released in 2013, and is a backport of Python 2.7's HTMLParser module. The Dockerfile uses Python 3.10, so this package is redundant and inappropriate. For HTML parsing, use the built-in html.parser module from the standard library, or rely on the alternatives already in this file: beautifulsoup4 (line 8) or html5lib (line 61).
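
For reference, here is a minimal sketch of what the standard-library replacement could look like. `LinkCollector` and `extract_links` are hypothetical names chosen for illustration, not code from this repository; only `html.parser.HTMLParser` and its `handle_starttag`, `feed`, and `close` methods come from the standard library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return every href found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Code that did `from HTMLParser import HTMLParser` under Python 2 would typically only need the import changed to `from html.parser import HTMLParser`, though that is worth confirming against any call sites.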

- HTMLParser
🤖 Prompt for AI Agents
requirements/stem.txt line 62: remove the third-party "HTMLParser" entry because
it is an unmaintained Python 2 backport and redundant on Python 3.10; delete
that line from the requirements file and, if any project code imports HTMLParser
from that package, switch it to the standard-library html.parser or to the
existing beautifulsoup4/html5lib dependencies.

huggingface_hub
iapws
imageio
importlib_metadata
importlib_resources
IndianConstitution
indic_nlp_library
inflect
ipywidgets
isbnlib
jax
jdcal
language_tool_python
law
lie
LIEGenTools
lifelines
lingpy
lxml
matplotlib
mendeleev
mido
mimic
mingus
mip
molmass
molparse
molvs
music21
mygene
myvariant
networkx
nibabel
nltk
nuclear
num2words
numba
numpy
numpy_financial
ocl
open_tamil
opencv-python
openmc_data
openmm
openpyxl
optopy
ortools
osmnx
packaging
paddleocr
pandapower
pandas
pandas_datareader
pdf2image
pdfminer
pdfplumber
pdfreader
piexif
pint
planarity
polyhedron
pretty_midi
pronouncing
ptable
py3Dmol
pybel
pyclipper
pycosat
pycountry
pydataset
pyequion
pyfiglet
pyfluids
pyhull
pylaw
pymatgen
pymcm
pymongo
PyMuPDF
PyPDF2
pyromat
pysam
pysat
pyscf
pysmiles
pyswisseph
pytamil
pyteomics
pytesseract
python-igraph
python-snappy
qiskit
quote
quotes
qutip
ragas
rdflib
rdkit
requests
requests_cache
sanskrit
sanskrit_parser
scipy
seaborn
shakespeare
shapely
sieve
skimpy
skyfield
spacy
spherogram
statistic
statsmodels
steam
stim
sympy
tamil
tensorflow
tensorflow_datasets
thermo
thermochem
thermopy
thermostat
tinycss2
torch
torchvision
transformers
trimesh
typ
utils
vedas
wbdata
webcolors
wikidata
wikipedia>=1.4.0
wikipedia_api
wordfreq
wptools
yfinance