feat: migrate sandbox from uwsgi to gunicorn #1232

Closed
gwarmstrong wants to merge 2 commits into NVIDIA-NeMo:main from gwarmstrong:georgea/sandbox-gunicorn

Conversation

@gwarmstrong (Collaborator) commented Feb 11, 2026

Summary

  • Replace tiangolo/uwsgi-nginx-flask base image with python:3.10-slim + nginx + gunicorn
  • Remove all uwsgi configuration (UWSGI_PROCESSES, UWSGI_CHEAPER, UWSGI_CPU_AFFINITY, uwsgi_read_timeout)
  • Add deprecation warnings for old uwsgi env vars in start-with-nginx.sh
  • Fix SIGINT handler in shell_worker for gunicorn compatibility
  • Update icpc/ioi dataset env vars to use GUNICORN_WORKERS
  • Test fixes for port conflicts and sys.executable usage

Depends on #1231 (lockfiles). Split from #1226.

Summary by CodeRabbit

Release Notes

  • Chores

    • Migrated sandbox server infrastructure from uWSGI to Gunicorn for improved performance and reliability.
    • Implemented dependency locking for consistent and reproducible builds.
    • Added Lean 4 toolchain support.
    • Enhanced port management with pre-checks and conflict detection during startup.
  • Bug Fixes

    • Improved signal handling in shell worker for better interrupt handling.

… builds

Add sandbox-core.lock and sandbox.lock for pinned dependency installation.
Switch Dockerfile.sandbox from pip install to uv install from lockfiles.
Add flask and gunicorn to code_execution.txt requirements.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Replace tiangolo/uwsgi-nginx-flask with python:3.10-slim and gunicorn.
Add nginx as system package, remove all uwsgi configuration (UWSGI_PROCESSES,
UWSGI_CHEAPER, UWSGI_CPU_AFFINITY, uwsgi_read_timeout).
Add deprecation warnings for old uwsgi env vars in start-with-nginx.sh.
Fix SIGINT handler in shell_worker for gunicorn compatibility.
Update icpc/ioi dataset env vars to use GUNICORN_WORKERS.

Signed-off-by: George Armstrong <georgea@nvidia.com>
coderabbitai bot (Contributor) commented Feb 11, 2026

📝 Walkthrough


This pull request migrates the sandbox infrastructure from uWSGI to Gunicorn, updating the Docker build configuration to use a base Python image with lock-file-based dependency management, refactoring startup scripts to launch Gunicorn workers instead of uWSGI, and updating related configuration files across dataset and test modules.

Changes

Cohort / File(s) — Summary

  • Docker Build System (dockerfiles/Dockerfile.sandbox): Replaced the tiangolo/uwsgi-nginx-flask base image with python:3.10-slim; added lock-file-based dependency management via requirements/sandbox-core.lock and requirements/sandbox.lock; introduced PyPy3 installation with architecture detection; added Lean 4 toolchain setup; removed legacy UWSGI environment variables; changed the NGINX_PORT configuration.
  • Network Blocking (dockerfiles/sandbox/block_network.c): Updated a comment to reference Gunicorn instead of uWSGI; no functional changes to domain socket handling.
  • Worker Startup (dockerfiles/sandbox/start-with-nginx.sh): Replaced the uWSGI launch logic with Gunicorn workers; added a port_is_free utility and pre-startup port availability checks; updated deprecation warnings for old uWSGI variables; revised health checks and multi-node coordination for Gunicorn endpoints; changed the nginx shutdown approach.
  • Local Sandbox Execution (nemo_skills/code_execution/local_sandbox/local_sandbox_server.py): Added SIGINT handler restoration after IPython TerminalInteractiveShell creation; updated a contextual comment to reference Gunicorn instead of uWSGI.
  • Docker Execution Configuration (nemo_skills/code_execution/local_sandbox/start_local_sandbox.sh): Replaced UWSGI environment variable propagation (UWSGI_CPU_AFFINITY, UWSGI_PROCESSES) with GUNICORN_WORKERS in the docker run invocation.
  • Dataset Configurations (nemo_skills/dataset/icpc/__init__.py, nemo_skills/dataset/ioi/__init__.py): Replaced SANDBOX_ENV_VARS entries: removed UWSGI_PROCESSES, UWSGI_CPU_AFFINITY, and UWSGI_CHEAPER; added GUNICORN_WORKERS=1024.
  • Dependencies and Testing (requirements/code_execution.txt, tests/test_sandbox_network_blocking.py): Added flask and gunicorn to the code_execution requirements; added the SANDBOX_WORKER_BASE_PORT environment variable to sandbox tests; updated subprocess execution to use sys.executable instead of hard-coded python3.
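The SIGINT restoration mentioned above follows a common pattern: a library constructor (here IPython's TerminalInteractiveShell) installs its own SIGINT handler at creation time, and the server restores Python's default handler afterwards so the worker keeps normal interrupt semantics under Gunicorn. A minimal sketch, without IPython — hijacking_factory is a hypothetical stand-in, not the project's code:

```python
import signal

def create_with_default_sigint(factory):
    """Run a factory that may replace the SIGINT handler, then restore
    Python's default KeyboardInterrupt-raising handler."""
    obj = factory()
    signal.signal(signal.SIGINT, signal.default_int_handler)
    return obj

def hijacking_factory():
    """Stand-in for TerminalInteractiveShell: silently ignores SIGINT
    after construction, which would break Ctrl-C / master shutdown."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    return object()

shell = create_with_default_sigint(hijacking_factory)
```

Without the restoration step, the process would keep the library's handler (here SIG_IGN) and never raise KeyboardInterrupt.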

Sequence Diagram

sequenceDiagram
    participant Builder as Docker Builder
    participant Nginx as Nginx (Reverse Proxy)
    participant Health as Health Checker
    participant Worker as Gunicorn Worker
    participant App as Application
    
    Builder->>Nginx: Configure with upstream ports
    Builder->>Worker: Start worker with pre-check
    
    Note over Worker: port_is_free()<br/>checks availability
    
    alt Port Available
        Worker->>Worker: Bind to worker port
        Worker->>App: Launch Gunicorn process
        App->>App: Start listening on port
    else Port Conflict
        Worker->>Worker: Retry with offset
    end
    
    Health->>Worker: Health check request
    Worker-->>Health: Response (Gunicorn endpoint)
    Health->>Nginx: Register upstream
    
    Nginx->>Worker: Route requests
    Worker->>App: Forward to application
    App-->>Worker: Response
    Worker-->>Nginx: Return to client
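The pre-check and retry-with-offset flow in the diagram can be sketched in Python. In the actual script, port_is_free is a shell function built on ss; pick_port below is a hypothetical helper illustrating the retry logic, including failing loudly when retries are exhausted rather than returning an occupied port:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

def pick_port(base: int, increment: int = 1, max_retries: int = 10) -> int:
    """Walk upward from base until a free port is found; raise on
    exhaustion instead of handing back an occupied port."""
    port = base
    for _ in range(max_retries + 1):
        if port_is_free(port):
            return port
        port += increment
    raise RuntimeError(f"no free port within {max_retries} retries of {base}")
```

The bind-attempt check is a different mechanism than parsing ss output, but it answers the same question: can a new listener take this port right now?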

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • activatedgeek
  • Kipok
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 60.00%, below the required threshold of 80.00%. Write docstrings for the missing functions to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and accurately describes the main change: migrating the sandbox runtime from uwsgi to gunicorn, the primary focus across all modified files.


@greptile-apps bot left a comment

11 files reviewed, 2 comments

--backlog 100 \
--access-logfile /var/log/worker${i}.log \
--error-logfile /var/log/worker${i}.log \
--log-level info \

Trailing backslash creates invalid shell syntax

Suggested change
-    --log-level info \
+    --log-level info

Comment on lines +425 to 432
for retry in $(seq 0 $MAX_STARTUP_RETRIES); do
if port_is_free $port; then
break
fi
CONFLICT_COUNT=$((CONFLICT_COUNT + 1))
port=$((port + PORT_INCREMENT))
echo "[$_H] Port conflict on worker $i, trying port $port"
done

Port conflict retry exhaustion will spawn a worker on an occupied port. If all MAX_STARTUP_RETRIES iterations fail to find a free port, the loop falls through with the last tried port still occupied, and line 434 spawns the worker on it, causing it to fail. Add a check after the loop:

if ! port_is_free $port; then
    echo "[$_H] ERROR: Worker $i could not find a free port after $MAX_STARTUP_RETRIES retries"
    cleanup
    exit 1
fi

@coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@dockerfiles/Dockerfile.sandbox`:
- Around line 34-36: Update the RUN apt-get install invocation to avoid
installing recommended packages and drop the unused net-tools package: modify
the RUN command that performs apt-get update && apt-get install -y nginx curl
git net-tools bzip2 build-essential libseccomp-dev && ... so it adds the
--no-install-recommends flag to apt-get install and removes net-tools from the
package list; keep the existing cleanup (rm -rf /var/lib/apt/lists/*
/var/cache/apt/archives/*) unchanged.

In `@dockerfiles/sandbox/start-with-nginx.sh`:
- Around line 134-154: The bare redirection in start_worker_fast (">
/var/log/worker${i}.log") should be replaced with a portable no-op truncation;
change that line to use the shell builtin no-op with redirection (e.g., ": >
/var/log/worker${i}.log") so the log file is truncated safely and ShellCheck
SC2188 is resolved while keeping the rest of start_worker_fast (gunicorn
invocation, redirects, and echo "$!:$WORKER_PORT") unchanged.

In `@nemo_skills/dataset/icpc/__init__.py`:
- Around line 25-29: The SANDBOX_ENV_VARS change replaced uWSGI's lazy-scaling
UWSGI_CHEAPER with a fixed "GUNICORN_WORKERS=1024", which forces 1024 processes
at startup; update the SANDBOX_ENV_VARS constant to avoid spawning all workers
immediately by either (a) restoring a uWSGI lazy-scaling env var (e.g. set/keep
UWSGI_CHEAPER instead of GUNICORN_WORKERS) or (b) switch Gunicorn to a
threaded/greenlet worker model and lower process count: replace
"GUNICORN_WORKERS=1024" with a conservative process count (e.g. NUM_WORKERS
small), add "GUNICORN_WORKER_CLASS=gthread" or "GUNICORN_WORKER_CLASS=gevent"
and add a "GUNICORN_THREADS=<desired_thread_count>" env var to maintain
concurrency without pre-forking 1024 processes; apply the same change in
nemo_skills/dataset/ioi/__init__.py and ensure code is compatible with the
chosen worker_class before merging.

In `@tests/test_sandbox_network_blocking.py`:
- Around line 41-42: The test currently calls get_free_port twice with
strategy="random" for variables port and worker_base_port which can collide;
call get_free_port the second time with exclude containing the first port (use
get_free_port(..., exclude=[port])) so worker_base_port is chosen distinct from
port (refer to variables port and worker_base_port and the get_free_port
function's exclude parameter).
🧹 Nitpick comments (2)
requirements/code_execution.txt (1)

16-19: Infrastructure dependencies mixed with LLM-facing packages.

The file header (Line 16) says these are "requirements for code execution - put all packages that LLM should have access to in here," but flask and gunicorn are infrastructure/server dependencies, not packages the LLM needs. Consider either updating the comment to reflect the dual purpose, or splitting server dependencies into a separate requirements file so the lock-file compilation source is clearer.

dockerfiles/sandbox/start-with-nginx.sh (1)

421-437: Unused loop variable retry (ShellCheck SC2034).

The variable retry on Line 425 is only used to control iteration count, never referenced in the loop body. This is harmless but flagged by ShellCheck. You can suppress it with a leading underscore convention or just use _.

Proposed fix
-    for retry in $(seq 0 $MAX_STARTUP_RETRIES); do
+    for _retry in $(seq 0 $MAX_STARTUP_RETRIES); do

Comment on lines 34 to +36
 RUN apt-get update && \
-    apt-get install -y curl git net-tools bzip2 build-essential libseccomp-dev && \
-    ARCH="${TARGETARCH:-$(dpkg --print-architecture)}" && \
+    apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if net-tools or its commands (netstat, ifconfig, arp, route) are referenced anywhere
echo "=== Searching for net-tools commands usage ==="
rg -i '\b(netstat|ifconfig|arp|route)\b' --type=sh --type=py --type=bash 2>/dev/null || echo "No matches found"

echo ""
echo "=== Searching for net-tools package reference ==="
rg 'net-tools' 2>/dev/null || echo "No matches found"

echo ""
echo "=== Checking the dockerfile for net-tools and checking startup script ==="
cd dockerfiles && ls -la | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 1140


🏁 Script executed:

#!/bin/bash
# Read the Dockerfile to see the full context and startup script reference
echo "=== Reading Dockerfile.sandbox ==="
cat dockerfiles/Dockerfile.sandbox | head -160

echo ""
echo "=== Looking for the start-with-nginx.sh script ==="
find . -name "start-with-nginx.sh" -o -name "start.sh" | head -10

Repository: NVIDIA-NeMo/Skills

Length of output: 7511


🏁 Script executed:

#!/bin/bash
# Check what's in the startup scripts to see if net-tools commands are used
echo "=== Searching for startup script files ==="
find . -type f -name "*start*" | grep -E "\.(sh|py)$" | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 340


🏁 Script executed:

#!/bin/bash
# Check the startup script for command usage
echo "=== Checking start-with-nginx.sh for command usage ==="
cat dockerfiles/sandbox/start-with-nginx.sh

Repository: NVIDIA-NeMo/Skills

Length of output: 25914


🏁 Script executed:

#!/bin/bash
# Verify iproute2 is included in python:3.10-slim by checking Debian package info
echo "=== Checking if ss (from iproute2) is typically available in Debian slim images ==="
echo "iproute2 is part of the essential packages in Debian and is included in slim images"
echo "Python:3.10-slim is based on Debian 12 (bookworm), which includes iproute2 by default"
echo ""
echo "=== Verification Summary ==="
echo "1. ss command usage found: YES (port_is_free function uses 'ss -tln')"
echo "2. net-tools commands found: NO (no netstat, ifconfig, arp, route usage)"
echo "3. net-tools package reference: ONLY in the apt-get install line (line 35)"
echo "4. iproute2 in slim images: YES (included by default in Debian slim)"

Repository: NVIDIA-NeMo/Skills

Length of output: 632


Add --no-install-recommends and remove unused net-tools package.

The apt-get install on line 35 pulls in unnecessary recommended packages (Trivy DS-0029) and includes net-tools, which is not used. The scripts use ss (from iproute2, pre-installed in python:3.10-slim) rather than tools from net-tools.

Proposed fix
 RUN apt-get update && \
-    apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && \
+    apt-get install -y --no-install-recommends nginx curl git bzip2 build-essential libseccomp-dev && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
🧰 Tools
🪛 Trivy (0.69.1)

[error] 34-36: 'apt-get' missing '--no-install-recommends'

'--no-install-recommends' flag is missed: 'apt-get update && apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*'

Rule: DS-0029

Learn more

(IaC/Dockerfile)


Comment on lines 134 to 154
start_worker_fast() {
local i=$1
local WORKER_PORT=$2

cat > /tmp/worker${i}_uwsgi.ini << EOF
[uwsgi]
module = main
callable = app
processes = ${UWSGI_PROCESSES}
http-socket = 0.0.0.0:${WORKER_PORT}
vacuum = true
master = true
die-on-term = true
memory-report = true
listen = 100
http-timeout = 300
socket-timeout = 300
disable-logging = false
log-date = true
log-prefix = [worker${i}]
logto = /var/log/worker${i}.log
EOF

if [ -n "$UWSGI_CHEAPER" ]; then
echo "cheaper = ${UWSGI_CHEAPER}" >> /tmp/worker${i}_uwsgi.ini
fi

> /var/log/worker${i}.log
( cd /app && env WORKER_NUM=$i uwsgi --ini /tmp/worker${i}_uwsgi.ini ) &
# Redirect stdout/stderr to the log file so the $() subshell that calls
# this function doesn't block waiting for the pipe to close.
# (uWSGI's --logto closed stdout implicitly; gunicorn does not.)
( cd /app && env WORKER_NUM=$i \
gunicorn main:app \
--workers ${GUNICORN_WORKERS} \
--bind 0.0.0.0:${WORKER_PORT} \
--timeout 300 \
--graceful-timeout 30 \
--backlog 100 \
--access-logfile /var/log/worker${i}.log \
--error-logfile /var/log/worker${i}.log \
--log-level info \
) >> /var/log/worker${i}.log 2>&1 &
echo "$!:$WORKER_PORT"
}

⚠️ Potential issue | 🟡 Minor

ShellCheck SC2188: redirection without a command on Line 138.

> /var/log/worker${i}.log is a bare redirection. While it works in bash for truncating, it's flagged by ShellCheck and not portable. Use : > (the no-op builtin) instead.

Proposed fix
-    > /var/log/worker${i}.log
+    : > /var/log/worker${i}.log
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 138-138: This redirection doesn't have a command. Move to its command (or use 'true' as no-op).

(SC2188)


Comment on lines 25 to 29
 SANDBOX_ENV_VARS = [
-    "UWSGI_PROCESSES=1024",
-    "UWSGI_CPU_AFFINITY=8",
-    "UWSGI_CHEAPER=1023",
+    "GUNICORN_WORKERS=1024",
     "NUM_WORKERS=1",
     "STATEFUL_SANDBOX=0",
 ]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current code in icpc/__init__.py
cat -n nemo_skills/dataset/icpc/__init__.py | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 1339


🏁 Script executed:

# Check the equivalent file in ioi/__init__.py
cat -n nemo_skills/dataset/ioi/__init__.py | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 1208


🌐 Web query:

Gunicorn lazy worker spawning vs immediate spawn all workers at startup

💡 Result:

Gunicorn does not have a “lazy / on-demand worker spawning” mode in the sense of only creating workers when traffic arrives. Gunicorn uses a pre-fork worker model: a master (arbiter) process starts and manages a fixed pool of worker processes, and those workers handle requests. This model is explicitly “pre-fork” (workers exist before requests are handled). [1]

What can vary: when the application code is loaded

Gunicorn gives you two practical “startup behaviors,” but both still spawn the workers at startup:

  1. Default (no --preload)
  • Gunicorn starts the master, spawns the configured workers, and each worker loads/imports your app itself (so app code is not shared between workers).
  • Benefit: easier code reload behavior (restart workers to pick up code changes). [2]
  2. Preload mode (--preload / preload_app=True)
  • Gunicorn loads the application code in the master first, then forks workers so they can share read-only memory pages via copy-on-write (often lower RAM and faster boot). [2]

“Spawn more workers later” (not traffic-driven)

Gunicorn can change the worker count at runtime via signals:

  • TTIN increases worker count by 1
  • TTOU decreases worker count by 1 [3]

But this is an operator-driven scaling mechanism, not automatic “lazy spawn on demand.”

Sources: Gunicorn design (pre-fork model) [1], preload_app/--preload setting docs [2], Gunicorn signal handling (TTIN/TTOU) [3].
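The TTIN/TTOU mechanism above is plain POSIX signaling; an operator-side controller is just os.kill against the master PID. A minimal sketch (scale_workers is a hypothetical helper, and no Gunicorn master is actually run here):

```python
import os
import signal

def scale_workers(master_pid: int, delta: int) -> None:
    """Send TTIN (add one worker) or TTOU (retire one worker) to a
    gunicorn-style master process, once per unit of delta."""
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    for _ in range(abs(delta)):
        os.kill(master_pid, sig)
```

Gunicorn's master interprets each TTIN as "spawn one more worker" and each TTOU as "retire one"; this is operator-driven scaling, not traffic-driven lazy spawning.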


🌐 Web query:

Gunicorn worker class alternatives to sync worker for lazy spawning

💡 Result:

Gunicorn doesn’t “lazy spawn” workers on demand—its core architecture is a pre-fork worker model where the master (arbiter) maintains a worker pool at the configured size. (gunicorn.org)
So there isn’t a worker class that changes that fundamental behavior; the practical alternative is to reduce process count and get concurrency inside each worker.

Worker-class alternatives to sync (to avoid needing many pre-forked processes)

  • gthread (threaded workers): run fewer worker processes and add a thread pool per worker (--threads). Also supports keep-alive (unlike sync). (docs.gunicorn.org)
    Example:

    gunicorn myapp:app -k gthread --workers 2 --threads 8
  • gevent (greenlets / async I/O): good for I/O-bound workloads and many concurrent connections; uses --worker-connections. (gunicorn.org)
    Example:

    gunicorn myapp:app -k gevent --workers 2 --worker-connections 1000
  • asgi (native Gunicorn ASGI worker, Gunicorn ≥ 24.0.0, beta): for async frameworks; provides keep-alive + WebSockets. (gunicorn.org)
    Example:

    gunicorn myasgi:app --worker-class asgi --workers 2
  • eventlet: listed but deprecated in current Gunicorn docs—prefer gevent. (gunicorn.org)

  • tornado: primarily for Tornado apps (WSGI under it isn’t generally recommended). (docs.gunicorn.org)

If your goal is “spin up capacity only when load arrives”

Gunicorn itself won’t do that automatically, but you can change worker count at runtime via signals (e.g., add workers with TTIN, remove with TTOU). (gunicorn.org)

If you tell me whether you’re serving WSGI vs ASGI, and whether the bottleneck is CPU-bound vs I/O-bound, I can suggest a concrete configuration (workers/threads/worker-connections/timeouts) that best approximates “lazy capacity” without over-forking.


Loss of lazy worker spawning (uWSGI cheaper subsystem).

The configuration changed from uWSGI's UWSGI_CHEAPER=1023 (which spawned 1 worker and scaled to 1024 on demand) to Gunicorn's --workers 1024 (all 1024 workers spawned immediately at startup). Gunicorn does not support lazy/on-demand worker spawning; its pre-fork model always spawns the full worker pool at boot, causing significantly higher startup memory pressure for ICPC/IOI benchmarks.

The --preload flag reduces per-worker memory via copy-on-write but does not delay worker spawning. Alternative worker classes (gthread, gevent) reduce process count by adding threading/greenlet concurrency per worker, but require ensuring the code is compatible. Consider whether the current configuration is acceptable or if an alternative worker class would be appropriate.

This same issue applies to nemo_skills/dataset/ioi/__init__.py.
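Option (b) above might look like this in the SANDBOX_ENV_VARS constant — a hypothetical sketch only; the process and thread counts are illustrative, not benchmarked for ICPC/IOI:

```python
# Hypothetical alternative to "GUNICORN_WORKERS=1024": fewer pre-forked
# processes, with per-worker thread concurrency via the gthread class.
SANDBOX_ENV_VARS = [
    "GUNICORN_WORKERS=32",           # conservative process count (illustrative)
    "GUNICORN_WORKER_CLASS=gthread",
    "GUNICORN_THREADS=32",           # 32 processes x 32 threads = 1024 slots
    "NUM_WORKERS=1",
    "STATEFUL_SANDBOX=0",
]
```

Note that the gunicorn invocation in start-with-nginx.sh would also need to forward --worker-class and --threads for these variables to take effect.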


Comment on lines 41 to +42
port = get_free_port(strategy="random")
worker_base_port = get_free_port(strategy="random")

⚠️ Potential issue | 🟡 Minor

Minor: potential port collision between port and worker_base_port.

Both calls use strategy="random" independently. get_free_port already supports an exclude parameter to avoid this.

Proposed fix
     port = get_free_port(strategy="random")
-    worker_base_port = get_free_port(strategy="random")
+    worker_base_port = get_free_port(exclude=[port], strategy="random")
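The repository's actual get_free_port is not shown in this diff; the sketch below is a hypothetical implementation of an exclude-aware random strategy, to illustrate why passing exclude removes the collision:

```python
import random
import socket

def get_free_port(strategy="random", exclude=(), low=20000, high=40000, attempts=200):
    """Hypothetical sketch: pick random candidate ports, skip excluded
    ones, and return the first that can actually be bound."""
    assert strategy == "random"
    for _ in range(attempts):
        candidate = random.randint(low, high)
        if candidate in exclude:
            continue
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", candidate))
                return candidate  # bindable and not excluded
            except OSError:
                continue  # occupied; try another candidate
    raise RuntimeError("no free port found")

port = get_free_port(strategy="random")
worker_base_port = get_free_port(strategy="random", exclude=[port])
```

With exclude=[port], the second call can never return the first port even though both draw randomly from the same range.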

@gwarmstrong (Collaborator, Author)

Recreating from origin branch to fix CI auth
