feat: migrate sandbox from uwsgi to gunicorn #1232

Closed
gwarmstrong wants to merge 2 commits into NVIDIA-NeMo:main from gwarmstrong:georgea/sandbox-gunicorn

Conversation

@gwarmstrong (Collaborator) commented Feb 11, 2026

Summary

  • Replace tiangolo/uwsgi-nginx-flask base image with python:3.10-slim + nginx + gunicorn
  • Remove all uwsgi configuration (UWSGI_PROCESSES, UWSGI_CHEAPER, UWSGI_CPU_AFFINITY, uwsgi_read_timeout)
  • Add deprecation warnings for old uwsgi env vars in start-with-nginx.sh
  • Fix SIGINT handler in shell_worker for gunicorn compatibility
  • Update icpc/ioi dataset env vars to use GUNICORN_WORKERS
  • Test fixes for port conflicts and sys.executable usage

Depends on #1231 (lockfiles). Split from #1226.

Summary by CodeRabbit

Release Notes

  • Chores

    • Migrated sandbox server infrastructure from uWSGI to Gunicorn for improved performance and reliability.
    • Implemented dependency locking for consistent and reproducible builds.
    • Added Lean 4 toolchain support.
    • Enhanced port management with pre-checks and conflict detection during startup.
  • Bug Fixes

    • Improved signal handling in shell worker for better interrupt handling.

… builds

Add sandbox-core.lock and sandbox.lock for pinned dependency installation.
Switch Dockerfile.sandbox from pip install to uv install from lockfiles.
Add flask and gunicorn to code_execution.txt requirements.

Signed-off-by: George Armstrong <georgea@nvidia.com>
Replace tiangolo/uwsgi-nginx-flask with python:3.10-slim and gunicorn.
Add nginx as system package, remove all uwsgi configuration (UWSGI_PROCESSES,
UWSGI_CHEAPER, UWSGI_CPU_AFFINITY, uwsgi_read_timeout).
Add deprecation warnings for old uwsgi env vars in start-with-nginx.sh.
Fix SIGINT handler in shell_worker for gunicorn compatibility.
Update icpc/ioi dataset env vars to use GUNICORN_WORKERS.

Signed-off-by: George Armstrong <georgea@nvidia.com>
coderabbitai bot (Contributor) commented Feb 11, 2026

📝 Walkthrough


This pull request migrates the sandbox infrastructure from uWSGI to Gunicorn, updating the Docker build configuration to use a base Python image with lock-file-based dependency management, refactoring startup scripts to launch Gunicorn workers instead of uWSGI, and updating related configuration files across dataset and test modules.

Changes

Cohort / File(s) — Summary

  • Docker Build System (dockerfiles/Dockerfile.sandbox): Replaced the tiangolo/uwsgi-nginx-flask base image with python:3.10-slim; added lock-file-based dependency management via requirements/sandbox-core.lock and requirements/sandbox.lock; introduced PyPy3 installation with architecture detection; added Lean 4 toolchain setup; removed legacy UWSGI environment variables; changed the NGINX_PORT configuration.
  • Network Blocking (dockerfiles/sandbox/block_network.c): Updated a comment to reference Gunicorn instead of uWSGI; no functional changes to domain socket handling.
  • Worker Startup (dockerfiles/sandbox/start-with-nginx.sh): Replaced the uWSGI launch logic with Gunicorn workers; added a port_is_free utility and pre-startup port availability checks; updated deprecation warnings for old uWSGI variables; revised health checks and multi-node coordination for Gunicorn endpoints; changed the nginx shutdown approach.
  • Local Sandbox Execution (nemo_skills/code_execution/local_sandbox/local_sandbox_server.py): Added SIGINT handler restoration after IPython TerminalInteractiveShell creation; updated a contextual comment to reference Gunicorn instead of uWSGI.
  • Docker Execution Configuration (nemo_skills/code_execution/local_sandbox/start_local_sandbox.sh): Replaced UWSGI environment variable propagation (UWSGI_CPU_AFFINITY, UWSGI_PROCESSES) with GUNICORN_WORKERS in the docker run invocation.
  • Dataset Configurations (nemo_skills/dataset/icpc/__init__.py, nemo_skills/dataset/ioi/__init__.py): Replaced SANDBOX_ENV_VARS entries: removed UWSGI_PROCESSES, UWSGI_CPU_AFFINITY, and UWSGI_CHEAPER; added GUNICORN_WORKERS=1024.
  • Dependencies and Testing (requirements/code_execution.txt, tests/test_sandbox_network_blocking.py): Added flask and gunicorn to the code_execution requirements; added the SANDBOX_WORKER_BASE_PORT environment variable to sandbox tests; updated subprocess execution to use sys.executable instead of hard-coded python3.
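The SIGINT restoration mentioned above follows a common pattern: a library constructor (here IPython's TerminalInteractiveShell) installs its own SIGINT handler at creation time, and the server restores Python's default handler afterwards so the worker keeps normal interrupt semantics under Gunicorn. A minimal sketch, without IPython — hijacking_factory is a hypothetical stand-in, not the project's code:

```python
import signal

def create_with_default_sigint(factory):
    """Run a factory that may replace the SIGINT handler, then restore
    Python's default KeyboardInterrupt-raising handler."""
    obj = factory()
    signal.signal(signal.SIGINT, signal.default_int_handler)
    return obj

def hijacking_factory():
    """Stand-in for TerminalInteractiveShell: silently ignores SIGINT
    after construction, which would break Ctrl-C / master shutdown."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    return object()

shell = create_with_default_sigint(hijacking_factory)
```

Without the restoration step, the process would keep the library's handler (here SIG_IGN) and never raise KeyboardInterrupt.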

Sequence Diagram

sequenceDiagram
    participant Builder as Docker Builder
    participant Nginx as Nginx (Reverse Proxy)
    participant Health as Health Checker
    participant Worker as Gunicorn Worker
    participant App as Application
    
    Builder->>Nginx: Configure with upstream ports
    Builder->>Worker: Start worker with pre-check
    
    Note over Worker: port_is_free()<br/>checks availability
    
    alt Port Available
        Worker->>Worker: Bind to worker port
        Worker->>App: Launch Gunicorn process
        App->>App: Start listening on port
    else Port Conflict
        Worker->>Worker: Retry with offset
    end
    
    Health->>Worker: Health check request
    Worker-->>Health: Response (Gunicorn endpoint)
    Health->>Nginx: Register upstream
    
    Nginx->>Worker: Route requests
    Worker->>App: Forward to application
    App-->>Worker: Response
    Worker-->>Nginx: Return to client
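The pre-check and retry-with-offset flow in the diagram can be sketched in Python. In the actual script, port_is_free is a shell function built on ss; pick_port below is a hypothetical helper illustrating the retry logic, including failing loudly when retries are exhausted rather than returning an occupied port:

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing is currently bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

def pick_port(base: int, increment: int = 1, max_retries: int = 10) -> int:
    """Walk upward from base until a free port is found; raise on
    exhaustion instead of handing back an occupied port."""
    port = base
    for _ in range(max_retries + 1):
        if port_is_free(port):
            return port
        port += increment
    raise RuntimeError(f"no free port within {max_retries} retries of {base}")
```

The bind-attempt check is a different mechanism than parsing ss output, but it answers the same question: can a new listener take this port right now?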

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • activatedgeek
  • Kipok
🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 60.00%, below the required threshold of 80.00%. Write docstrings for the missing functions to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and accurately describes the main change: migrating the sandbox runtime from uwsgi to gunicorn, the primary focus across all modified files.


@greptile-apps bot left a comment

11 files reviewed, 2 comments

--backlog 100 \
--access-logfile /var/log/worker${i}.log \
--error-logfile /var/log/worker${i}.log \
--log-level info \

Trailing backslash creates invalid shell syntax

Suggested change
-    --log-level info \
+    --log-level info

Comment on lines +425 to 432
for retry in $(seq 0 $MAX_STARTUP_RETRIES); do
if port_is_free $port; then
break
fi
CONFLICT_COUNT=$((CONFLICT_COUNT + 1))
port=$((port + PORT_INCREMENT))
echo "[$_H] Port conflict on worker $i, trying port $port"
done

Port conflict retry exhaustion will spawn a worker on an occupied port. If all MAX_STARTUP_RETRIES iterations fail to find a free port, the loop falls through with the last tried port still occupied, and line 434 spawns the worker on it, causing it to fail. Add a check after the loop:

if ! port_is_free $port; then
    echo "[$_H] ERROR: Worker $i could not find a free port after $MAX_STARTUP_RETRIES retries"
    cleanup
    exit 1
fi

@coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@dockerfiles/Dockerfile.sandbox`:
- Around line 34-36: Update the RUN apt-get install invocation to avoid
installing recommended packages and drop the unused net-tools package: modify
the RUN command that performs apt-get update && apt-get install -y nginx curl
git net-tools bzip2 build-essential libseccomp-dev && ... so it adds the
--no-install-recommends flag to apt-get install and removes net-tools from the
package list; keep the existing cleanup (rm -rf /var/lib/apt/lists/*
/var/cache/apt/archives/*) unchanged.

In `@dockerfiles/sandbox/start-with-nginx.sh`:
- Around line 134-154: The bare redirection in start_worker_fast (">
/var/log/worker${i}.log") should be replaced with a portable no-op truncation;
change that line to use the shell builtin no-op with redirection (e.g., ": >
/var/log/worker${i}.log") so the log file is truncated safely and ShellCheck
SC2188 is resolved while keeping the rest of start_worker_fast (gunicorn
invocation, redirects, and echo "$!:$WORKER_PORT") unchanged.

In `@nemo_skills/dataset/icpc/__init__.py`:
- Around line 25-29: The SANDBOX_ENV_VARS change replaced uWSGI's lazy-scaling
UWSGI_CHEAPER with a fixed "GUNICORN_WORKERS=1024", which forces 1024 processes
at startup; update the SANDBOX_ENV_VARS constant to avoid spawning all workers
immediately by either (a) restoring a uWSGI lazy-scaling env var (e.g. set/keep
UWSGI_CHEAPER instead of GUNICORN_WORKERS) or (b) switch Gunicorn to a
threaded/greenlet worker model and lower process count: replace
"GUNICORN_WORKERS=1024" with a conservative process count (e.g. NUM_WORKERS
small), add "GUNICORN_WORKER_CLASS=gthread" or "GUNICORN_WORKER_CLASS=gevent"
and add a "GUNICORN_THREADS=<desired_thread_count>" env var to maintain
concurrency without pre-forking 1024 processes; apply the same change in
nemo_skills/dataset/ioi/__init__.py and ensure code is compatible with the
chosen worker_class before merging.

In `@tests/test_sandbox_network_blocking.py`:
- Around line 41-42: The test currently calls get_free_port twice with
strategy="random" for variables port and worker_base_port which can collide;
call get_free_port the second time with exclude containing the first port (use
get_free_port(..., exclude=[port])) so worker_base_port is chosen distinct from
port (refer to variables port and worker_base_port and the get_free_port
function's exclude parameter).
🧹 Nitpick comments (2)
requirements/code_execution.txt (1)

16-19: Infrastructure dependencies mixed with LLM-facing packages.

The file header (Line 16) says these are "requirements for code execution - put all packages that LLM should have access to in here," but flask and gunicorn are infrastructure/server dependencies, not packages the LLM needs. Consider either updating the comment to reflect the dual purpose, or splitting server dependencies into a separate requirements file so the lock-file compilation source is clearer.

dockerfiles/sandbox/start-with-nginx.sh (1)

421-437: Unused loop variable retry (ShellCheck SC2034).

The variable retry on Line 425 is only used to control iteration count, never referenced in the loop body. This is harmless but flagged by ShellCheck. You can suppress it with a leading underscore convention or just use _.

Proposed fix
-    for retry in $(seq 0 $MAX_STARTUP_RETRIES); do
+    for _retry in $(seq 0 $MAX_STARTUP_RETRIES); do

Comment on lines 34 to +36
 RUN apt-get update && \
-    apt-get install -y curl git net-tools bzip2 build-essential libseccomp-dev && \
-    ARCH="${TARGETARCH:-$(dpkg --print-architecture)}" && \
+    apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if net-tools or its commands (netstat, ifconfig, arp, route) are referenced anywhere
echo "=== Searching for net-tools commands usage ==="
rg -i '\b(netstat|ifconfig|arp|route)\b' --type=sh --type=py --type=bash 2>/dev/null || echo "No matches found"

echo ""
echo "=== Searching for net-tools package reference ==="
rg 'net-tools' 2>/dev/null || echo "No matches found"

echo ""
echo "=== Checking the dockerfile for net-tools and checking startup script ==="
cd dockerfiles && ls -la | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 1140


🏁 Script executed:

#!/bin/bash
# Read the Dockerfile to see the full context and startup script reference
echo "=== Reading Dockerfile.sandbox ==="
cat dockerfiles/Dockerfile.sandbox | head -160

echo ""
echo "=== Looking for the start-with-nginx.sh script ==="
find . -name "start-with-nginx.sh" -o -name "start.sh" | head -10

Repository: NVIDIA-NeMo/Skills

Length of output: 7511


🏁 Script executed:

#!/bin/bash
# Check what's in the startup scripts to see if net-tools commands are used
echo "=== Searching for startup script files ==="
find . -type f -name "*start*" | grep -E "\.(sh|py)$" | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 340


🏁 Script executed:

#!/bin/bash
# Check the startup script for command usage
echo "=== Checking start-with-nginx.sh for command usage ==="
cat dockerfiles/sandbox/start-with-nginx.sh

Repository: NVIDIA-NeMo/Skills

Length of output: 25914


🏁 Script executed:

#!/bin/bash
# Verify iproute2 is included in python:3.10-slim by checking Debian package info
echo "=== Checking if ss (from iproute2) is typically available in Debian slim images ==="
echo "iproute2 is part of the essential packages in Debian and is included in slim images"
echo "Python:3.10-slim is based on Debian 12 (bookworm), which includes iproute2 by default"
echo ""
echo "=== Verification Summary ==="
echo "1. ss command usage found: YES (port_is_free function uses 'ss -tln')"
echo "2. net-tools commands found: NO (no netstat, ifconfig, arp, route usage)"
echo "3. net-tools package reference: ONLY in the apt-get install line (line 35)"
echo "4. iproute2 in slim images: YES (included by default in Debian slim)"

Repository: NVIDIA-NeMo/Skills

Length of output: 632


Add --no-install-recommends and remove unused net-tools package.

The apt-get install on line 35 pulls in unnecessary recommended packages (Trivy DS-0029) and includes net-tools, which is not used. The scripts use ss (from iproute2, pre-installed in python:3.10-slim) rather than tools from net-tools.

Proposed fix
 RUN apt-get update && \
-    apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && \
+    apt-get install -y --no-install-recommends nginx curl git bzip2 build-essential libseccomp-dev && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
🧰 Tools
🪛 Trivy (0.69.1)

[error] 34-36: 'apt-get' missing '--no-install-recommends'

'--no-install-recommends' flag is missed: 'apt-get update && apt-get install -y nginx curl git net-tools bzip2 build-essential libseccomp-dev && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*'

Rule: DS-0029

Learn more

(IaC/Dockerfile)


Comment on lines 134 to 154
start_worker_fast() {
local i=$1
local WORKER_PORT=$2

cat > /tmp/worker${i}_uwsgi.ini << EOF
[uwsgi]
module = main
callable = app
processes = ${UWSGI_PROCESSES}
http-socket = 0.0.0.0:${WORKER_PORT}
vacuum = true
master = true
die-on-term = true
memory-report = true
listen = 100
http-timeout = 300
socket-timeout = 300
disable-logging = false
log-date = true
log-prefix = [worker${i}]
logto = /var/log/worker${i}.log
EOF

if [ -n "$UWSGI_CHEAPER" ]; then
echo "cheaper = ${UWSGI_CHEAPER}" >> /tmp/worker${i}_uwsgi.ini
fi

> /var/log/worker${i}.log
( cd /app && env WORKER_NUM=$i uwsgi --ini /tmp/worker${i}_uwsgi.ini ) &
# Redirect stdout/stderr to the log file so the $() subshell that calls
# this function doesn't block waiting for the pipe to close.
# (uWSGI's --logto closed stdout implicitly; gunicorn does not.)
( cd /app && env WORKER_NUM=$i \
gunicorn main:app \
--workers ${GUNICORN_WORKERS} \
--bind 0.0.0.0:${WORKER_PORT} \
--timeout 300 \
--graceful-timeout 30 \
--backlog 100 \
--access-logfile /var/log/worker${i}.log \
--error-logfile /var/log/worker${i}.log \
--log-level info \
) >> /var/log/worker${i}.log 2>&1 &
echo "$!:$WORKER_PORT"
}

⚠️ Potential issue | 🟡 Minor

ShellCheck SC2188: redirection without a command on Line 138.

> /var/log/worker${i}.log is a bare redirection. While it works in bash for truncating, it's flagged by ShellCheck and not portable. Use : > (the no-op builtin) instead.

Proposed fix
-    > /var/log/worker${i}.log
+    : > /var/log/worker${i}.log
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 138-138: This redirection doesn't have a command. Move to its command (or use 'true' as no-op).

(SC2188)


Comment on lines 25 to 29
 SANDBOX_ENV_VARS = [
-    "UWSGI_PROCESSES=1024",
-    "UWSGI_CPU_AFFINITY=8",
-    "UWSGI_CHEAPER=1023",
+    "GUNICORN_WORKERS=1024",
     "NUM_WORKERS=1",
     "STATEFUL_SANDBOX=0",
 ]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current code in icpc/__init__.py
cat -n nemo_skills/dataset/icpc/__init__.py | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 1339


🏁 Script executed:

# Check the equivalent file in ioi/__init__.py
cat -n nemo_skills/dataset/ioi/__init__.py | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 1208


🌐 Web query:

Gunicorn lazy worker spawning vs immediate spawn all workers at startup

💡 Result:

Gunicorn does not have a “lazy / on-demand worker spawning” mode in the sense of only creating workers when traffic arrives. Gunicorn uses a pre-fork worker model: a master (arbiter) process starts and manages a fixed pool of worker processes, and those workers handle requests. This model is explicitly “pre-fork” (workers exist before requests are handled). [1]

What can vary: when the application code is loaded

Gunicorn gives you two practical “startup behaviors,” but both still spawn the workers at startup:

  1. Default (no --preload)
  • Gunicorn starts the master, spawns the configured workers, and each worker loads/imports your app itself (so app code is not shared between workers).
  • Benefit: easier code reload behavior (restart workers to pick up code changes). [2]
  2. Preload mode (--preload / preload_app=True)
  • Gunicorn loads the application code in the master first, then forks workers so they can share read-only memory pages via copy-on-write (often lower RAM and faster boot). [2]

“Spawn more workers later” (not traffic-driven)

Gunicorn can change the worker count at runtime via signals:

  • TTIN increases worker count by 1
  • TTOU decreases worker count by 1 [3]

But this is an operator-driven scaling mechanism, not automatic “lazy spawn on demand.”

Sources: Gunicorn design (pre-fork model) [1], preload_app/--preload setting docs [2], Gunicorn signal handling (TTIN/TTOU) [3].
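The TTIN/TTOU mechanism above is plain POSIX signaling; an operator-side controller is just os.kill against the master PID. A minimal sketch (scale_workers is a hypothetical helper, and no Gunicorn master is actually run here):

```python
import os
import signal

def scale_workers(master_pid: int, delta: int) -> None:
    """Send TTIN (add one worker) or TTOU (retire one worker) to a
    gunicorn-style master process, once per unit of delta."""
    sig = signal.SIGTTIN if delta > 0 else signal.SIGTTOU
    for _ in range(abs(delta)):
        os.kill(master_pid, sig)
```

Gunicorn's master interprets each TTIN as "spawn one more worker" and each TTOU as "retire one"; this is operator-driven scaling, not traffic-driven lazy spawning.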


🌐 Web query:

Gunicorn worker class alternatives to sync worker for lazy spawning

💡 Result:

Gunicorn doesn’t “lazy spawn” workers on demand—its core architecture is a pre-fork worker model where the master (arbiter) maintains a worker pool at the configured size. (gunicorn.org)
So there isn’t a worker class that changes that fundamental behavior; the practical alternative is to reduce process count and get concurrency inside each worker.

Worker-class alternatives to sync (to avoid needing many pre-forked processes)

  • gthread (threaded workers): run fewer worker processes and add a thread pool per worker (--threads). Also supports keep-alive (unlike sync). (docs.gunicorn.org)
    Example:

    gunicorn myapp:app -k gthread --workers 2 --threads 8
  • gevent (greenlets / async I/O): good for I/O-bound workloads and many concurrent connections; uses --worker-connections. (gunicorn.org)
    Example:

    gunicorn myapp:app -k gevent --workers 2 --worker-connections 1000
  • asgi (native Gunicorn ASGI worker, Gunicorn ≥ 24.0.0, beta): for async frameworks; provides keep-alive + WebSockets. (gunicorn.org)
    Example:

    gunicorn myasgi:app --worker-class asgi --workers 2
  • eventlet: listed but deprecated in current Gunicorn docs—prefer gevent. (gunicorn.org)

  • tornado: primarily for Tornado apps (WSGI under it isn’t generally recommended). (docs.gunicorn.org)

If your goal is “spin up capacity only when load arrives”

Gunicorn itself won’t do that automatically, but you can change worker count at runtime via signals (e.g., add workers with TTIN, remove with TTOU). (gunicorn.org)

If you tell me whether you’re serving WSGI vs ASGI, and whether the bottleneck is CPU-bound vs I/O-bound, I can suggest a concrete configuration (workers/threads/worker-connections/timeouts) that best approximates “lazy capacity” without over-forking.


Loss of lazy worker spawning (uWSGI cheaper subsystem).

The configuration changed from uWSGI's UWSGI_CHEAPER=1023 (which spawned 1 worker and scaled to 1024 on demand) to Gunicorn's --workers 1024 (all 1024 workers spawned immediately at startup). Gunicorn does not support lazy/on-demand worker spawning; its pre-fork model always spawns the full worker pool at boot, causing significantly higher startup memory pressure for ICPC/IOI benchmarks.

The --preload flag reduces per-worker memory via copy-on-write but does not delay worker spawning. Alternative worker classes (gthread, gevent) reduce process count by adding threading/greenlet concurrency per worker, but require ensuring the code is compatible. Consider whether the current configuration is acceptable or if an alternative worker class would be appropriate.

This same issue applies to nemo_skills/dataset/ioi/__init__.py.
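Option (b) above might look like this in the SANDBOX_ENV_VARS constant — a hypothetical sketch only; the process and thread counts are illustrative, not benchmarked for ICPC/IOI:

```python
# Hypothetical alternative to "GUNICORN_WORKERS=1024": fewer pre-forked
# processes, with per-worker thread concurrency via the gthread class.
SANDBOX_ENV_VARS = [
    "GUNICORN_WORKERS=32",           # conservative process count (illustrative)
    "GUNICORN_WORKER_CLASS=gthread",
    "GUNICORN_THREADS=32",           # 32 processes x 32 threads = 1024 slots
    "NUM_WORKERS=1",
    "STATEFUL_SANDBOX=0",
]
```

Note that the gunicorn invocation in start-with-nginx.sh would also need to forward --worker-class and --threads for these variables to take effect.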


Comment on lines 41 to +42
port = get_free_port(strategy="random")
worker_base_port = get_free_port(strategy="random")

⚠️ Potential issue | 🟡 Minor

Minor: potential port collision between port and worker_base_port.

Both calls use strategy="random" independently. get_free_port already supports an exclude parameter to avoid this.

Proposed fix
     port = get_free_port(strategy="random")
-    worker_base_port = get_free_port(strategy="random")
+    worker_base_port = get_free_port(exclude=[port], strategy="random")
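The repository's actual get_free_port is not shown in this diff; the sketch below is a hypothetical implementation of an exclude-aware random strategy, to illustrate why passing exclude removes the collision:

```python
import random
import socket

def get_free_port(strategy="random", exclude=(), low=20000, high=40000, attempts=200):
    """Hypothetical sketch: pick random candidate ports, skip excluded
    ones, and return the first that can actually be bound."""
    assert strategy == "random"
    for _ in range(attempts):
        candidate = random.randint(low, high)
        if candidate in exclude:
            continue
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", candidate))
                return candidate  # bindable and not excluded
            except OSError:
                continue  # occupied; try another candidate
    raise RuntimeError("no free port found")

port = get_free_port(strategy="random")
worker_base_port = get_free_port(strategy="random", exclude=[port])
```

With exclude=[port], the second call can never return the first port even though both draw randomly from the same range.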

@gwarmstrong (Collaborator, Author)

Recreating from origin branch to fix CI auth
