Commit f0f2921

keivenchang authored and saturley-hall committed
feat: add network isolation modes to container/run.sh script (#3237)
Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Harrison King Saturley-Hall <[email protected]>
1 parent 7189faa commit f0f2921

File tree

2 files changed: +142 -21 lines

- container/README.md
- container/run.sh


container/README.md

Lines changed: 117 additions & 20 deletions
@@ -167,36 +167,112 @@ The `run.sh` script launches Docker containers with the appropriate configuratio
 - **GPU Management**: Automatic GPU detection and allocation
 - **Volume Mounting**: Workspace and HuggingFace cache mounting
 - **User Management**: Root or user-based container execution
-- **Network Configuration**: Host networking for service communication
+- **Network Configuration**: Configurable networking modes (host, bridge, none, container sharing)
 - **Resource Limits**: Memory, file descriptors, and IPC configuration

 **Common Usage Examples:**

 ```bash
-# Basic container launch (inference/production)
-./run.sh --image dynamo:latest-vllm
+# Basic container launch (inference/production, runs as root user)
+./run.sh --image dynamo:latest-vllm -v $HOME/.cache:/home/ubuntu/.cache

-# Mount workspace for development (use local-dev image for local user permissions)
-./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace
+# Mount workspace for development (use local-dev image for local host user permissions)
+./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

 # Use specific image and framework for development
-./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace
+./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

 # Interactive development shell with workspace mounted
-./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it -- bash
+./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it -- bash

 # Development with custom environment variables
-./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace
-
-# Production inference without GPU access
-./run.sh --image dynamo:latest-vllm --gpus none
+./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

 # Dry run to see docker command
 ./run.sh --dry-run

 # Development with custom volume mounts
-./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace
+./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
+```
+
+### Network Configuration Options
+
+The `run.sh` script supports different networking modes via the `--network` flag (defaults to `host`):
+
+#### Host Networking (Default)
+```bash
+# Same examples with local host user permissions
+./run.sh --image dynamo:latest-vllm-local-dev --network host -v $HOME/.cache:/home/ubuntu/.cache
+./run.sh --image dynamo:latest-vllm-local-dev -v $HOME/.cache:/home/ubuntu/.cache
+```
+**Use cases:**
+- High-performance ML inference (default for GPU workloads)
+- Services that need direct host port access
+- Maximum network performance with minimal overhead
+- Sharing services with the host machine (NATS, etcd, etc.)
+
+**⚠️ Port Sharing Limitation:** Host networking shares all ports with the host machine, which means you can only run **one instance** of services like NATS (port 4222) or etcd (port 2379) across all containers and the host.
+
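Given that limitation, it can be worth confirming the NATS (4222) and etcd (2379) ports are free on the host before starting another host-networked container. A minimal check, assuming `ss` from iproute2 is available on the host:

```bash
# List listening TCP sockets and flag any existing NATS/etcd listeners;
# if nothing matches, the ports are free for a host-networked container.
ss -ltn | grep -E ':(4222|2379)\s' || echo "ports 4222/2379 are free"
```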
+#### Bridge Networking (Isolated)
+```bash
+# CI/testing with isolated bridge networking and host cache sharing
+./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache
+```
+**Use cases:**
+- Secure isolation from host network
+- CI/CD pipelines requiring complete isolation
+- When you need absolute control of ports
+- Exposing specific services to host while maintaining isolation
+
+**Note:** For port sharing with the host, use the `--port` or `-p` option with format `host_port:container_port` (e.g., `--port 8000:8000` or `-p 9081:8081`) to expose specific container ports to the host.
+
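For illustration, a bridge-mode launch that publishes the frontend's HTTP port to the host might look like the sketch below (the `8000:8000` mapping follows the note's example and assumes the frontend listens on port 8000 inside the container):

```bash
# Isolated bridge networking, with only the frontend port exposed to the host
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge \
  -p 8000:8000 -v $HOME/.cache:/home/ubuntu/.cache
```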
+#### No Networking ⚠️ **LIMITED FUNCTIONALITY**
+```bash
+# Complete network isolation - no external connectivity
+./run.sh --image dynamo:latest-vllm --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
+
+# Same with local user permissions
+./run.sh --image dynamo:latest-vllm-local-dev --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
 ```
+**⚠️ WARNING: `--network none` severely limits Dynamo functionality:**
+- **No model downloads** - HuggingFace models cannot be downloaded
+- **No API access** - Cannot reach external APIs or services
+- **No distributed inference** - Multi-node setups won't work
+- **No monitoring/logging** - External monitoring systems unreachable
+- **Limited debugging** - Cannot access external debugging tools
+
+**Very limited use cases:**
+- Pre-downloaded models with purely local processing
+- Air-gapped security environments (models must be pre-staged)
+
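Since nothing can be downloaded under `--network none`, the model has to be staged into the mounted HuggingFace cache while the host still has connectivity. A minimal sketch, assuming the `huggingface_hub` CLI is installed on the host and reusing the `Qwen/Qwen3-0.6B` model from the examples above:

```bash
# On the connected host: pre-download the model into ~/.cache/huggingface,
# which the container later mounts under /home/ubuntu/.cache
huggingface-cli download Qwen/Qwen3-0.6B

# Then launch fully isolated; the worker loads the model from the mounted cache
./run.sh --image dynamo:latest-vllm --network none --mount-workspace \
  -v $HOME/.cache:/home/ubuntu/.cache
```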
+#### Container Network Sharing
+Use `--network container:name` to share the network namespace with another container.
+
+**Use cases:**
+- Sidecar patterns (logging, monitoring, caching)
+- Service mesh architectures
+- Sharing network namespaces between related containers
+
+See Docker documentation for `--network container:name` usage.
+
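As a concrete sketch of the `container:name` mode, borrowing the `redis` name used in the script's help text (the image and container name are assumptions):

```bash
# Start the target container whose network namespace will be shared
docker run -d --name redis redis:7

# Launch the dynamo container inside that namespace; both containers
# now reach each other's services on localhost
./run.sh --image dynamo:latest-vllm --network container:redis \
  -v $HOME/.cache:/home/ubuntu/.cache
```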
+#### Custom Networks
+Use custom Docker networks for multi-container applications. Create with `docker network create` and specify with `--network network-name`.
+
+**Use cases:**
+- Multi-container applications
+- Service discovery by container name
+
+See Docker documentation for custom network creation and management.
+
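For example, a user-defined bridge network lets containers resolve each other by name through Docker's embedded DNS (a sketch; the network name `dynamo-net` is an assumption):

```bash
# Create a user-defined network once
docker network create dynamo-net

# Attach the container to it; other containers on dynamo-net can reach it by name
./run.sh --image dynamo:latest-vllm --network dynamo-net \
  -v $HOME/.cache:/home/ubuntu/.cache
```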
+#### Network Mode Comparison
+
+| Mode | Performance | Security | Use Case | Dynamo Compatibility | Port Sharing | Port Publishing |
+|------|-------------|----------|----------|---------------------|---------------|-----------------|
+| `host` | Highest | Lower | ML/GPU workloads, high-performance services | ✅ Full | ⚠️ **Shared with host** (one NATS/etcd only) | ❌ Not needed |
+| `bridge` | Good | Higher | General web services, controlled port exposure | ✅ Full | ✅ Isolated ports | `-p host:container` |
+| `none` | N/A | Highest | Air-gapped environments only | ⚠️ **Very Limited** | ✅ No network | ❌ No network |
+| `container:name` | Good | Medium | Sidecar patterns, shared network stacks | ✅ Full | ⚠️ Shared with target container | ❌ Use target's ports |
+| Custom networks | Good | Medium | Multi-container applications | ✅ Full | ✅ Isolated ports | `-p host:container` |

 ## Workflow Examples

@@ -206,30 +282,51 @@ The `run.sh` script launches Docker containers with the appropriate configuratio
 ./build.sh --framework vllm --target local-dev

 # 2. Run development container using the local-dev image
-./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it
+./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it

 # 3. Inside container, run inference (requires both frontend and backend)
 # Start frontend
 python -m dynamo.frontend &

 # Start backend (vLLM example)
-python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.50 &
+python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
 ```

 ### Production Workflow
 ```bash
 # 1. Build production image
 ./build.sh --framework vllm --release-build

-# 2. Run production container
-./run.sh --image dynamo:latest-vllm-local-dev --gpus all
+# 2. Run production container (runs as root)
+./run.sh --image dynamo:latest-vllm --gpus all
 ```

-### Testing Workflow
+### CI/CD Workflow
 ```bash
-# 1. Build with no cache for clean build
+# 1. Build image for CI
 ./build.sh --framework vllm --no-cache

-# 2. Test container functionality (--image defaults to dynamo:latest-vllm)
-./run.sh --mount-workspace -it -- python -m pytest tests/
+# 2. Run tests with network isolation for reproducible results
+./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache -- python -m pytest tests/
+
+# 3. Inside the container with bridge networking, start services
+# Note: Services are only accessible from the same container - no port conflicts with host
+nats-server -js &
+etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
+python -m dynamo.frontend &
+
+# 4. Start worker backend (choose one framework):
+# vLLM
+DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &
+
+# SGLang
+DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.sglang --model Qwen/Qwen3-0.6B --mem-fraction-static 0.20 --max-running-requests 64 &
+
+# TensorRT-LLM
+DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.trtllm --model Qwen/Qwen3-0.6B --free-gpu-memory-fraction 0.20 --max-num-tokens 8192 --max-batch-size 64 &
 ```
+
+**Framework-Specific GPU Memory Arguments:**
+- **vLLM**: `--gpu-memory-utilization 0.20` (use 20% GPU memory), `--enforce-eager` (disable CUDA graphs), `--no-enable-prefix-caching` (save memory), `--max-num-seqs 64` (max concurrent sequences)
+- **SGLang**: `--mem-fraction-static 0.20` (20% GPU memory for static allocation), `--max-running-requests 64` (max concurrent requests)
+- **TensorRT-LLM**: `--free-gpu-memory-fraction 0.20` (reserve 20% GPU memory), `--max-num-tokens 8192` (max tokens in batch), `--max-batch-size 64` (max batch size)

container/run.sh

Lines changed: 25 additions & 1 deletion
@@ -36,13 +36,15 @@ DEFAULT_HF_CACHE=${SOURCE_DIR}/.cache/huggingface
 GPUS="all"
 PRIVILEGED=
 VOLUME_MOUNTS=
+PORT_MAPPINGS=
 MOUNT_WORKSPACE=
 ENVIRONMENT_VARIABLES=
 REMAINING_ARGS=
 INTERACTIVE=
 USE_NIXL_GDS=
 RUNTIME=nvidia
 WORKDIR=/workspace
+NETWORK=host

 get_options() {
     while :; do
@@ -148,6 +150,14 @@ get_options() {
                 missing_requirement "$1"
             fi
             ;;
+        -p|--port)
+            if [ "$2" ]; then
+                PORT_MAPPINGS+=" -p $2 "
+                shift
+            else
+                missing_requirement "$1"
+            fi
+            ;;
         -e)
             if [ "$2" ]; then
                 ENVIRONMENT_VARIABLES+=" -e $2 "
@@ -165,6 +175,14 @@ get_options() {
         --use-nixl-gds)
             USE_NIXL_GDS=TRUE
             ;;
+        --network)
+            if [ "$2" ]; then
+                NETWORK=$2
+                shift
+            else
+                missing_requirement "$1"
+            fi
+            ;;
         --dry-run)
             RUN_PREFIX="echo"
             echo ""
@@ -304,7 +322,12 @@ show_help() {
     echo " [--hf-cache directory to volume mount as the hf cache, default is NONE unless mounting workspace]"
     echo " [--gpus gpus to enable, default is 'all', 'none' disables gpu support]"
     echo " [--use-nixl-gds add volume mounts and capabilities needed for NVIDIA GPUDirect Storage]"
+    echo " [--network network mode for container, default is 'host']"
+    echo " Options: 'host' (default), 'bridge', 'none', 'container:name'"
+    echo " Examples: --network bridge (isolated), --network none (no network - WARNING: breaks most functionality)"
+    echo " --network container:redis (share network with 'redis' container)"
     echo " [-v add volume mount]"
+    echo " [-p|--port add port mapping (host_port:container_port)]"
     echo " [-e add environment variable]"
     echo " [--mount-workspace set up for local development]"
     echo " [-- stop processing and pass remaining args as command to docker run]"
@@ -335,14 +358,15 @@ ${RUN_PREFIX} docker run \
     ${GPU_STRING} \
     ${INTERACTIVE} \
     ${RM_STRING} \
-    --network host \
+    --network "$NETWORK" \
     ${RUNTIME:+--runtime "$RUNTIME"} \
     --shm-size=10G \
     --ulimit memlock=-1 \
     --ulimit stack=67108864 \
     --ulimit nofile=65536:65536 \
     ${ENVIRONMENT_VARIABLES} \
     ${VOLUME_MOUNTS} \
+    ${PORT_MAPPINGS} \
     -w "$WORKDIR" \
     --cap-add CAP_SYS_PTRACE \
     ${NIXL_GDS_CAPS} \
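With these options wired into the `docker run` invocation, `--dry-run` can be used to preview how `--network` and `-p` are passed through (a sketch; the bridge mode and `8000:8000` mapping are illustrative values):

```bash
# Print the docker command that would be executed, without running it
./run.sh --image dynamo:latest-vllm --network bridge -p 8000:8000 --dry-run
```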

0 commit comments
