The `run.sh` script launches Docker containers with the appropriate configuration:

- **GPU Management**: Automatic GPU detection and allocation
- **Volume Mounting**: Workspace and HuggingFace cache mounting
- **User Management**: Root or user-based container execution
- **Network Configuration**: Configurable networking modes (host, bridge, none, container sharing)
- **Resource Limits**: Memory, file descriptors, and IPC configuration

**Common Usage Examples:**

```bash
# Basic container launch (inference/production, runs as root user)
./run.sh --image dynamo:latest-vllm -v $HOME/.cache:/home/ubuntu/.cache

# Mount workspace for development (use local-dev image for local host user permissions)
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

# Use a specific image and framework for development
./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

# Interactive development shell with workspace mounted
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it -- bash

# Development with custom environment variables
./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

# Dry run to see the docker command without executing it
./run.sh --dry-run

# Development with custom volume mounts
./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
```

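The flags above translate into a single `docker run` invocation. As a rough, hedged sketch only (the exact flags `run.sh` emits are an assumption; use `./run.sh --dry-run` to see the real command), the mapping looks roughly like this:

```bash
# Hedged sketch: approximates how run.sh options might map onto `docker run`.
# The command is printed rather than executed, like --dry-run.
image="dynamo:latest-vllm-local-dev"
cmd=(docker run --rm -it
  --gpus all                             # GPU allocation
  --network host                         # default networking mode
  -v "$HOME/.cache:/home/ubuntu/.cache"  # HuggingFace cache mount
  -v "$PWD:/workspace"                   # roughly what --mount-workspace does
  "$image")
printf '%s ' "${cmd[@]}"; echo
```

Building the command as a bash array avoids quoting bugs when paths contain spaces, which is why a printed sketch like this is safer than string concatenation.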
### Network Configuration Options

The `run.sh` script supports different networking modes via the `--network` flag (defaults to `host`):

#### Host Networking (Default)
```bash
# Host networking is the default, so these two commands are equivalent
./run.sh --image dynamo:latest-vllm-local-dev --network host -v $HOME/.cache:/home/ubuntu/.cache
./run.sh --image dynamo:latest-vllm-local-dev -v $HOME/.cache:/home/ubuntu/.cache
```
**Use cases:**
- High-performance ML inference (default for GPU workloads)
- Services that need direct host port access
- Maximum network performance with minimal overhead
- Sharing services with the host machine (NATS, etcd, etc.)

**⚠️ Port Sharing Limitation:** Host networking shares all ports with the host machine, which means you can only run **one instance** of services like NATS (port 4222) or etcd (port 2379) across all containers and the host.
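
Before launching a second host-networked container, it can help to probe whether one of those well-known ports is already taken. A small sketch using bash's built-in `/dev/tcp` redirection (bash-specific, so not portable to plain `sh`):

```bash
# Probe whether something already listens on NATS's default port 4222.
# A successful connect means an existing instance owns the port.
if (exec 3<>/dev/tcp/127.0.0.1/4222) 2>/dev/null; then
  status="in use"   # reuse the existing NATS instance instead of starting one
else
  status="free"     # safe to start a NATS server with host networking
fi
echo "port 4222 is $status"
```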

#### Bridge Networking (Isolated)
```bash
# CI/testing with isolated bridge networking and host cache sharing
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache
```
**Use cases:**
- Secure isolation from the host network
- CI/CD pipelines requiring complete isolation
- When you need absolute control over ports
- Exposing specific services to the host while maintaining isolation

**Note:** For port sharing with the host, use the `--port` or `-p` option with the format `host_port:container_port` (e.g., `--port 8000:8000` or `-p 9081:8081`) to expose specific container ports to the host.
228+
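For example, a bridge-networked container that publishes only one HTTP port to the host might be launched like this (port 8000 is an assumption about where the service listens; the command is shown, not executed here):

```bash
# Illustrative only: bridge networking plus a single published port.
cmd='./run.sh --image dynamo:latest-vllm --network bridge -p 8000:8000 -v $HOME/.cache:/home/ubuntu/.cache'
echo "$cmd"
```
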
#### No Networking ⚠️ **LIMITED FUNCTIONALITY**
```bash
# Complete network isolation - no external connectivity
./run.sh --image dynamo:latest-vllm --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache

# Same with local user permissions
./run.sh --image dynamo:latest-vllm-local-dev --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
```
**⚠️ WARNING: `--network none` severely limits Dynamo functionality:**
- **No model downloads** - HuggingFace models cannot be downloaded
- **No API access** - Cannot reach external APIs or services
- **No distributed inference** - Multi-node setups won't work
- **No monitoring/logging** - External monitoring systems are unreachable
- **Limited debugging** - Cannot access external debugging tools

**Very limited use cases:**
- Pre-downloaded models with purely local processing
- Air-gapped security environments (models must be pre-staged)

#### Container Network Sharing
Use `--network container:name` to share the network namespace with another container.

**Use cases:**
- Sidecar patterns (logging, monitoring, caching)
- Service mesh architectures
- Sharing network namespaces between related containers

See the Docker documentation for `--network container:name` usage.
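
A minimal sidecar sketch (container and image names below are purely illustrative; the commands are printed rather than run):

```bash
# Sidecar sketch: the second container joins the first one's network
# namespace, so the sidecar reaches the worker on localhost.
main='docker run -d --name dynamo-worker --gpus all dynamo:latest-vllm'
sidecar='docker run -d --network container:dynamo-worker my-metrics-sidecar'
printf '%s\n%s\n' "$main" "$sidecar"
```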

#### Custom Networks
Use custom Docker networks for multi-container applications. Create one with `docker network create` and specify it with `--network network-name`.

**Use cases:**
- Multi-container applications
- Service discovery by container name

See the Docker documentation for custom network creation and management.
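
A sketch of the pattern (the network name is illustrative, and the commands are printed rather than executed):

```bash
# Containers on the same user-defined network resolve each other by name,
# so a worker could reach the etcd container at http://etcd:2379.
net="dynamo-net"
cmds="docker network create $net
docker run -d --name etcd --network $net quay.io/coreos/etcd
./run.sh --image dynamo:latest-vllm --network $net"
printf '%s\n' "$cmds"
```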

#### Network Mode Comparison

| Mode | Performance | Security | Use Case | Dynamo Compatibility | Port Sharing | Port Publishing |
|------|-------------|----------|----------|----------------------|--------------|-----------------|
| `host` | Highest | Lower | ML/GPU workloads, high-performance services | ✅ Full | ⚠️ **Shared with host** (one NATS/etcd only) | ❌ Not needed |
| `bridge` | Good | Higher | General web services, controlled port exposure | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
| `none` | N/A | Highest | Air-gapped environments only | ⚠️ **Very limited** | ✅ No network | ❌ No network |
| `container:name` | Good | Medium | Sidecar patterns, shared network stacks | ✅ Full | ⚠️ Shared with target container | ❌ Use target's ports |
| Custom networks | Good | Medium | Multi-container applications | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |

## Workflow Examples

### Development Workflow
```bash
# 1. Build the local-dev image
./build.sh --framework vllm --target local-dev

# 2. Run development container using the local-dev image
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it

# 3. Inside the container, run inference (requires both frontend and backend)
# Start frontend
python -m dynamo.frontend &

# Start backend (vLLM example)
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
```

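The frontend and backend in step 3 start asynchronously, so follow-up commands usually need to wait for readiness. A hedged helper for that (in practice the polled command would be something like `curl -sf http://localhost:8000/v1/models`, where the URL is an assumption; `true` stands in for the real check here):

```bash
# Retry a command until it succeeds or the attempt budget runs out.
wait_for() {
  local tries=$1; shift
  local i=0
  until "$@"; do
    i=$((i + 1))
    [ "$i" -ge "$tries" ] && return 1
    sleep 1
  done
}

# Illustrative usage with a trivially-true check:
wait_for 5 true && echo "service ready"
```
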
### Production Workflow
```bash
# 1. Build production image
./build.sh --framework vllm --release-build

# 2. Run production container (runs as root)
./run.sh --image dynamo:latest-vllm --gpus all
```

### CI/CD Workflow
```bash
# 1. Build image for CI
./build.sh --framework vllm --no-cache

# 2. Run tests with network isolation for reproducible results
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache -- python -m pytest tests/

# 3. Inside the container with bridge networking, start services
# Note: services are reachable only from this container - no port conflicts with the host
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
python -m dynamo.frontend &

# 4. Start a worker backend (choose one framework):
# vLLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &

# SGLang
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.sglang --model Qwen/Qwen3-0.6B --mem-fraction-static 0.20 --max-running-requests 64 &

# TensorRT-LLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.trtllm --model Qwen/Qwen3-0.6B --free-gpu-memory-fraction 0.20 --max-num-tokens 8192 --max-batch-size 64 &
```
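
Every service in the steps above is backgrounded with `&`, so a CI script should tear them down on exit. A small bash trap handles that (sketch; `sleep` stands in for a real backgrounded service):

```bash
# Kill all background jobs of this shell when the script exits, so CI runs
# don't leak nats-server/etcd/worker processes.
cleanup() { jobs -p | xargs -r kill 2>/dev/null; }
trap cleanup EXIT

sleep 60 &    # stand-in for e.g. `nats-server -js &`
svc_pid=$!
echo "backgrounded stand-in service with pid $svc_pid"
```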

**Framework-Specific GPU Memory Arguments:**
- **vLLM**: `--gpu-memory-utilization 0.20` (use 20% of GPU memory), `--enforce-eager` (disable CUDA graphs), `--no-enable-prefix-caching` (save memory), `--max-num-seqs 64` (max concurrent sequences)
- **SGLang**: `--mem-fraction-static 0.20` (20% of GPU memory for static allocation), `--max-running-requests 64` (max concurrent requests)
- **TensorRT-LLM**: `--free-gpu-memory-fraction 0.20` (reserve 20% of GPU memory), `--max-num-tokens 8192` (max tokens per batch), `--max-batch-size 64` (max batch size)
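
As a quick sanity check on those fractions (assuming, hypothetically, a 24 GiB GPU):

```bash
# 20% of a hypothetical 24 GiB GPU - the rough ceiling each framework gets
# with the 0.20 settings above, leaving headroom to co-locate other processes.
gib=$(awk 'BEGIN { printf "%.1f", 24 * 0.20 }')
echo "0.20 x 24 GiB = ${gib} GiB"
```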