Merged

30 commits
09ab3e1  feat: add download acceleration infrastructure (deanq, Aug 16, 2025)
795c9e5  feat: integrate download acceleration with dependency installer (deanq, Aug 16, 2025)
046eb58  feat: add workspace acceleration support (deanq, Aug 16, 2025)
45a65fe  test: add download acceleration test coverage (deanq, Aug 16, 2025)
ce51390  chore: moved test-handler files to src/ (deanq, Aug 16, 2025)
6c04de1  feat: runtime uses aria2 for accelerated parallel downloads (deanq, Aug 16, 2025)
66eb286  chore: update project structure and dependencies (deanq, Aug 16, 2025)
1930b4b  chore: updated tetra-rp (deanq, Aug 19, 2025)
731fd56  build: local-execution-test use make test-handler (deanq, Aug 19, 2025)
e829140  chore: update CLAUDE.md (deanq, Aug 19, 2025)
104b2da  chore: move these values to constants.py for maintainability (deanq, Aug 19, 2025)
f8aa89a  feat: add system package acceleration with nala (deanq, Aug 19, 2025)
cd56185  refactor: disable Python package download acceleration (deanq, Aug 20, 2025)
d7c996d  test: uv is no longer part of download accelerator (deanq, Aug 20, 2025)
2ab93e3  feat: implement accelerate_downloads parameter logic in RemoteExecutor (deanq, Aug 21, 2025)
b50a7bf  feat: add pip fallback for Python dependencies when acceleration disa… (deanq, Aug 21, 2025)
440d00d  feat: enhance HF model caching with hf_transfer/hf_xet strategy (deanq, Aug 21, 2025)
0320e4d  test: add comprehensive coverage for accelerate_downloads parameter (deanq, Aug 21, 2025)
034f770  test: update integration tests for new acceleration parameter (deanq, Aug 21, 2025)
9531079  chore: update dependencies and constants for download acceleration (deanq, Aug 21, 2025)
d75d320  refactor: remove pip installation method from dependency installer (deanq, Aug 21, 2025)
227b33e  test: update unit tests to expect UV instead of pip (deanq, Aug 21, 2025)
338a165  test: rename test file from pip to UV naming convention (deanq, Aug 21, 2025)
f88745d  feat: implement parallel execution for accelerated downloads (deanq, Aug 21, 2025)
f22e74d  feat: add async wrapper for HuggingFace model download acceleration (deanq, Aug 21, 2025)
816fc75  test: update tests for parallel execution and async dependencies (deanq, Aug 21, 2025)
c9ad0d3  test: comprehensive test coverage expansion and cleanup (deanq, Aug 21, 2025)
e31137a  refactor: optimize HF acceleration to use native Hub features (deanq, Aug 21, 2025)
e1db417  chore: memory correction (deanq, Aug 21, 2025)
76ab9c0  feat: implement HuggingFace download acceleration strategies (deanq, Aug 21, 2025)
17 changes: 1 addition & 16 deletions .github/workflows/ci.yml
@@ -99,22 +99,7 @@ jobs:
run: make setup

- name: Test local handler execution
run: |
echo "Testing handler with all test_*.json files..."
passed=0
total=0
for test_file in test_*.json; do
total=$((total + 1))
echo "Testing with $test_file..."
if timeout 30s env PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat "$test_file")" uv run python src/handler.py >/dev/null 2>&1; then
echo "✓ $test_file: PASSED"
passed=$((passed + 1))
else
echo "✗ $test_file: FAILED"
exit 1
fi
done
echo "All $passed/$total handler tests passed!"
run: make test-handler

release:
runs-on: ubuntu-latest
26 changes: 15 additions & 11 deletions CLAUDE.md
@@ -68,12 +68,8 @@ make build-cpu # Build CPU-only Docker image

### Local Testing
```bash
# Test handler locally with test_input.json
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_input.json)" uv run python src/handler.py

# Test with other test files
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_class_input.json)" uv run python src/handler.py
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_hf_input.json)" uv run python src/handler.py
# Test handler locally with test*.json
make test-handler
```

### Submodule Management
@@ -122,6 +118,14 @@ The handler automatically detects and utilizes `/runpod-volume` for persistent w
- **Optimized Resource Usage**: Shared caches across multiple endpoints while maintaining isolation
- **ML Model Efficiency**: Large HF models cached on volume prevent "No space left on device" errors

### HuggingFace Model Acceleration
The system automatically leverages HuggingFace's native acceleration features:
- **hf_transfer**: Accelerated downloads for large model files when available
- **hf_xet**: Automatic chunk-level deduplication and incremental downloads (huggingface_hub>=0.32.0)
- **Native Integration**: Uses HF Hub's `snapshot_download()` for optimal caching and acceleration
- **Transparent Operation**: No code changes needed - acceleration is automatic when repositories support it
- **Token Support**: Configured via `HF_TOKEN` environment variable for private repositories
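
A minimal usage sketch of the behavior described above, assuming the `huggingface_hub` API; the model id and cache path are illustrative examples, not values taken from this PR:

```python
import os

# hf_transfer is opted into via this environment variable; set it before
# huggingface_hub reads its configuration.
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")

from huggingface_hub import snapshot_download

# Illustrative values: repo_id and cache_dir are examples only.
local_dir = snapshot_download(
    repo_id="bert-base-uncased",
    cache_dir="/runpod-volume/.cache/huggingface",  # persistent volume cache
    token=os.environ.get("HF_TOKEN"),  # only needed for private repositories
)
print(f"Model snapshot cached at: {local_dir}")
```

With `huggingface_hub>=0.32.0`, xet-based chunk deduplication is applied automatically when the repository supports it, so no additional code is required.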

## Configuration

### Environment Variables
Expand Down Expand Up @@ -160,11 +164,6 @@ make test-integration # Run integration tests only
make test-coverage # Run tests with coverage report
make test-fast # Run tests with fail-fast mode
make test-handler # Test handler locally with all test_*.json files (same as CI)

# Test handler locally with specific test files
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_input.json)" uv run python src/handler.py
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_class_input.json)" uv run python src/handler.py
PYTHONPATH=src RUNPOD_TEST_INPUT="$(cat test_hf_input.json)" uv run python src/handler.py
```

### Testing Framework
@@ -261,3 +260,8 @@ Configure these in GitHub repository settings:

### Docker Guidelines
- Docker container should never refer to src/

- Always run `make quality-check` before declaring the work finished
- Always use `git mv` when moving existing files around

- Run `make test-handler` to check the test files. Do not run them one by one with commands like `Bash(env RUNPOD_TEST_INPUT="$(cat test_input.json)" PYTHONPATH=. uv run python handler.py)`
9 changes: 5 additions & 4 deletions Dockerfile
@@ -10,7 +10,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& chmod +x /usr/local/bin/uv

# Copy app code and install dependencies
COPY README.md src/* pyproject.toml uv.lock test_*.json test-handler.sh ./
COPY README.md src/* pyproject.toml uv.lock ./
RUN uv sync


@@ -19,11 +19,12 @@ FROM pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime

WORKDIR /app

# Install nala for system package acceleration in runtime stage
RUN apt-get update && apt-get install -y --no-install-recommends nala \
&& rm -rf /var/lib/apt/lists/*

# Copy app and uv binary from builder
COPY --from=builder /app /app
COPY --from=builder /usr/local/bin/uv /usr/local/bin/uv

# Clean up any unnecessary system tools
RUN rm -rf /var/lib/apt/lists/*

CMD ["uv", "run", "handler.py"]
4 changes: 2 additions & 2 deletions Dockerfile-cpu
@@ -11,7 +11,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& chmod +x /usr/local/bin/uv

# Copy app files and install deps
COPY README.md src/* pyproject.toml uv.lock test_*.json test-handler.sh ./
COPY README.md src/* pyproject.toml uv.lock ./
RUN uv sync

# Stage 2: Runtime stage
@@ -21,7 +21,7 @@ WORKDIR /app

# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl ca-certificates \
curl ca-certificates nala \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

6 changes: 3 additions & 3 deletions Makefile
@@ -68,7 +68,7 @@ test-fast: # Run tests with fast-fail mode
uv run pytest tests/ -v -x --tb=short

test-handler: # Test handler locally with all test_*.json files
./test-handler.sh
cd src && ./test-handler.sh

# Smoke Tests (local on Mac OS)

@@ -97,7 +97,7 @@ format-check: # Check code formatting

# Type checking
typecheck: # Check types with mypy
uv run mypy .
uv run mypy src/

# Quality gates (used in CI)
quality-check: format-check lint typecheck test-coverage
quality-check: format-check lint typecheck test-coverage test-handler
47 changes: 24 additions & 23 deletions pyproject.toml
@@ -7,7 +7,10 @@ requires-python = ">=3.9,<3.13"
dependencies = [
"cloudpickle>=3.1.1",
"pydantic>=2.11.4",
"requests>=2.25.0",
"runpod",
"hf_transfer>=0.1.0",
"huggingface_hub>=0.32.0",
]

[dependency-groups]
@@ -18,6 +21,7 @@ dev = [
"pytest-asyncio>=0.24.0",
"ruff>=0.8.0",
"mypy>=1.11.0",
"types-requests>=2.25.0",
]

[tool.pytest.ini_options]
@@ -48,40 +52,37 @@ filterwarnings = [
"ignore::pytest.PytestUnknownMarkWarning"
]

[tool.ruff]
# Exclude tetra-rp directory since it's a separate repository
exclude = [
"tetra-rp/",
]

[tool.mypy]
# Basic configuration
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false # Start lenient, can be stricter later
disallow_incomplete_defs = false
check_untyped_defs = true

# Import discovery
mypy_path = "src"
mypy_path = ["src"]
explicit_package_bases = true
namespace_packages = true

# Error output
check_untyped_defs = true
disallow_any_generics = true
disallow_untyped_defs = false
warn_redundant_casts = true
warn_unused_ignores = true
warn_return_any = true
strict_optional = true
show_error_codes = true
show_column_numbers = true
pretty = true

# Exclude directories
exclude = [
"tetra-rp/",
"tests/", # Start by excluding tests, can add later
]

# Per-module options
[[tool.mypy.overrides]]
module = [
"runpod.*",
"cloudpickle.*",
"cloudpickle",
"runpod",
"transformers",
"hf_transfer",
"huggingface_hub",
]
ignore_missing_imports = true

[tool.ruff]
# Exclude tetra-rp directory since it's a separate repository
exclude = [
"tetra-rp/",
]
1 change: 1 addition & 0 deletions src/__init__.py
@@ -0,0 +1 @@
"""Worker Tetra package."""
2 changes: 1 addition & 1 deletion src/class_executor.py
@@ -18,7 +18,7 @@ def __init__(self, workspace_manager):
super().__init__(workspace_manager)
# Instance registry for persistent class instances
self.class_instances: Dict[str, Any] = {}
self.instance_metadata: Dict[str, Dict] = {}
self.instance_metadata: Dict[str, Dict[str, Any]] = {}

def execute(self, request: FunctionRequest) -> FunctionResponse:
"""Execute class method - required by BaseExecutor interface."""
72 changes: 72 additions & 0 deletions src/constants.py
@@ -20,3 +20,75 @@

RUNTIMES_DIR_NAME = "runtimes"
"""Name of the runtimes directory containing per-endpoint workspaces."""

# Download Acceleration Settings
MIN_SIZE_FOR_ACCELERATION_MB = 10
"""Minimum file size in MB to trigger download acceleration."""

DOWNLOAD_TIMEOUT_SECONDS = 600
"""Default timeout for download operations in seconds."""

# New download accelerator settings
HF_TRANSFER_ENABLED = True
"""Enable hf_transfer for fresh HuggingFace downloads."""


# Size Conversion Constants
BYTES_PER_MB = 1024 * 1024
"""Number of bytes in a megabyte."""

MB_SIZE_THRESHOLD = 1 * BYTES_PER_MB
"""Minimum file size threshold for considering acceleration (1MB)."""

# HuggingFace Model Patterns
LARGE_HF_MODEL_PATTERNS = [
"albert-large",
"albert-xlarge",
"bart-large",
"bert-large",
"bert-base",
"codegen",
"diffusion",
"distilbert-base",
"falcon",
"gpt",
"hubert",
"llama",
"mistral",
"mpt",
"pegasus",
"roberta-large",
"roberta-base",
"santacoder",
"stable-diffusion",
"t5",
"vae",
"wav2vec2",
"whisper",
"xlm-roberta",
"xlnet",
]
"""List of HuggingFace model patterns that benefit from download acceleration."""

# System Package Acceleration with Nala
LARGE_SYSTEM_PACKAGES = [
"build-essential",
"cmake",
"cuda-toolkit",
"curl",
"g++",
"gcc",
"git",
"libssl-dev",
"nvidia-cuda-dev",
"python3-dev",
"wget",
]
"""List of system packages that benefit from nala's accelerated installation."""

NALA_CHECK_CMD = ["which", "nala"]
"""Command to check if nala is available."""

# Logging Configuration
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
"""Standard log format string used across the application."""