feat: add security improvements and reliability enhancements #4

Merged
waybarrios merged 3 commits into waybarrios:main from ersintarhan:feat/security-and-reliability
Jan 16, 2026

Conversation

@ersintarhan
Contributor

Summary

This PR adds several security and reliability improvements:

Security Fixes

  • Timing attack prevention with secrets.compare_digest()
  • Rate limiting (--rate-limit flag)
  • Request timeout (--timeout flag)
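A minimal sketch of the constant-time comparison named in the first bullet. The function name `keys_match` is hypothetical and is not taken from this PR; only `secrets.compare_digest()` comes from the description above.

```python
import secrets


def keys_match(provided: str, expected: str) -> bool:
    """Compare two API keys without leaking, via timing, where they differ."""
    # compare_digest() accepts str only when it is pure ASCII, so encode
    # both sides to bytes to avoid a TypeError on arbitrary client input.
    return secrets.compare_digest(
        provided.encode("utf-8"), expected.encode("utf-8")
    )
```

A plain `provided == expected` can return as soon as the first differing character is found, which is the timing signal this comparison is meant to remove.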

Reliability

  • TempFileManager for auto-cleanup of temp files
  • Thread-safe _waiting_consumers counter
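The thread-safe counter idea can be sketched as follows. This is an illustration under assumptions, not the code in `RequestOutputCollector`: the class name `WaitingConsumers` and its methods are hypothetical.

```python
import threading


class WaitingConsumers:
    """Counter guarded by a lock so concurrent updates are not lost."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def enter(self) -> None:
        # The read-modify-write happens entirely under the lock.
        with self._lock:
            self._count += 1

    def leave(self) -> None:
        with self._lock:
            self._count -= 1

    def has_waiting(self) -> bool:
        with self._lock:
            return self._count > 0
```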

API Changes

  • timeout parameter in requests
  • --timeout and --rate-limit CLI args

Security fixes:
- Fix timing attack vulnerability in API key verification using secrets.compare_digest()
- Add rate limiting support with sliding window algorithm (--rate-limit flag)
- Add request timeout support to prevent resource exhaustion (--timeout flag)
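For readers unfamiliar with the sliding window algorithm mentioned above, here is a rough sketch. It is not the `RateLimiter` class from this PR: the class name, the `check()` signature, the injectable `clock`, and the choice to treat a non-positive limit as "disabled" are all assumptions made for illustration.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in a rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock  # injectable so the behavior can be tested
        self._hits = defaultdict(deque)  # client id -> recent request times
        self._lock = threading.Lock()

    def check(self, client_id: str) -> tuple[bool, float]:
        """Return (allowed, seconds until a retry can succeed)."""
        if self.limit <= 0:  # assumption: non-positive limit disables limiting
            return True, 0.0
        now = self._clock()
        with self._lock:
            hits = self._hits[client_id]
            # Drop timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) < self.limit:
                hits.append(now)
                return True, 0.0
            # The oldest remaining hit leaves the window after this long.
            return False, self.window - (now - hits[0])
```

The second element of the result is the kind of value a server could place in a Retry-After header when it rejects a request.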

Reliability improvements:
- Add TempFileManager for automatic cleanup of temporary files (images/videos)
- Register temp files with an atexit handler so they are cleaned up at normal interpreter exit
- Fix race condition in RequestOutputCollector with thread-safe locking
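A small sketch of the register-then-clean-up idea behind the temp file handling. The class name `TempFileTracker` and its method signatures are hypothetical and may differ from the `TempFileManager` added in this PR. Note that `atexit` handlers run at normal interpreter shutdown, not when the process is killed by a signal or exits through `os._exit()`.

```python
import atexit
import os
import threading


class TempFileTracker:
    """Remember temp file paths and delete them on demand or at exit."""

    def __init__(self) -> None:
        self._paths: set[str] = set()
        self._lock = threading.Lock()
        # Runs at normal interpreter shutdown.
        atexit.register(self.cleanup_all)

    def register(self, path: str) -> None:
        with self._lock:
            self._paths.add(path)

    def cleanup(self, path: str) -> bool:
        """Delete one file; return False if it was already gone."""
        with self._lock:
            self._paths.discard(path)
        return self._remove(path)

    def cleanup_all(self) -> int:
        """Delete every tracked file; return how many were removed."""
        with self._lock:
            paths, self._paths = self._paths, set()
        return sum(self._remove(p) for p in paths)

    @staticmethod
    def _remove(path: str) -> bool:
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return False
```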

API changes:
- Add `timeout` parameter to ChatCompletionRequest and CompletionRequest
- Add --timeout and --rate-limit CLI arguments

The asyncio.wait_for() timeout was not working because the underlying
model.generate() and model.chat() calls are synchronous and block the
event loop. This change wraps them in asyncio.to_thread() so that the
timeout can properly interrupt long-running generation requests.
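The change described above can be sketched like this. `blocking_generate` is a hypothetical stand-in for the synchronous `model.generate()` call, and `generate_with_timeout` is an illustrative name, not a function from this PR.

```python
import asyncio
import time


def blocking_generate(prompt: str, seconds: float) -> str:
    """Stand-in for a synchronous model call that blocks its thread."""
    time.sleep(seconds)
    return f"echo: {prompt}"


async def generate_with_timeout(prompt: str, seconds: float, timeout: float):
    """Run the blocking call in a worker thread so wait_for() can time out."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(blocking_generate, prompt, seconds),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # The caller gets control back here; the worker thread itself
        # is not killed and keeps running until the call returns.
        return None
```

One caveat worth keeping in mind: the timeout frees the event loop and the waiting request, but Python cannot forcibly stop the worker thread, so the underlying generation still runs to completion in the background.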

Also adds mise.toml for Python version management (3.12).
@waybarrios
Owner

Test Results

All features have been verified and tested:

| Feature | Status |
| --- | --- |
| Timing attack prevention (secrets.compare_digest) | ✅ Verified |
| Rate limiting (RateLimiter class) | ✅ Verified |
| Request timeout (--timeout flag) | ✅ Verified |
| TempFileManager (auto-cleanup) | ✅ Verified |
| Thread-safe _waiting_consumers | ✅ Verified |

Unit Tests Added

13 new tests covering:

  • TestRateLimiter: disabled mode, limit enforcement, per-client tracking, thread safety
  • TestTempFileManager: register/cleanup, cleanup_all, nonexistent files, thread safety
  • TestRequestOutputCollectorThreadSafety: counter manipulation, has_waiting_consumers
  • TestRequestTimeoutField: ChatCompletionRequest and CompletionRequest timeout fields
  • TestAPIKeyVerification: secrets.compare_digest usage

Test Run Output

30 passed, 3 deselected in 1.88s

Ready for merge.

@waybarrios
Owner

waybarrios commented Jan 16, 2026

I have added these unit tests so far: 8131452. Let me know if I am missing something, @ersintarhan. By the way, great job! This is useful!

@ersintarhan
Contributor Author

Great tests! I've added a few more to improve coverage:

  • test_verify_api_key_rejects_invalid - 401 response for invalid keys
  • test_verify_api_key_accepts_valid - Valid key acceptance
  • test_rate_limiter_returns_retry_after - Retry-After header when limit exceeded
  • test_rate_limiter_window_cleanup - Sliding window cleanup behavior

All 34 tests passing now. LGTM!

Owner

@waybarrios waybarrios left a comment

Great PR, approved.

@waybarrios waybarrios merged commit 03c60b4 into waybarrios:main Jan 16, 2026
waybarrios pushed a commit that referenced this pull request Jan 26, 2026
…ching (#4)

Gemma 3's model __call__() requires pixel_values as a positional argument,
unlike Qwen2-VL which makes it optional. This caused "missing required
positional argument: 'pixel_values'" errors when using continuous batching
with text-only requests.

The MLLMModelWrapper now injects pixel_values=None for text-only requests,
enabling Gemma 3 to work with continuous batching and prefix caching.
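The injection described above might look roughly like the following. This is a sketch under assumptions: `TextOnlyCompatWrapper` and the stand-in model are hypothetical, and the real `MLLMModelWrapper` may forward arguments differently.

```python
class TextOnlyCompatWrapper:
    """Forward calls to a model, supplying pixel_values=None when absent."""

    def __init__(self, model):
        self._model = model

    def __call__(self, input_ids, pixel_values=None, **kwargs):
        # Always pass pixel_values explicitly, so a model that declares it
        # as a required argument still works for text-only requests.
        return self._model(input_ids, pixel_values, **kwargs)


def strict_model(input_ids, pixel_values, mask=None):
    """Stand-in for a model whose __call__ requires pixel_values."""
    return input_ids, pixel_values
```

With this shape, `TextOnlyCompatWrapper(strict_model)([1, 2])` succeeds, whereas calling `strict_model([1, 2])` directly raises the "missing required positional argument" error quoted in the commit message.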

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
WainWong pushed a commit to WainWong/vllm-mlx that referenced this pull request Mar 2, 2026
…barrios#4)

Wrap json.dumps() in build_json_system_prompt() and parse_json_output()
calls with try/except to return HTTP 400 instead of crashing the server
when clients send invalid JSON schemas in response_format.
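As an illustration of that guard, the sketch below converts serialization failures into a client error. `BadRequest` and `schema_to_prompt` are hypothetical names; the actual `build_json_system_prompt()` and the web framework's HTTP exception type are not shown in this thread.

```python
import json


class BadRequest(Exception):
    """Hypothetical error type that a server would map to HTTP 400."""

    status_code = 400


def schema_to_prompt(schema) -> str:
    """Render a client-supplied schema, rejecting unserializable input."""
    try:
        rendered = json.dumps(schema, indent=2)
    except (TypeError, ValueError) as exc:
        # TypeError: unsupported object types; ValueError: circular references.
        raise BadRequest(f"Invalid JSON schema in response_format: {exc}") from exc
    return "Respond with JSON matching this schema:\n" + rendered
```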

Co-authored-by: Raullen <raullenstudio@raullenacstudio.lan>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mtomcal added a commit to mtomcal/vllm-mlx that referenced this pull request Apr 4, 2026
Refactor streaming to use tested granular event builders instead of
inline dict construction, fixing the gap where tested code wasn't
production code (waybarrios#13). Fix text omission in completed events (waybarrios#6),
add [DONE] sentinel (waybarrios#8), use typed output models to prevent
cross-type field leakage (waybarrios#4, waybarrios#5), fix content join separator (waybarrios#10),
remove dead code branches (waybarrios#9, waybarrios#11), and warn on unrecognized content
types (waybarrios#7). Add Codex CLI setup guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thump604 added a commit to Thump604/vllm-mlx that referenced this pull request Apr 14, 2026
Critical:
- Wire promote_from_ssd into _schedule_waiting() via _try_promote_ssd_pending()
  so SSD fetch path is actually functional (was defined but never called)
- Add shield-and-await-on-cancel pattern to async_promote() per Golden Rule waybarrios#4
  to prevent RAM budget leaks on task cancellation
- Add threading.Lock to SSDIndex for all public methods (writer thread and
  main thread were sharing connection without synchronization)

Important:
- Fix check_ssd() called twice in scheduler fetch path (wasted SQLite query)
- Wire close_ssd_tier() into scheduler reset() for clean shutdown
- Make reserve_budget() actually reserve (increment _current_memory) instead
  of just checking, to prevent budget overcommit during concurrent promotions
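One common reading of a shield-and-await-on-cancel pattern is sketched below. The helper name `run_shielded` is hypothetical, and the real `async_promote()` may handle cancellation differently.

```python
import asyncio


async def run_shielded(coro):
    """Run coro so that cancelling the caller does not abandon it midway."""
    inner = asyncio.ensure_future(coro)
    try:
        # shield() keeps `inner` running even if this task is cancelled.
        return await asyncio.shield(inner)
    except asyncio.CancelledError:
        # Wait for the inner work to finish so its bookkeeping (for
        # example, releasing a reserved budget) runs, then re-raise.
        await inner
        raise
```

The cancellation still propagates to the caller, but only after the inner coroutine has run to completion, so anything it reserved is released rather than leaked.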
Thump604 added a commit to Thump604/vllm-mlx that referenced this pull request Apr 16, 2026
Critical:
- Wire promote_from_ssd into _schedule_waiting() via _try_promote_ssd_pending()
  so SSD fetch path is actually functional (was defined but never called)
- Add shield-and-await-on-cancel pattern to async_promote() per Golden Rule waybarrios#4
  to prevent RAM budget leaks on task cancellation
- Add threading.Lock to SSDIndex for all public methods (writer thread and
  main thread were sharing connection without synchronization)

Important:
- Fix check_ssd() called twice in scheduler fetch path (wasted SQLite query)
- Wire close_ssd_tier() into scheduler reset() for clean shutdown
- Make reserve_budget() actually reserve (increment _current_memory) instead
  of just checking, to prevent budget overcommit during concurrent promotions