feat: add security improvements and reliability enhancements #4
Merged
waybarrios merged 3 commits into waybarrios:main on Jan 16, 2026
Security fixes:
- Fix timing attack vulnerability in API key verification using secrets.compare_digest()
- Add rate limiting support with sliding window algorithm (--rate-limit flag)
- Add request timeout support to prevent resource exhaustion (--timeout flag)

Reliability improvements:
- Add TempFileManager for automatic cleanup of temporary files (images/videos)
- Register temp files with atexit handler for guaranteed cleanup
- Fix race condition in RequestOutputCollector with thread-safe locking

API changes:
- Add `timeout` parameter to ChatCompletionRequest and CompletionRequest
- Add --timeout and --rate-limit CLI arguments
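For readers who want a concrete picture of two of the security items above, here is a minimal sketch of a constant-time key check and a sliding-window rate limiter. It is not the code from this PR: the names `verify_api_key` and `SlidingWindowRateLimiter`, the per-client keying, and the injectable `clock` are assumptions made for illustration. Only `secrets.compare_digest()` is taken from the commit message.

```python
import secrets
import time
from collections import defaultdict, deque


def verify_api_key(provided: str, expected: str) -> bool:
    """Compare keys in constant time so timing does not reveal how much matched."""
    # Encode to bytes: compare_digest rejects non-ASCII str arguments.
    return secrets.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowRateLimiter:
    """Hypothetical limiter: at most `limit` requests per client in the last `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

This sketch is not thread-safe; a server handling requests from multiple threads would need a lock around `allow()`.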
The asyncio.wait_for() timeout was not working because the underlying model.generate() and model.chat() calls are synchronous and block the event loop. This change wraps them in asyncio.to_thread() so that the timeout can properly interrupt long-running generation requests. Also adds mise.toml for Python version management (3.12).
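The effect described above can be reproduced in isolation. The sketch below is an illustration under assumptions, not the PR's code: `blocking_generate` is a hypothetical stand-in for a synchronous model call, and `generate_with_timeout` is an invented helper name.

```python
import asyncio
import time


def blocking_generate(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for a synchronous, blocking model call."""
    time.sleep(delay)
    return f"echo: {prompt}"


async def generate_with_timeout(prompt: str, delay: float, timeout: float) -> str:
    # Running the blocking call in a worker thread keeps the event loop free,
    # so wait_for can raise its timeout on schedule.
    return await asyncio.wait_for(
        asyncio.to_thread(blocking_generate, prompt, delay), timeout=timeout
    )


async def main() -> tuple[str, bool]:
    ok = await generate_with_timeout("hi", delay=0.01, timeout=1.0)
    try:
        await generate_with_timeout("slow", delay=0.5, timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return ok, timed_out
```

One caveat worth keeping in mind: `asyncio.wait_for()` stops waiting when the timeout fires, but the worker thread itself is not killed and keeps running until the blocking call returns.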
Owner
Test Results
All features have been verified and tested:
Unit Tests Added
13 new tests covering:
Test Run Output
Ready for merge.
Owner
I made these unit tests so far: 8131452, but let me know if I am missing something, @ersintarhan. Btw great job! This is useful!
Contributor
Author
Great tests! I've added a few more to improve coverage:
waybarrios
pushed a commit
that referenced
this pull request
Jan 26, 2026
…ching (#4) Gemma 3's model __call__() requires pixel_values as a positional argument, unlike Qwen2-VL which makes it optional. This caused "missing required positional argument: 'pixel_values'" errors when using continuous batching with text-only requests. The MLLMModelWrapper now injects pixel_values=None for text-only requests, enabling Gemma 3 to work with continuous batching and prefix caching. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
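The injection described in this commit could look roughly like the following sketch. It is an assumption-laden illustration, not the actual MLLMModelWrapper: `PixelValuesDefaultWrapper` and `StrictModel` are hypothetical names, and the sketch assumes `pixel_values` is the second positional parameter of the wrapped model's `__call__()`.

```python
class PixelValuesDefaultWrapper:
    """Hypothetical wrapper: supplies pixel_values=None when the caller omits it."""

    def __init__(self, model):
        self._model = model

    def __call__(self, input_ids, *args, **kwargs):
        # Inject only when pixel_values was passed neither positionally nor by keyword.
        if not args and "pixel_values" not in kwargs:
            kwargs["pixel_values"] = None
        return self._model(input_ids, *args, **kwargs)


class StrictModel:
    """Stand-in for a model whose __call__ requires pixel_values."""

    def __call__(self, input_ids, pixel_values, mask=None):
        kind = "text-only" if pixel_values is None else "multimodal"
        return f"{kind}:{len(input_ids)}"
```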
WainWong
pushed a commit
to WainWong/vllm-mlx
that referenced
this pull request
Mar 2, 2026
…barrios#4) Wrap json.dumps() in build_json_system_prompt() and parse_json_output() calls with try/except to return HTTP 400 instead of crashing the server when clients send invalid JSON schemas in response_format. Co-authored-by: Raullen <raullenstudio@raullenacstudio.lan> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
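As a rough illustration of the guard described in this commit, the sketch below turns a serialization failure into a 400-style result instead of an unhandled exception. The function name `render_schema_or_400` and the `(status, text)` return shape are assumptions for the example; the real code lives in build_json_system_prompt() and parse_json_output() and is not shown here.

```python
import json


def render_schema_or_400(schema) -> tuple[int, str]:
    """Hypothetical helper: return (status, text) rather than raising on a bad schema."""
    try:
        schema_text = json.dumps(schema, indent=2)
    except (TypeError, ValueError) as exc:
        # TypeError: non-serializable value; ValueError: e.g. a circular reference.
        return 400, f"Invalid JSON schema in response_format: {exc}"
    return 200, "Respond with JSON matching this schema:\n" + schema_text
```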
mtomcal
added a commit
to mtomcal/vllm-mlx
that referenced
this pull request
Apr 4, 2026
Refactor streaming to use tested granular event builders instead of inline dict construction, fixing the gap where tested code wasn't production code (waybarrios#13). Fix text omission in completed events (waybarrios#6), add [DONE] sentinel (waybarrios#8), use typed output models to prevent cross-type field leakage (waybarrios#4, waybarrios#5), fix content join separator (waybarrios#10), remove dead code branches (waybarrios#9, waybarrios#11), and warn on unrecognized content types (waybarrios#7). Add Codex CLI setup guide. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thump604
added a commit
to Thump604/vllm-mlx
that referenced
this pull request
Apr 14, 2026
Critical:
- Wire promote_from_ssd into _schedule_waiting() via _try_promote_ssd_pending() so SSD fetch path is actually functional (was defined but never called)
- Add shield-and-await-on-cancel pattern to async_promote() per Golden Rule waybarrios#4 to prevent RAM budget leaks on task cancellation
- Add threading.Lock to SSDIndex for all public methods (writer thread and main thread were sharing connection without synchronization)

Important:
- Fix check_ssd() called twice in scheduler fetch path (wasted SQLite query)
- Wire close_ssd_tier() into scheduler reset() for clean shutdown
- Make reserve_budget() actually reserve (increment _current_memory) instead of just checking, to prevent budget overcommit during concurrent promotions
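The commit does not show what its shield-and-await-on-cancel pattern looks like, so the following is only one plausible reading, sketched under assumptions. `Budget`, `promote`, and the dict-based `cache` are hypothetical stand-ins, and `asyncio.sleep()` stands in for the SSD read. The idea illustrated: reserve memory up front, run the load in a shielded task, and if the caller is cancelled, wait for the load to finish before re-raising so the reservation is never left dangling.

```python
import asyncio


class Budget:
    """Hypothetical memory budget whose reserve() really increments usage."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0

    def reserve(self, n: int) -> bool:
        if self.used + n > self.capacity:
            return False
        self.used += n  # reserve, not just check
        return True

    def release(self, n: int) -> None:
        self.used -= n


async def promote(key: str, size: int, budget: Budget, cache: dict, delay: float) -> bool:
    if not budget.reserve(size):
        return False

    async def load() -> None:
        try:
            await asyncio.sleep(delay)  # stand-in for the actual read
            cache[key] = size  # commit: the reservation now backs real data
        except BaseException:
            budget.release(size)  # load failed or was cancelled: give memory back
            raise

    inner = asyncio.create_task(load())
    try:
        await asyncio.shield(inner)
    except asyncio.CancelledError:
        # The caller was cancelled. Let the load settle first, then propagate.
        await inner
        raise
    return True
```

In this sketch, cancellation of the caller never leaves `budget.used` out of step with the contents of `cache`: the load either commits or releases its reservation.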
Thump604
added a commit
to Thump604/vllm-mlx
that referenced
this pull request
Apr 16, 2026
Critical:
- Wire promote_from_ssd into _schedule_waiting() via _try_promote_ssd_pending() so SSD fetch path is actually functional (was defined but never called)
- Add shield-and-await-on-cancel pattern to async_promote() per Golden Rule waybarrios#4 to prevent RAM budget leaks on task cancellation
- Add threading.Lock to SSDIndex for all public methods (writer thread and main thread were sharing connection without synchronization)

Important:
- Fix check_ssd() called twice in scheduler fetch path (wasted SQLite query)
- Wire close_ssd_tier() into scheduler reset() for clean shutdown
- Make reserve_budget() actually reserve (increment _current_memory) instead of just checking, to prevent budget overcommit during concurrent promotions
Summary
This PR adds several security and reliability improvements:
Security Fixes
Reliability
API Changes